Scrape Data from Websites

  • Status Closed
  • Budget N/A
  • Total Bids 10

Project Description

I need XPath selection templates/paths, or C# or VB code, to use in the website scraping software that I use.

Here are a few things I need to do:

1:

Find the link to the contact us page. This page is usually called...

-> Contact Us

-> Contact

-> Get In Touch

So it is almost always some variation of the words above.

This can easily be done through XPath. I am having a bit of trouble with the two items below.
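To make item 1 concrete, here is a minimal sketch in Python (chosen only for illustration; it uses the standard library alone). It collects every link whose visible text contains "contact" or "get in touch", ignoring case. The names `CONTACT_XPATH`, `ContactLinkFinder` and `find_contact_links` are made up for this example. The XPath 1.0 string is a rough equivalent of the same match; it is not evaluated here, and whether the scraping tool accepts it unchanged is an assumption.

```python
import re
from html.parser import HTMLParser

# Rough XPath 1.0 equivalent of the matching below. Not evaluated in this
# sketch; acceptance by any particular scraping tool is an assumption.
_LOWER = ("translate(normalize-space(.), "
          "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')")
CONTACT_XPATH = (
    "//a[contains(" + _LOWER + ", 'contact')"
    " or contains(" + _LOWER + ", 'get in touch')]/@href"
)

# Covers "Contact Us", "Contact" and "Get In Touch" in any letter case.
CONTACT_TEXT = re.compile(r"contact|get\s+in\s+touch", re.IGNORECASE)


class ContactLinkFinder(HTMLParser):
    """Collects href values of <a> elements whose text looks like a contact link."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace, like normalize-space() in XPath.
            text = " ".join("".join(self._text).split())
            if CONTACT_TEXT.search(text):
                self.links.append(self._href)
            self._href = None


def find_contact_links(html):
    """Return the hrefs of all links that look like a contact page link."""
    finder = ContactLinkFinder()
    finder.feed(html)
    return finder.links
```

Matching on the substring "contact" is deliberately loose, so a link such as "Contact Lenses" would also be picked up; a stricter word list would be needed if that matters.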

2:

Find a phone number on the contact us page. The phone numbers posted typically follow one of these formats...

(XXX) XXX-XXXX

XXX XXX-XXXX

XXX-XXX-XXXX

[url removed, login to view]

And sometimes they are preceded by a 1, like the examples below...

1 (XXX) XXX-XXXX

1 XXX XXX-XXXX

1 XXX-XXX-XXXX

1 [url removed, login to view]

Words like Tel, Telephone, Phone, Contact Number, Local, Toll Free etc. may appear right before the phone number.
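For item 2, a regular expression along the following lines might be a starting point. This is a sketch in Python, not a tested solution for any particular scraper. It covers only the three formats spelled out above, with or without the leading 1; the format hidden behind "[url removed, login to view]" is not covered because it is not visible in this post. `PHONE` and `extract_phones` are hypothetical names. Labels such as Tel or Toll Free are not required by the pattern, since the number can be matched on its own.

```python
import re

PHONE = re.compile(
    r"""
    (?<!\d)              # do not start in the middle of a longer digit run
    (?:1\s+)?            # optional leading 1 followed by whitespace
    (?:
        \(\d{3}\)\s?     # area code in parentheses, optional space after
      | \d{3}[\s-]       # or bare area code followed by a space or hyphen
    )
    \d{3}-\d{4}          # XXX-XXXX
    (?!\d)               # do not stop in the middle of a longer digit run
    """,
    re.VERBOSE,
)


def extract_phones(text):
    """Return every substring of text that matches one of the listed formats."""
    return [match.group(0) for match in PHONE.finditer(text)]
```

The lookarounds at both ends are there so that long digit strings, such as order or tracking numbers, are not mistaken for phone numbers.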

3:

And finally extract an e-mail address. The e-mail address usually has these words in it...

@, {at}, [at], (at), at, dot, ., com, .com (of course in addition to the actual e-mail address).

Words like e-mail, Email, E-mail, email, Email Address, Toll Free etc. may be used right before the email address as well.
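For item 3, one possible approach is to first rewrite the obfuscated separators into "@" and "." and then run an ordinary e-mail pattern over the result. The sketch below does that in Python. `AT`, `DOT`, `EMAIL` and `extract_emails` are hypothetical names, the bracketed "dot" variants are an assumption that mirrors the "at" variants listed above, and the e-mail pattern is a simplified one rather than a full address validator.

```python
import re

# "{at}", "[at]", "(at)" with optional surrounding spaces, or a bare " at ".
AT = re.compile(r"\s*(?:\{at\}|\[at\]|\(at\))\s*|\s+at\s+", re.IGNORECASE)

# Bare " dot " is from the list above; the bracketed forms are an assumption.
DOT = re.compile(r"\s*(?:\{dot\}|\[dot\]|\(dot\))\s*|\s+dot\s+", re.IGNORECASE)

# Simplified address pattern: local part, "@", domain labels, alphabetic TLD.
EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(text):
    """Undo common obfuscations, then return every e-mail-like substring."""
    normalized = AT.sub("@", text)
    normalized = DOT.sub(".", normalized)
    return EMAIL.findall(normalized)
```

Rewriting a bare " at " is the risky part: a sentence such as "look at example.com" would turn into a false match, so it may be safer to apply that rule only to text found next to a label like Email or E-mail.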

The program I use, Visual Web Ripper, has support for [url removed, login to view], regex and C#, in case you can find a better way of getting the required info.
