Closed

HTML Parsing / Data Mining

This project received 6 bids from talented freelancers with an average bid price of $163 USD.

Project Budget
$30 - $250 USD
Total Bids
6
Project Description

I'm looking for someone to write some C# code that extracts contact information from websites: company name, address, phone number, and "about" details for the company.

This will be integrated into an existing application, and I am looking for a self-contained class to handle it, along with a demonstration test application. The class will need to accept all properties required to configure it (not rely on a config file) - the demo app, of course, will have one.
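One way to read the "no config file" requirement is sketched below. It is written in Python only to illustrate the shape and is not the requested C# deliverable. The names `ExtractorOptions`, `ContactExtractor`, and `options_from_demo_config`, and the three settings shown, are hypothetical and do not come from the posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorOptions:
    """All settings the class needs, passed in explicitly (hypothetical fields)."""
    user_agent: str = "ContactExtractor/0.1"
    timeout_seconds: float = 10.0
    max_redirects: int = 5


class ContactExtractor:
    """Self-contained class: it never reads a config file itself."""

    def __init__(self, options: ExtractorOptions):
        self.options = options


def options_from_demo_config(cfg: dict) -> ExtractorOptions:
    """Demo-app side: turn values loaded from its own config into options."""
    return ExtractorOptions(**cfg)
```

With this split, only the demo application knows where settings are stored; the class sees plain values and can be dropped into the existing application unchanged.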

The class should accept a domain name (e.g. [url removed, login to view]) and perform the following steps:

1) Fetch the homepage and parse all links on it.
2) Identify which links point to the "About" and "Contact Us" pages.
3) Retrieve both pages.
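Steps 1 and 2 could be sketched roughly as follows, using only the Python standard library. This is an illustration and not the requested C# class. The helper names (`LinkCollector`, `classify_links`) and the keyword heuristic (match "about" or "contact" in the anchor text first, then in the URL path) are assumptions. Fetching is left out, so the function takes HTML that has already been downloaded.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects (absolute_url, anchor_text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the homepage URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def _find(links, keyword):
    """Assumed heuristic: prefer anchor text, fall back to the URL path."""
    pattern = re.compile(keyword, re.IGNORECASE)
    for url, text in links:
        if pattern.search(text):
            return url
    for url, _ in links:
        if pattern.search(urlparse(url).path):
            return url
    return None


def classify_links(html, base_url):
    """Return the best-guess About and Contact URLs found in the homepage HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return {"about": _find(parser.links, "about"),
            "contact": _find(parser.links, "contact")}
```

Step 3 would then download whichever of the two URLs is not `None`. Sites that label these pages differently (for example "Our Story" or "Get in touch") would need a longer keyword list.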

We then want to extract the following information:

a) Company Name, Address, State, Suburb, Postcode, Phone, Fax
b) The human-readable text of the "About Us" page, without any menu text, in both plain-text and HTML versions.
c) The human-readable text of the "Contact Us" page, without any menu text, in both plain-text and HTML versions.
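The extraction in (a) to (c) might look something like the sketch below, again in standard-library Python as an illustration and not as the requested C# code. It assumes that menus live inside `nav`, `header`, or `footer` elements and that phone and fax numbers follow a "Phone:" or "Fax:" label. Neither assumption comes from the posting, and real sites will often break both. Company name, address, state, suburb, and postcode are not attempted here because their formats are not specified.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed containers for menus and other non-content markup.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}


class ContentExtractor(HTMLParser):
    """Rebuilds the page without SKIP_TAGS subtrees, as HTML and as text."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.html_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.html_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, with no end tag.
        if self.skip_depth == 0 and tag not in SKIP_TAGS:
            self.html_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.html_parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.html_parts.append(escape(data, quote=False))
            self.text_parts.append(data)


def extract_content(html):
    """Return (plain_text, cleaned_html) with menu containers removed."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.text_parts).split())
    return text, "".join(parser.html_parts)


# Assumed label-based patterns; number formats vary widely between countries.
_NUMBER = r"\s*[:.]?\s*(\+?[\d()\s-]{6,}\d)"
PHONE_RE = re.compile(r"\b(?:telephone|phone|tel|ph)\b" + _NUMBER, re.IGNORECASE)
FAX_RE = re.compile(r"\bfax\b" + _NUMBER, re.IGNORECASE)


def extract_phone_fax(text):
    """Return the first labeled phone and fax numbers, or None for each."""
    phone = PHONE_RE.search(text)
    fax = FAX_RE.search(text)
    return {"phone": phone.group(1) if phone else None,
            "fax": fax.group(1) if fax else None}
```

Running `extract_content` on the Contact Us page and passing its plain-text result to `extract_phone_fax` covers part of (a) and all of (c). Pages that build their menus from plain `div` elements would need a different rule, such as dropping blocks that consist mostly of links.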
