HTML/web page parser

  • Status Closed
  • Budget $250 - $750 USD
  • Total Bids 14

Project Description

Do you have experience parsing HTML, using a headless browser to navigate websites, and turning HTML content and HTML tables into structured data? Then we need you. Ideally I'd like this written in C#, but I'll accept Perl, Python, or Java if the HTML parsing experience is there.

Job Description:

We need a talented application and database developer to create a program that recursively crawls a given set of web pages, parses them, and turns their content into structured data.
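For illustration only, the recursive traversal could be sketched in Python (one of the accepted languages) as a breadth-first crawl with a visited set. A queue replaces literal recursion so deep sites cannot hit the recursion limit. Restricting the crawl to the start URLs' hosts, the `max_pages` cap, and the `fetch` hook (any function that returns a page's HTML, e.g. from urllib or a headless browser) are all assumptions, not part of the existing shell program.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop #fragments so one page is
                # not queued twice under different anchors.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(start_urls: Iterable[str], fetch: Callable[[str], str],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl limited to the hosts of the start URLs.

    `fetch` is a hypothetical hook mapping a URL to its HTML; in practice it
    could wrap urllib.request or a headless browser's rendered page source.
    Returns {url: html} for every page visited.
    """
    start = list(start_urls)
    allowed_hosts = {urlparse(u).netloc for u in start}
    queue = deque(start)
    seen = set(start)
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            # Follow only unseen links on the allowed hosts.
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Keeping `fetch` injectable means the same crawl logic can be tested against a dictionary of canned pages and later pointed at a real headless browser without changes.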

- parse HTML and store both the text and the raw HTML in an RDBMS (preferably SQL Server, but I'm open to others)

- identify tables in the HTML, parse their content, and load it into relational tables

- clean the data and consolidate it across different time periods
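As a rough sketch of the table step, Python's standard `html.parser` can turn each `<table>` into rows of cleaned cell text, which are then loaded into a database. SQLite stands in here for SQL Server so the example is self-contained, and the `page`/`cell` schema is a hypothetical generic landing layout; the real target tables would have proper columns per source. Colspan/rowspan and nested tables are not handled, and the only "cleaning" shown is whitespace collapsing.

```python
import sqlite3
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Turns every <table> into a list of rows of cell text (flat tables only)."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Basic cleaning: collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # HTML allows omitted </td> and </tr>, so a new cell or row closes
        # any one still open.
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def store_tables(conn: sqlite3.Connection, url: str, html: str) -> int:
    """Stores the raw HTML and every table cell; returns the cell count."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS page (url TEXT PRIMARY KEY, html TEXT);
        CREATE TABLE IF NOT EXISTS cell (
            url TEXT, table_no INTEGER, row_no INTEGER, col_no INTEGER,
            value TEXT);
    """)
    conn.execute("INSERT OR REPLACE INTO page VALUES (?, ?)", (url, html))
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    rows = [(url, t, r, c, value)
            for t, table in enumerate(parser.tables)
            for r, row in enumerate(table)
            for c, value in enumerate(row)]
    conn.executemany("INSERT INTO cell VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Landing every cell with its table, row, and column position first keeps parsing separate from cleaning; consolidating across time periods can then be done in SQL against this staging data.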

A shell program has already been written, and I will personally oversee development very closely. There are simply a lot of details and scenarios to handle in order to get data cleanly out of the materials we want to scrape.

Your qualifications:

- An extremely detail-oriented work style

- Strong communication skills

- A complete Elance profile

- References or an established reputation

Desired Skills

Database programming, HTML parsing, headless browsers, regular expressions
