Do you have experience parsing HTML, using a headless browser to navigate websites, and turning HTML content and HTML tables into structured data? Then you are needed. Ideally I'd like this written in C#, but I'll take Perl, Python, or Java if the HTML parsing experience is there.
We need a talented application and database developer to create a program that recursively works through a given set of web pages, parses them, and turns the content into structured data.
- parse HTML and add text and HTML to an RDBMS (preferably SQL Server, but I'm open to others)
- identify tables in HTML, parse content and add to relational tables
- clean data and consolidate over different time periods
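To give a rough idea of the table-parsing item above, here is a minimal sketch in Python using only the standard library. It is an illustration under assumptions, not the existing shell program: SQLite stands in for SQL Server, and the names `TableExtractor`, `load_tables`, and `scraped_cell` are hypothetical. It ignores nested tables and pages that omit closing `</td>` or `</tr>` tags, which a real solution would have to handle.

```python
import sqlite3
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text.

    Hypothetical helper for illustration; nested tables are not supported.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables: list of rows, each a list of str
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = self._row = self._cell = None


def load_tables(html, conn, source_url):
    """Store each table cell as one row; return the number of cells stored."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    # Assumed schema: one generic cell table rather than one table per layout.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_cell ("
        "source_url TEXT, table_idx INTEGER, row_idx INTEGER, "
        "col_idx INTEGER, value TEXT)"
    )
    records = [
        (source_url, t, r, c, value)
        for t, rows in enumerate(parser.tables)
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
    ]
    conn.executemany("INSERT INTO scraped_cell VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()
    return len(records)


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Year</th><th>Total</th></tr>"
        "<tr><td>2012</td><td> 1,204 </td></tr></table>"
    )
    conn = sqlite3.connect(":memory:")
    print(load_tables(sample, conn, "http://example.com/report"))
```

Storing one row per cell keeps the raw scrape independent of any particular table layout; cleaning and consolidating across time periods could then be done with queries over that table. This is only one possible design, and I am open to others.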
A shell program has already been written, and I will personally oversee the development very closely. There are a lot of details and scenarios to handle in order to get data cleanly from the materials we are looking to scrape.
- A work style that is extremely detail oriented
- Strong communication skills
- A complete Elance profile
- References or an established reputation
Database Programming, HTML parsing, headless browser, regular expressions