We are looking for a developer with experience in parsing data from third-party websites.
We have already developed a library and framework for our parsing routines: an existing codebase of 10,000+ lines created for parsing a number of web sources. We are looking for a developer to expand this library and build parsing scripts for additional sources.
Each source requires a separate set of unique routines.
The creation of a routine set for one source should take 50-100 hours.
All source parsing routines share a common library. The library should be sufficient for the development requirements. However, adding to or modifying this library may be necessary.
The parsing involves pulling information from HTML, cleaning it, and populating a database.
It involves linking related data.
It involves spider-like crawling and indexing of information.
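To give a rough idea of the kind of work involved, here is a minimal sketch of the crawl, extract, clean, and link steps described above. It is written in Python with SQLite only so that it is self-contained and runnable; the real code is PHP and MySQL. Every name in it (PAGES, PageParser, clean, crawl, build_db, the makers and items tables) is hypothetical and is not taken from our library.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "/index": '<a href="/item/1">One</a> <a href="/item/2">Two</a>',
    "/item/1": '<h1>  Widget&nbsp;A </h1><span class="maker">Acme</span>'
               '<a href="/index">back</a>',
    "/item/2": '<h1>Widget B</h1><span class="maker"> Acme </span>',
}


class PageParser(HTMLParser):
    """Collects outgoing links plus the title and maker fields of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.fields = {}
        self._current = None  # name of the field currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "h1":
            self._current = "title"
        elif tag == "span" and attrs.get("class") == "maker":
            self._current = "maker"

    def handle_endtag(self, tag):
        if tag in ("h1", "span"):
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = self.fields.get(self._current, "") + data


def clean(text):
    """Collapse runs of whitespace (including non-breaking spaces) and trim."""
    return " ".join(text.split())


def build_db():
    """Create an in-memory database with two linked tables (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE makers (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    db.execute(
        "CREATE TABLE items (url TEXT PRIMARY KEY, title TEXT, "
        "maker_id INTEGER REFERENCES makers(id))"
    )
    return db


def crawl(start, db):
    """Breadth-first crawl from start, storing each item linked to its maker."""
    queue, seen = [start], set()
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(PAGES[url])
        parser.close()
        queue.extend(parser.links)
        if "title" in parser.fields:
            maker = clean(parser.fields.get("maker", ""))
            # Link related data: one maker row shared by many item rows.
            db.execute("INSERT OR IGNORE INTO makers(name) VALUES (?)", (maker,))
            maker_id = db.execute(
                "SELECT id FROM makers WHERE name = ?", (maker,)
            ).fetchone()[0]
            db.execute(
                "INSERT INTO items(url, title, maker_id) VALUES (?, ?, ?)",
                (url, clean(parser.fields["title"]), maker_id),
            )
    return seen


if __name__ == "__main__":
    db = build_db()
    crawl("/index", db)
    for row in db.execute("SELECT url, title, maker_id FROM items ORDER BY url"):
        print(row)
```

Real sources are far messier than this toy site, which is where most of the 50-100 hours per source goes.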
Everything is based on PHP, MySQL (mostly InnoDB) and bash scripting.
We will provide documentation for the library as well as access to the source code for all the other parsing routines to allow you to learn from existing code.
The parsing of each source is a unique challenge and requires a high degree of creativity. Creativity is very important to us, as we are looking for someone who is able to come up with innovative solutions on their own.
If we are satisfied with your work and abilities after you complete the first source we provide, there is an abundance of additional work available.