Cancelled

Create customized parsing routines for gathering information

We are looking for a developer with experience in parsing data from third-party websites.

We have already developed a library and framework for our parsing routines. The existing library (10,000+ lines of code) was created for parsing a number of web sources. We are looking for a developer to expand this library and build parsing scripts for additional sources.

Each source requires a separate set of unique routines.

The creation of a routine set for one source should take 50-100 hours.

All source parsing routines share a common library. The library should be sufficient for the development requirements. However, adding to or modifying this library may be necessary.

The parsing involves pulling information from HTML, cleaning it, and populating a database.

It involves linking related data.

It involves spider-like crawling and indexing of information.
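
To give applicants a rough idea of the extract-and-clean step described above, here is a minimal sketch in PHP. It is not taken from our library: the sample markup, the function names (cleanText, parsePrice, extractItems) and the field names are all hypothetical, and it assumes PHP's standard DOM extension is available.

```php
<?php
// Hypothetical sample markup; a real source would be fetched from the site.
$html = <<<'HTML'
<ul>
  <li class="item"><a href="/p/1">  Widget&nbsp;A </a><span class="price">$1,299.50</span></li>
  <li class="item"><a href="/p/2">Widget B</a><span class="price"> $15 </span></li>
</ul>
HTML;

// Collapse runs of whitespace (including non-breaking spaces) and trim the ends.
function cleanText(string $raw): string
{
    return trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $raw));
}

// Strip everything except digits and the decimal point, then cast to float.
function parsePrice(string $raw): float
{
    return (float) preg_replace('/[^0-9.]/', '', $raw);
}

// Pull one row per list item out of the HTML (illustrative structure only).
function extractItems(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//li[@class="item"]') as $li) {
        $link  = $xpath->query('.//a', $li)->item(0);
        $price = $xpath->query('.//span[@class="price"]', $li)->item(0);
        if ($link === null || $price === null) {
            continue;                   // skip items missing expected fields
        }
        $rows[] = [
            'name'  => cleanText($link->textContent),
            'url'   => $link->getAttribute('href'),
            'price' => parsePrice($price->textContent),
        ];
    }
    return $rows;
}

print_r(extractItems($html));
```

In practice, rows like these would then be written to MySQL, for example through PDO prepared statements, and the extracted URLs would feed the crawling step. Each real source needs its own selectors and cleaning rules.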

Everything is based on PHP, MySQL (mostly InnoDB) and bash scripting.

We will provide documentation for the library as well as access to the source code for all the other parsing routines to allow you to learn from existing code.

The parsing of each source is a unique challenge and requires a high degree of creativity. Creativity is very important to us: we are looking for someone who can come up with unique and innovative solutions on their own.

If we are satisfied with your work and abilities on the first source we provide, there is an abundance of additional work available.

Skills: PHP


About the Employer:
( 0 reviews ) South Orange, United States

Project ID: #215798