This is much, much simpler than a typical 'web crawler'. It needs to be run as cheaply as possible (preferably on AWS).
The software has 2 simple functions:
1. URLS: Fetch webpages using a multi-threaded approach. Each URL is simply pulled from the db together with the name of the extraction class to apply to it.
2. EXTRACTION CLASSES: Classes that can easily extract data from the fetched HTML according to a given pattern and insert it into the db (also multi-threaded).
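A minimal Python sketch of the two functions above, offered only as an illustration and not as the Perl approach referenced below. It assumes a SQLite database with hypothetical `urls(url, extractor)` and `results(url, field, value)` tables, and regex-based extraction classes. `TitleExtractor`, `LinkExtractor` and the `EXTRACTORS` registry are illustrative names, not part of the project. Fetching and extraction run in a thread pool. All db writes happen on the calling thread because a `sqlite3` connection should not be shared across threads.

```python
import re
import sqlite3
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


class Extractor:
    """Hypothetical base extraction class: subclasses set `pattern`, a regex
    with named groups; each match becomes one {field: value} row."""
    pattern: "re.Pattern[str]"

    def extract(self, html: str) -> list:
        return [m.groupdict() for m in self.pattern.finditer(html)]


class TitleExtractor(Extractor):
    # Illustrative only: grab the text inside <title>.
    pattern = re.compile(r"<title>(?P<title>.*?)</title>", re.IGNORECASE | re.DOTALL)


class LinkExtractor(Extractor):
    # Illustrative only: grab double-quoted href values from <a> tags.
    pattern = re.compile(r'<a\s[^>]*href="(?P<href>[^"]+)"', re.IGNORECASE)


# Maps the class name stored in the `urls` table to an extractor instance.
EXTRACTORS = {cls.__name__: cls() for cls in (TitleExtractor, LinkExtractor)}


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as UTF-8, replacing undecodable bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def process(url: str, extractor_name: str, fetcher) -> tuple:
    """Worker-thread job: fetch one URL and apply its extraction class."""
    html = fetcher(url)
    return url, EXTRACTORS[extractor_name].extract(html)


def run(conn: sqlite3.Connection, fetcher=fetch, workers: int = 8) -> int:
    """Fetch all queued URLs concurrently and store extracted values.
    Returns the number of result rows inserted."""
    jobs = conn.execute("SELECT url, extractor FROM urls").fetchall()
    inserted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process, url, name, fetcher) for url, name in jobs]
        # as_completed yields on this thread, so every INSERT stays single-threaded.
        for fut in as_completed(futures):
            try:
                url, rows = fut.result()
            except Exception as exc:  # network error, unknown extractor, etc.
                print(f"failed: {exc!r}")
                continue
            for row in rows:
                for field, value in row.items():
                    conn.execute(
                        "INSERT INTO results (url, field, value) VALUES (?, ?, ?)",
                        (url, field, value),
                    )
                    inserted += 1
    conn.commit()
    return inserted


if __name__ == "__main__":
    # Demo with an in-memory db and a fake fetcher, so no network is needed.
    db = sqlite3.connect(":memory:")
    db.executescript(
        "CREATE TABLE urls (url TEXT, extractor TEXT);"
        "CREATE TABLE results (url TEXT, field TEXT, value TEXT);"
    )
    db.execute("INSERT INTO urls VALUES (?, ?)", ("page1", "TitleExtractor"))
    fake_pages = {"page1": "<html><title>Hello</title></html>"}
    print(run(db, fetcher=fake_pages.__getitem__))
    print(db.execute("SELECT * FROM results").fetchall())
```

Threads are a reasonable fit here because the work is dominated by waiting on network I/O. Regex patterns keep the "given pattern" idea simple, but a real HTML parser would be more robust against messy markup.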
You should follow this Perl approach and make sure your solution achieves similar, if not better, results.
[url removed, login to view]
(Further reading: [url removed, login to view] )
For an experienced programmer, I expect this to take no longer than a day, since the instructions are laid out above. The budget is therefore very low, so bid accordingly.
Bids on this Project
And God said, Let there be light: and there was light!! 5+ years as a Perl programmer, web developer and server-side tools developer.
We are a software development team and can provide any software or solution according to your requirements.