Create Multi-Threaded Distributed Web Crawler on AWS

Budget N/A
Bids 4
Average Bid $179

This is much, much simpler than a typical 'web crawler'. It needs to be run as cheaply as possible (preferably on AWS).

The software has 2 simple functions:
1. URLS: Fetch web pages using a multi-threaded approach. The URLs are simply pulled from the db, along with the extraction class to use for each one.
2. EXTRACTION CLASSES: Classes that can easily extract data from HTML by following a given pattern and insert it into the db, also using a multi-threaded approach.
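
To make the two functions above concrete, here is a minimal Python sketch of one way they could fit together. It is only an illustration under assumptions, not the required design: the table names (urls, results), the column names, the TitleExtractor class, and the EXTRACTORS registry are all hypothetical, and SQLite stands in for whatever db the project actually uses. Fetching and extraction run in a thread pool; the db inserts happen on the calling thread afterwards.

```python
import re
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


class TitleExtractor:
    """Hypothetical extraction class: pulls the <title> text using a regex pattern."""
    pattern = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

    def extract(self, html):
        match = self.pattern.search(html)
        return match.group(1).strip() if match else None


# Assumed registry: maps the class name stored in the db to the class itself.
EXTRACTORS = {"TitleExtractor": TitleExtractor}


def fetch(url):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def process(job, fetcher=fetch):
    """Fetch one URL and run its extraction class; runs inside a worker thread."""
    url, extractor_name = job
    try:
        html = fetcher(url)
    except OSError:
        # A failed download (URLError is an OSError) should not stop the other jobs.
        return url, None
    return url, EXTRACTORS[extractor_name]().extract(html)


def crawl(conn, fetcher=fetch, workers=8):
    """Read (url, extractor) rows from the db, process them in parallel, store results."""
    jobs = conn.execute("SELECT url, extractor FROM urls").fetchall()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda job: process(job, fetcher), jobs))
    # Inserts are done on the calling thread, after the pool has finished.
    conn.executemany("INSERT INTO results (url, value) VALUES (?, ?)", results)
    conn.commit()
    return results
```

Adding a new extraction pattern would then mean writing another class with an extract method and registering it under the name stored in the db. The fetcher argument exists only so the sketch can be exercised without network access.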


You should follow this Perl approach and make sure your solution achieves similar, if not better, results.
[url removed, login to view]


(Further reading: [url removed, login to view] )




For an experienced programmer, I expect this to take no longer than a day, as the instructions are laid out above. The budget is therefore very low, so bid accordingly.


Bids on this Project