Closed

Create Multi-Threaded Distributed Web Crawler on AWS

This project received 4 bids from talented freelancers with an average bid price of $179 USD.

Employer working
Project Budget: N/A
Total Bids: 4
Project Description

This is much, much simpler than a typical 'web crawler'. It needs to be run as cheaply as possible (preferably on AWS).

The software has 2 simple functions:
1. URLS: Fetch webpages using a multi-threaded approach. The URLs are simply pulled from the db, along with the extraction class to use for each one.
2. EXTRACTION CLASSES: Classes that can easily extract data from HTML according to a given pattern and insert it into the db (also multi-threaded).
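The two functions above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the Perl approach the posting links to (which is not visible here). It assumes SQLite tables `urls(url, extractor)` and `results(url, value)`, regex-based extraction classes, and Python's `ThreadPoolExecutor` for the multi-threaded fetching. The table names, `Extractor`, `TitleExtractor`, `LinkExtractor`, `EXTRACTORS` and `crawl` are all hypothetical.

```python
import re
import sqlite3
import urllib.request
from concurrent.futures import ThreadPoolExecutor


class Extractor:
    """Hypothetical base extraction class: applies a regex pattern to HTML."""
    pattern = None  # compiled regex with a named group "value"; set by subclasses

    def extract(self, html):
        # Return one dict per match, keyed by the pattern's named groups.
        return [m.groupdict() for m in self.pattern.finditer(html)]


class TitleExtractor(Extractor):
    # Hypothetical example: the contents of <title> tags.
    pattern = re.compile(r"<title>(?P<value>.*?)</title>", re.IGNORECASE | re.DOTALL)


class LinkExtractor(Extractor):
    # Hypothetical example: double-quoted href values of <a> tags.
    pattern = re.compile(r'<a\s[^>]*href="(?P<value>[^"]+)"', re.IGNORECASE)


# Maps the extractor name stored in the db to an extraction class instance.
EXTRACTORS = {"title": TitleExtractor(), "links": LinkExtractor()}


def fetch(url, timeout=10):
    """Download one page with the standard library; any HTTP client would do."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(conn, fetcher=fetch, max_workers=8):
    """Fetch every queued URL in worker threads, then store the extracted rows."""
    jobs = conn.execute("SELECT url, extractor FROM urls").fetchall()

    def work(job):
        url, name = job
        html = fetcher(url)  # network I/O happens in the worker thread
        return [(url, row["value"]) for row in EXTRACTORS[name].extract(html)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, jobs))

    # All db writes happen on the calling thread: by default a sqlite3
    # connection refuses to be used from threads other than its creator.
    with conn:
        for rows in results:
            conn.executemany("INSERT INTO results (url, value) VALUES (?, ?)", rows)
    return sum(len(rows) for rows in results)


if __name__ == "__main__":
    # Offline demo: a dict stands in for the network so nothing is downloaded.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE urls (url TEXT, extractor TEXT)")
    conn.execute("CREATE TABLE results (url TEXT, value TEXT)")
    conn.executemany("INSERT INTO urls VALUES (?, ?)",
                     [("http://a.example", "title"), ("http://b.example", "links")])
    pages = {"http://a.example": "<html><title>Hello</title></html>",
             "http://b.example": '<a href="/x">x</a><a class="c" href="/y">y</a>'}
    print(crawl(conn, fetcher=pages.__getitem__))
    print(conn.execute("SELECT url, value FROM results").fetchall())
```

Threads fit here because the work is dominated by waiting on the network, so Python's GIL is not the bottleneck. Keeping all db writes on one thread avoids sharing the connection between threads. Regex extraction is fragile on real-world HTML, so a proper HTML parser may be a safer choice for the extraction classes.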


You should follow this Perl approach and make sure your solution achieves similar, if not better, results:
[url removed, login to view]


(Further reading: [url removed, login to view])




For an experienced programmer, I expect this to take no longer than a day, since the instructions are laid out above. The budget is therefore very low, so bid accordingly.
