Create Multi-Threaded Distributed Web Crawler on AWS

Closed

Description

This is much, much simpler than a typical 'web crawler'. It needs to be run as cheaply as possible (preferably on AWS).

The software has 2 simple functions:

1. URLS: Fetch webpages using a multi-threaded approach. The URLs are simply pulled from the db, along with the extraction class to use for each.

2. EXTRACTION CLASSES: Classes that can easily extract data from HTML following a given pattern and insert it into the db (also with a multi-threaded approach).
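
The two functions above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the referenced Perl approach: the table names (urls, results), the column names, the EXTRACTORS registry, and the TitleExtractor class are all hypothetical, SQLite stands in for whatever database is actually used, and a regular expression is assumed as the "given pattern".

```python
import re
import sqlite3
import urllib.request
from concurrent.futures import ThreadPoolExecutor


class TitleExtractor:
    """Hypothetical extraction class: pulls the <title> text with a regex."""
    pattern = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

    def extract(self, html):
        return [m.strip() for m in self.pattern.findall(html)]


# Assumed design: the db stores an extractor name, mapped here to a class.
EXTRACTORS = {"title": TitleExtractor}


def fetch(url):
    """Download one page; any HTTP client could be swapped in here."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def process(job, fetcher=fetch):
    """Worker: fetch one URL and run the extraction class named in its row."""
    url, extractor_name = job
    html = fetcher(url)
    values = EXTRACTORS[extractor_name]().extract(html)
    return [(url, value) for value in values]


def crawl(conn, fetcher=fetch, workers=8):
    """Pull (url, extractor) rows from the db, process them in threads, store results."""
    jobs = conn.execute("SELECT url, extractor FROM urls").fetchall()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(lambda job: process(job, fetcher), jobs)
        rows = [row for batch in batches for row in batch]
    # Insert from the calling thread: a sqlite3 connection is by default
    # usable only in the thread that created it.
    conn.executemany("INSERT INTO results (url, value) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

A possible way to run it, again with the assumed schema:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT, extractor TEXT)")
conn.execute("CREATE TABLE results (url TEXT, value TEXT)")
conn.execute("INSERT INTO urls VALUES (?, ?)", ("https://example.com", "title"))
crawl(conn)
```

Threads mainly help here because fetching is I/O-bound; the extraction step runs in the same worker right after the download, so each page is fetched and parsed in one pass.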

You should follow this Perl approach and make sure your solution achieves similar, if not better, results.

[url removed, login to view]

(Further reading: [url removed, login to view] )

For an experienced programmer, I expect this to take no longer than a day, as the instructions are laid out above. The budget is therefore very low; bid accordingly.

Skills: Amazon Web Services, C Programming, C++ Programming, Perl, Software Architecture

Project ID: #4381334

4 freelancers are bidding on average $179 for this job

mccheung

Hello, I have 3 years of experience as a Perl programmer and am good at data scraping jobs. Thanks.

$176 USD in 4 days
(1 Review)
2.5

d0tnet12

Consider it done! Check PM.

$180 USD in 6 days
(2 Reviews)
2.4

pvdenis76

Could you answer a few questions: 1. Are the pages already downloaded and saved in the db? 2. What do you mean by "given pattern"; is it a regexp? ...Forked processes are not a problem; the problem is understanding where to take the info from and how to process it

$160 USD in 3 days
(0 Reviews)
0.0

shahroz91

Consider it done.

$200 USD in 5 days
(0 Reviews)
0.0