I have a list of 100 million URLs.
I need them crawled and about 5 fields of data scraped from each URL (the structure of the information is the same on every URL). The website limits the rate at which it can be crawled and blocks IP addresses that hit it repeatedly, so multiple IPs should be used.
You should then deliver the scraped information as a CSV file or a MySQL database.
Additional Project Description:
07/30/2013 at 10:43 EEST
The URLs are all from one domain.