Spider/crawl/scrape 100 million URLs

  • Status Closed
  • Budget $500 - $2900 USD
  • Total Bids 56

Project Description

I have a list of 100 million URLs.

I need every URL crawled and about 5 data fields scraped from each page (the structure of the information is the same on all the URLs). The website limits the rate at which the URLs can be crawled and blocks IP addresses that hit it repeatedly, so the crawl must be spread across multiple IP addresses.

You should then deliver the scraped information in a CSV file or MySQL database.
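
As a rough illustration of the requirement above (a minimal sketch, not a required design), the crawl could rotate through a pool of proxies, enforce a minimum delay per proxy, and stream rows to CSV. Everything named here is an assumption for illustration: the `ProxyRotator` class, the `crawl` function, the `field_1` to `field_5` column names, and the caller-supplied `fetch` function, which would do the actual HTTP request and parsing.

```python
import csv
import itertools
import time

# Hypothetical column names; the posting only says "about 5 variables".
FIELDS = ["url", "field_1", "field_2", "field_3", "field_4", "field_5"]


class ProxyRotator:
    """Hands out proxies round-robin, waiting so that each proxy is
    reused no sooner than `min_interval` seconds after its last use."""

    def __init__(self, proxies, min_interval,
                 clock=time.monotonic, sleep=time.sleep):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(list(proxies))
        self._min_interval = min_interval
        self._last_used = {}
        self._clock = clock
        self._sleep = sleep

    def acquire(self):
        proxy = next(self._cycle)
        last = self._last_used.get(proxy)
        if last is not None:
            wait = self._min_interval - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)
        self._last_used[proxy] = self._clock()
        return proxy


def crawl(urls, rotator, fetch, out):
    """Fetch each URL through the next available proxy and write one CSV
    row per successful page. `fetch(url, proxy)` is supplied by the caller
    and returns a dict of scraped fields, or None on failure.
    Returns the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for url in urls:
        proxy = rotator.acquire()
        row = fetch(url, proxy)
        if row is None:
            continue  # a real crawler would log and retry this URL
        writer.writerow({"url": url, **row})
        written += 1
    return written
```

This version is sequential to keep the idea visible. At 100 million URLs a real run would need concurrency, retries, and checkpointing so that an interrupted crawl can resume.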
