Web Scraper using Python, Scrapy, MySQL & JSON

This project received 24 bids from talented freelancers with an average bid price of $167 USD.

Project Description

I would like a web scraper that:
1. Retrieves a seed list of URIs from a MySQL database
2. Using multiple threads (Twisted framework) and Scrapy, scrapes each page for links (1 level deep only)
3. Validates each link to ensure it is a full URL
4. Gets the response from each scraped URL (i.e. redirect, OK, not found)
4a. If there is no response, tries a DNS lookup
5. Saves the root address and response results, then imports them into a MySQL table (this can be batched through a JSON file if required)
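
To make steps 3 to 5 concrete, here is a minimal sketch using only the Python standard library. It is not a finished design: the function names (is_full_url, classify_status, resolves, to_json_batch), the result labels and the JSON field names are all assumptions made for illustration, and the actual fetching would be done by Scrapy rather than by this code.

```python
import json
import socket
from urllib.parse import urlsplit


def is_full_url(url):
    """Step 3 (assumed rule): a full URL has an http(s) scheme and a host."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def classify_status(status):
    """Step 4: map an HTTP status code to a coarse label (labels are assumed)."""
    if status is None:
        return "no response"
    if 200 <= status < 300:
        return "OK"
    if 300 <= status < 400:
        return "redirect"
    if status == 404:
        return "not found"
    return "other"


def resolves(host):
    """Step 4a: return True if a DNS lookup for host succeeds."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def to_json_batch(rows):
    """Step 5: serialize (root, url, status) tuples for a later MySQL import."""
    records = [
        {"root": root, "url": url, "result": classify_status(status)}
        for root, url, status in rows
    ]
    return json.dumps(records)
```

A relative link such as /about or a mailto: link fails is_full_url, while https://example.com/page passes. The string returned by to_json_batch could be written to a file and loaded into MySQL in a separate step.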

As this is being created as a proof of concept, it doesn't need to be built with Django, unless using Django would not affect the price. It can be launched from a Linux console.

The most important part of this project is that the scraping is made efficient by using multiple threads and by eliminating duplicate URLs in step 4, so that no link is sent a request more than once.
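
One possible way to eliminate duplicates before step 4 is sketched below. The helper names normalize and unique_urls are hypothetical, and so is the normalization rule: lower-case the scheme and host, drop the fragment, and treat an empty path as "/". It assumes the URLs carry no embedded credentials.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Reduce a URL to a comparison key (assumed rule, see lead-in)."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        parts.query,
        "",  # fragments never reach the server, so they are dropped
    ))


def unique_urls(urls):
    """Yield each URL whose normalized form has not been seen before."""
    seen = set()
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            yield url
```

Here http://Example.com/a and http://example.com/a#top count as the same link, so only the first would be requested. Scrapy's scheduler also filters duplicate requests by default, so a helper like this may only be needed for links collected outside that mechanism. Scrapy gets its concurrency from Twisted's asynchronous event loop rather than from one thread per request, and a plain set is enough in that setting. If real threads share the set, the check and the add should be guarded by a lock.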

This project has the potential for additional development if the right developer is found.

Note: Well-commented code is expected.
