Web Scraping To DB With Python or Perl

  • Status Closed
  • Budget N/A
  • Total Bids 25

Project Description

This is a web scraping project that can be done in either Python or Perl.

I am developing a social shopping comparison site, and have authorisation from the sites we are scraping to crawl and extract data daily.

Scraping needs to be set up for 10 ecommerce sites, for example Firebox.com. That is about 10,000 URLs in total, as each site has about 1,000 products.

The data that needs to be extracted includes:

  • Product page URL
  • Product name
  • Product price
  • Stock status
  • Image URL

This data needs to be stored in a database that will be hosted on Amazon AWS. The image also needs to be downloaded, renamed and stored on our local server, and the local image path added to the DB.
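As a rough illustration of the storage step described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Product` fields, the `products` table layout, the hash-based renaming scheme, and the use of SQLite as a local stand-in for whatever database ends up hosted on AWS.

```python
import hashlib
import os
import sqlite3
import urllib.request
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Product:
    page_url: str
    name: str
    price: str            # kept as text; currency parsing is site-specific
    stock_status: str
    image_url: str
    local_image_path: str = ""


def local_image_name(page_url: str, image_url: str) -> str:
    """Derive a stable file name from the product URL, keeping the image extension."""
    ext = os.path.splitext(urlparse(image_url).path)[1].lower() or ".jpg"
    return hashlib.sha1(page_url.encode("utf-8")).hexdigest() + ext


def download_image(product: Product, image_dir: str) -> None:
    """Fetch the image, store it under its new name, and record the local path."""
    os.makedirs(image_dir, exist_ok=True)
    path = os.path.join(image_dir, local_image_name(product.page_url, product.image_url))
    urllib.request.urlretrieve(product.image_url, path)
    product.local_image_path = path


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) products table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               page_url TEXT PRIMARY KEY,
               name TEXT,
               price TEXT,
               stock_status TEXT,
               image_url TEXT,
               local_image_path TEXT,
               scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )


def save_product(conn: sqlite3.Connection, p: Product) -> None:
    """Insert the product, replacing any earlier row for the same page URL."""
    conn.execute(
        "INSERT OR REPLACE INTO products "
        "(page_url, name, price, stock_status, image_url, local_image_path) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (p.page_url, p.name, p.price, p.stock_status, p.image_url, p.local_image_path),
    )
    conn.commit()
```

Keying the table on the product page URL means a daily re-scrape overwrites the previous row instead of adding a duplicate. A hosted database would need its own driver and upsert syntax in place of `sqlite3` and `INSERT OR REPLACE`.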

There can be a very basic scraping UI where we can set the interval between scrapes.
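One possible shape for the interval setting, independent of any UI, is sketched below in Python. The `parse_interval` format (a number followed by `s`, `m`, `h` or `d`) and the `run_every` helper are hypothetical names invented for this example, not an agreed design.

```python
import re
import time

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text: str) -> int:
    """Turn a setting such as '24h' or '15 m' into a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", text.lower())
    if not match:
        raise ValueError(f"unrecognised interval: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def run_every(interval_seconds, job, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, waiting interval_seconds between runs.

    max_runs=None loops forever; the sleep argument is injectable for testing.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

A UI would then only need to store one string such as `24h` and pass `parse_interval(...)` to the loop. In production, a cron entry or a task queue could replace `run_every` entirely.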

It should also be easy to set up new sites to scrape, but we can also pay you to add these if needed.
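One way to keep new sites cheap to add is to describe each site as data, not code. The Python sketch below assumes a hypothetical `SiteConfig` holding one regular expression per field. The `example-shop` patterns are made up and do not describe any real site's markup. A real scraper would more likely use CSS selectors or XPath with an HTML parser, but the per-site configuration idea is the same.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    name: str
    patterns: dict  # field name -> regex with exactly one capture group


# Hypothetical configuration for an imaginary shop; adding a site means adding one of these.
EXAMPLE_SHOP = SiteConfig(
    name="example-shop",
    patterns={
        "name": r'<h1[^>]*class="product-title"[^>]*>(.*?)</h1>',
        "price": r'<span[^>]*class="price"[^>]*>(.*?)</span>',
        "stock_status": r'<p[^>]*class="stock"[^>]*>(.*?)</p>',
        "image_url": r'<img[^>]*id="main-image"[^>]*src="([^"]+)"',
    },
)


def extract(html: str, page_url: str, site: SiteConfig) -> dict:
    """Apply every pattern of the site config; fields that do not match become None."""
    record = {"page_url": page_url}
    for field, pattern in site.patterns.items():
        match = re.search(pattern, html, re.DOTALL | re.IGNORECASE)
        record[field] = match.group(1).strip() if match else None
    return record
```

Returning `None` for a missing field, not raising, lets a daily run flag pages whose layout has changed without aborting the whole crawl.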

If needed, you should be familiar with routing the scraper through Tor.
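For reference, routing through Tor from Python usually comes down to pointing the HTTP client at Tor's local SOCKS proxy. The sketch below assumes a Tor daemon listening on its default SOCKS port 9050 on the same machine, and the third-party `requests` library installed with SOCKS support (`pip install requests[socks]`). The helper names `tor_proxies` and `fetch_via_tor` are hypothetical.

```python
def tor_proxies(host: str = "127.0.0.1", port: int = 9050) -> dict:
    """Build a proxies mapping for requests that points at a local Tor SOCKS port.

    The socks5h scheme asks the proxy to resolve hostnames, so DNS lookups
    also go through Tor and do not leak from the local machine.
    """
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def fetch_via_tor(url: str, timeout: float = 30.0) -> str:
    """Download a page through Tor; assumes a Tor daemon is already running."""
    import requests  # third-party; imported here so the helper above has no dependency

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

If Tor Browser is used in place of a standalone daemon, its SOCKS port is 9150, so `tor_proxies(port=9150)` would be the variant to try.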

After the URLs are scraped, additional social media data needs to be added to each URL in the DB, for example from Facebook, with this JSON API:

Facebook*: [url removed, login to view]

In addition to designing this scrape program, I will require ongoing support and maintenance of the code, and have many such similar projects in mind that I would like to work on with the successful applicant.

I want to work with someone who is responsive via Skype and can get back to me very quickly regarding issues.

PLEASE NOTE: Please apply stating your experience in web scraping, your advice on whether to use Python or Perl, and your experience with Tor.

If you cannot be available to discuss issues on Skype text chat, please do not bid.
