Closed

Web Scraping into Database

This project received 24 bids from talented freelancers with an average bid price of $253 USD.

Employer working
Project Budget
$50 - $300 USD
Total Bids
24
Project Description

We want to build a tool that will scrape several websites on a regular basis. The first scrape of each site should collect all of the information we can. Subsequent scrapes will look for changes and updates (e.g. new images, new prices, new products added, products deleted). Some of the websites belong to suppliers, others to competitors. The information collected needs to be stored in a database.
We already had some work done to scrape one supplier's site. I can give you access to the scraping code and the database.
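
To illustrate the kind of change detection described above, here is a minimal sketch in Python using the standard-library sqlite3 module. The table layout, the field names (price, image_url) and the function names open_db and record_scrape are assumptions for illustration only; they are not taken from the existing scraping code or from the attached field list.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and create a hypothetical products table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "site TEXT, sku TEXT, price REAL, image_url TEXT, "
        "PRIMARY KEY (site, sku))"
    )
    return conn

def record_scrape(conn, site, items):
    """Compare a fresh scrape with the stored one, then store the fresh one.

    items maps a product SKU to a dict with 'price' and 'image_url' keys.
    Returns the SKUs that were added or deleted and the fields that changed.
    """
    rows = conn.execute(
        "SELECT sku, price, image_url FROM products WHERE site = ?", (site,)
    ).fetchall()
    previous = {sku: {"price": price, "image_url": image_url}
                for sku, price, image_url in rows}

    added = sorted(items.keys() - previous.keys())
    deleted = sorted(previous.keys() - items.keys())
    changed = {}
    for sku in items.keys() & previous.keys():
        diffs = {field: (previous[sku][field], items[sku].get(field))
                 for field in ("price", "image_url")
                 if previous[sku][field] != items[sku].get(field)}
        if diffs:
            changed[sku] = diffs

    # Replace the stored snapshot for this site in a single transaction.
    with conn:
        conn.execute("DELETE FROM products WHERE site = ?", (site,))
        conn.executemany(
            "INSERT INTO products VALUES (?, ?, ?, ?)",
            [(site, sku, data.get("price"), data.get("image_url"))
             for sku, data in items.items()],
        )
    return {"added": added, "deleted": deleted, "changed": changed}
```

On the first run for a site every product is reported as added; later runs report only the differences. A real version would probably also keep a history of past prices rather than overwriting the snapshot.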

Once the data is collected, we need to come up with a way to easily use it. The first major challenge will be aligning the products from the different websites so that we can accurately compare information.
It would be fantastic if much of this "matching" were done programmatically. Once we are certain that the product information collected from each website "matches," we can compare the products. We can then determine what our competition is selling each item for and compare it to our price.
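
One possible way to do part of this matching programmatically is fuzzy title comparison. The sketch below uses Python's standard-library difflib; the function names, the 0.85 threshold and the idea of matching on product titles alone are assumptions for illustration. In practice a manufacturer part number or barcode, where the sites expose one, would likely be a more reliable key.

```python
import re
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase the title, drop punctuation, and sort the words."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", title.lower())))

def match_products(ours, theirs, threshold=0.85):
    """Pair each of our products with the most similar competitor title.

    ours and theirs map a product ID to a title. Only pairs whose similarity
    ratio reaches the threshold are returned; the rest need manual review.
    """
    matches = {}
    for our_id, our_title in ours.items():
        best_id, best_score = None, 0.0
        for their_id, their_title in theirs.items():
            score = SequenceMatcher(
                None, normalize(our_title), normalize(their_title)
            ).ratio()
            if score > best_score:
                best_id, best_score = their_id, score
        if best_score >= threshold:
            matches[our_id] = (best_id, round(best_score, 2))
    return matches
```

This compares every title against every other title, which is acceptable for a few thousand products per site but would need an index or a blocking step for much larger catalogs.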

Another use is product descriptions and images. With 7000 products, it is very time-consuming to enter a meaningful product description and an up-to-date image for every product. I'd like the ability to use the information we scrape to populate our store's database, for example by assigning a scraped description to a given product.
Once the data collection is in place, how we manipulate and use the data will evolve. That work may be best done on an hourly basis. If possible, I'd like to know the cost to set up each scrape on our server and get the information into a database.

NOTE: In case it is not clear, our end goal is data mining. We want to know what our competition is doing. If you have experience manipulating and using collected information, please let us know.

The websites are:
[url removed, login to view] (supplier/competitor)
[url removed, login to view] (supplier/competitor)
[url removed, login to view] (competitor)
[url removed, login to view] (competitor)
[url removed, login to view] (competitor)
[url removed, login to view] (competitor)
[url removed, login to view] (competitor)

Please look at the attached Excel file for the list of fields to be scraped.
