Back-end scraper for a price comparison website

Status: In progress
Bids: 8
Average bid (USD): $331
Project budget (USD): $250 - $750

Project Description:
1) I have already developed a desktop-based application using Scrapy/Python that is hard-coded to crawl three separate sites (using three "spiders") and pull out product details such as Product ID, Title, Price, Vendor and Stock Position. At present, these details are used to generate .sql files, which have to be uploaded manually to the web server to update the Products Table in the database.

2) The current requirement is to develop a server version of the scraper. The expected features are as follows:

a) The Products Table in the server database is to be populated automatically by the scraper. The required fields are Product ID, Title, Price, Vendor, Stock Position, Payment Options and Delivery Time.
b) Easy extensibility (with some Python coding) to add more sites in future.
c) To meet the above, the scraper is to be implemented as two modules: the "Scraper Module" and the "Parameters Module".
d) The "Scraper Module" would do the actual scraping of multiple sites (based on parameters read from the "Parameters Module") and also automatically populate the Products Table on the database server. For sites with content rendered in JavaScript, Scrapy is to be used with Selenium for effective scraping.
e) The "Parameters Module" would include a form through which scrape parameters can be entered, such as the primary URL, the scraping rules for each field to be scraped, the format of the data to be extracted, and whether to use a simple crawl (for sites without JavaScript) or a complex crawl (for sites with content rendered in JavaScript). These parameters would be stored in a table and read by the "Scraper Module" at run time.
f) The scraped URLs (reached from the primary URL) are to be saved in a database table with a "processed" flag, so that they can be skipped if scraping has to be resumed after an interruption.
g) Primary URLs are also to be saved with the date of the last successful scrape, to enable scheduling of periodic repeat scrapes.
h) While scraping, only those fields that have changed since the last scrape are to be extracted, and the existing table entry for the product is to be "updated" as required. New products are to be "inserted" as new rows in the Products Table.
i) Scrapy is to be used with Selenium for effective scraping of sites with heavy JavaScript content.
j) Performance must be adequate to scrape all the configured sites and generate the Products database.
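
To make items (f) and (h) more concrete, here is a minimal sketch of the database side only. It is an illustration under stated assumptions, not the required design: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere, and the table names, column names and helper functions (init_db, upsert_product, queue_url, mark_processed, pending_urls) are all hypothetical. It also compares values after a page has been scraped, which is one possible reading of "only those fields that have changed".

```python
import sqlite3

# Hypothetical column names for the fields listed in item (a).
FIELDS = ("title", "price", "vendor", "stock_position",
          "payment_options", "delivery_time")


def init_db(conn):
    """Create the two illustrative tables if they do not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "product_id TEXT PRIMARY KEY, title TEXT, price REAL, vendor TEXT, "
        "stock_position TEXT, payment_options TEXT, delivery_time TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_urls ("
        "url TEXT PRIMARY KEY, primary_url TEXT, "
        "processed INTEGER DEFAULT 0)"
    )


def upsert_product(conn, item):
    """Insert a new product, or update only the fields that changed.

    Returns "inserted", "updated" or "unchanged".
    """
    row = conn.execute(
        "SELECT " + ", ".join(FIELDS) + " FROM products WHERE product_id = ?",
        (item["product_id"],),
    ).fetchone()
    if row is None:
        cols = ("product_id",) + FIELDS
        conn.execute(
            "INSERT INTO products (" + ", ".join(cols) + ") VALUES ("
            + ", ".join("?" for _ in cols) + ")",
            [item.get(c) for c in cols],
        )
        return "inserted"
    # Keep only the fields whose scraped value differs from the stored one.
    changed = {f: item[f] for f, old in zip(FIELDS, row)
               if f in item and item[f] != old}
    if not changed:
        return "unchanged"
    conn.execute(
        "UPDATE products SET " + ", ".join(f + " = ?" for f in changed)
        + " WHERE product_id = ?",
        list(changed.values()) + [item["product_id"]],
    )
    return "updated"


def queue_url(conn, url, primary_url):
    """Record a discovered URL; an already known URL keeps its flag."""
    conn.execute(
        "INSERT OR IGNORE INTO scraped_urls (url, primary_url) VALUES (?, ?)",
        (url, primary_url),
    )


def mark_processed(conn, url):
    """Set the processed flag once a URL has been scraped successfully."""
    conn.execute("UPDATE scraped_urls SET processed = 1 WHERE url = ?", (url,))


def pending_urls(conn, primary_url):
    """Return the URLs still to be scraped, e.g. after an interruption."""
    rows = conn.execute(
        "SELECT url FROM scraped_urls "
        "WHERE primary_url = ? AND processed = 0 ORDER BY url",
        (primary_url,),
    )
    return [r[0] for r in rows]
```

A resumed run would call pending_urls for a primary URL, scrape each returned page, pass the result to upsert_product, and then call mark_processed. With MySQL the same flow applies, but the placeholder style and the "INSERT OR IGNORE" syntax would differ.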

Expected Skills: Web Scraping, Scrapy, Selenium, Python, Data Mining, JavaScript, MySQL
Budget: USD 200 to USD 300

Skills required: Data Mining, MySQL, Python, Web Scraping
Project posted by: anandsnair, India (Verified)

Bids:
$250 in 5 days
$300 in 14 days
$250 in 5 days
$250 in 5 days
$600 in 7 days
$500 in 2 days
$250 in 5 days
$250 in 10 days