Using Python and the Scrapy framework, adapt our existing scraper or create a new one that will do the following:
Crawl amazon.com and scrape information about reviewers.
Invoking the program should look like:
$ scrapy crawl amazon_reviewers --set FEED_URI=reviewers.csv --set FEED_FORMAT=csv
The fields we want to see are:
website - URL of the user's website (if any)
name - The user's display name on Amazon
pic_url - URL of their profile picture (if any)
userid - Their Amazon profile ID
email - E-mail address
rank - Reviewer rank on Amazon
location - Geographic location, if listed
num_reviews - Total number of reviews
The output should look like this in CSV form:
http://...url...,K. Harris,http://ecx.images-amazon.com/images/I/51X5GLNvY1L._SL150_.jpg,,,10,,5000
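For reference, a row in that shape can be produced with the standard csv module. The column order and sample values come from the line above; empty strings stand for missing fields, and the website URL is a placeholder since the sample elides it:

```python
import csv
import io

# Column order matches the field list above.
COLUMNS = ["website", "name", "pic_url", "userid",
           "email", "rank", "location", "num_reviews"]


def reviewer_row(reviewer):
    """Serialize one reviewer dict to a CSV line, blank for missing fields."""
    buf = io.StringIO()
    csv.writer(buf).writerow([reviewer.get(col, "") for col in COLUMNS])
    return buf.getvalue().strip()


sample = {
    "website": "http://example.com",  # placeholder; elided in the sample row
    "name": "K. Harris",
    "pic_url": "http://ecx.images-amazon.com/images/I/51X5GLNvY1L._SL150_.jpg",
    "rank": "10",
    "num_reviews": "5000",
}
```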
Delivery should include:
- Full source code
- Scrape of reviewers from the top-level categories: Electronics & Computers; Toys, Kids & Baby; Sports & Outdoors; Home, Garden & Tools; Automotive & Industrial
The major challenge is finding these reviewers. Only 10,000 are exposed easily; the rest must be found by crawling through product pages.
The other major challenge will be actually doing the crawling. I have access to significant EC2 resources, so I can help once the code is ready, but I'd prefer that you run the crawl yourself and then provide a data sample for verification.
This MUST be done using Python and the Scrapy framework, as it needs to work with the rest of our tech stack.
We will likely expand the scope upon timely delivery of this first project. Please include a link to your prior work, preferably a public GitHub profile with Python projects visible.
Price is negotiable.