Web Crawl from Internet Archive
This project was successfully completed by debaphp for $155 USD in 3 days.
Project Budget: $30 - $250 USD
Completed In: 3 days
I'd like to gather some data for an academic project to study the electronic book market.
The Internet Archive (Wayback Machine) crawled websites that are of interest to me during the relevant period, and I'd like your help to:
(1) Crawl the Internet Archive and save the HTML pages of interest.
(2) Extract the relevant fields from the HTML into a comma-separated (CSV) file ready for data analysis packages.
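For step (1), one possible approach is to list the available captures through the Wayback Machine's CDX search endpoint and then download each capture by its timestamp. The sketch below only builds the URLs; it does not fetch anything. The function names `build_cdx_query` and `snapshot_url` are my own for illustration, and the default date range is assumed from the January to May 2010 period described in this posting.

```python
from urllib.parse import urlencode

# Wayback Machine CDX search endpoint (lists captures of a URL).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, start="20100101", end="20100531"):
    """Return a CDX query URL listing captures of page_url in [start, end].

    start/end are yyyymmdd strings; the defaults are an assumption based on
    the period named in the project description.
    """
    params = {
        "url": page_url,
        "from": start,
        "to": end,
        "output": "json",            # first row of the response is a header row
        "filter": "statuscode:200",  # skip redirects and error captures
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_url(timestamp, original_url):
    """Return the URL of one archived capture.

    The `id_` suffix after the timestamp asks for the page as originally
    captured, without the Wayback toolbar or rewritten links.
    """
    return f"https://web.archive.org/web/{timestamp}id_/{original_url}"
```

A crawler would request the CDX URL, read the timestamps from the response, and save the HTML returned by each `snapshot_url(...)`, ideally with a pause between requests to avoid overloading the archive.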
The webpages of interest are product pages for books or e-book reader devices in the following period, venues, and categories:
January 2010 - May 2010 (one capture a day if available)
Amazon, Barnes & Noble
Physical books and Kindle/Nook books (not textbooks, newspapers, etc.)
The devices themselves: Kindle and Nook.
Books listed as bestseller, award winner, editor's picks, best books, book club, etc.
We can discuss whether it's easier to get all books or just the popular books.
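The "one capture a day if available" requirement could be handled after listing captures, by keeping only the earliest capture for each calendar day. This is a minimal sketch that assumes the capture list is in the CDX JSON layout (a header row followed by one row per capture, with `timestamp` and `original` columns); the function name `one_capture_per_day` is hypothetical. The CDX server also has a `collapse` parameter that can do similar de-duplication on the server side.

```python
def one_capture_per_day(cdx_rows):
    """Keep the earliest capture per calendar day.

    cdx_rows: list of rows as returned by the CDX API with output=json,
    where the first row is the header. Returns a list of
    (timestamp, original_url) tuples sorted by day.
    """
    if not cdx_rows:
        return []
    header, *records = cdx_rows
    ts_idx = header.index("timestamp")
    url_idx = header.index("original")

    first_by_day = {}
    for rec in records:
        timestamp = rec[ts_idx]   # yyyymmddhhmmss
        day = timestamp[:8]       # yyyymmdd
        # Fixed-width digit strings compare correctly as plain strings.
        if day not in first_by_day or timestamp < first_by_day[day][0]:
            first_by_day[day] = (timestamp, rec[url_idx])
    return [first_by_day[day] for day in sorted(first_by_day)]
```

Days with no capture simply do not appear in the result, which matches the "if available" wording.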
Fields of interest: Title, author, publisher, # reviews, ratings, list price, discount price, price of other formats, whether listed as bestseller, sales rank, ISBN, category.
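To make the expected output of step (2) concrete, here is a rough sketch of writing extracted records to a comma-separated file with Python's standard `csv` module. The column names are my own shorthand for the fields listed above, and `write_records` is a hypothetical helper; the actual extraction from the archived HTML is not shown because it depends on the page layout of each site at the time of capture.

```python
import csv
import io

# Column names are illustrative shorthand for the fields of interest.
FIELDNAMES = [
    "title", "author", "publisher", "num_reviews", "rating",
    "list_price", "discount_price", "other_format_prices",
    "is_bestseller", "sales_rank", "isbn", "category",
]


def write_records(records, out):
    """Write a header plus one CSV row per record to the file-like object out.

    records: iterable of dicts keyed by FIELDNAMES. Fields that could not be
    extracted are left blank; unexpected keys are ignored.
    """
    writer = csv.DictWriter(
        out, fieldnames=FIELDNAMES, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Usage with a made-up placeholder record (not real data):
buffer = io.StringIO()
write_records([{"title": "Example, Title", "sales_rank": 12}], buffer)
```

The `csv` module quotes values that contain commas, such as many book titles, so the file stays readable by data analysis packages.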
(1) Small sample - prefer to have a small sample by May 14th.
Amazon only, one day in mid March, one day in mid April, one day in mid May in 2010.
(2) Negotiable, but preferably completed before June 5th.
(3) Possible future projects to extract 2005-2013 if initial run goes well.