Web Scraping To DB With Python or Perl
- Status Closed
- Budget N/A
- Total Bids 25
This is a web scraping project that can be done in either Python or Perl.
I am developing a social shopping comparison site, and have authorisation from the
sites we are scraping to crawl and extract data daily.
Scraping needs to be set up for 10 ecommerce sites, for example Firebox.com. That is about 10,000
URLs in total, as each site has about 1,000 products.
The data that needs to be extracted includes:
Product page URL
This data needs to be stored in a database that will be hosted on Amazon AWS. The image also needs to be
downloaded, renamed, and stored on our local server, with the local image path also added to the DB.
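As a rough illustration of the storage step described above, here is a minimal Python sketch. It uses the standard-library sqlite3 module purely as a stand-in for whatever database ends up hosted on AWS. The table name, the column names, the images directory, and the hash-based renaming scheme are all assumptions made for illustration, not requirements.

```python
import hashlib
import os
import sqlite3
from urllib.parse import urlparse

IMAGE_DIR = "images"  # assumed local directory for downloaded images


def local_image_path(image_url, image_dir=IMAGE_DIR):
    """Derive a stable local filename from the image URL (assumed naming scheme)."""
    # Keep the original extension if there is one, otherwise fall back to .jpg
    ext = os.path.splitext(urlparse(image_url).path)[1].lower() or ".jpg"
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    return os.path.join(image_dir, digest + ext)


def init_db(conn):
    """Create the (hypothetical) products table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               page_url   TEXT PRIMARY KEY,
               image_url  TEXT,
               image_path TEXT
           )"""
    )


def save_product(conn, page_url, image_url, image_bytes, image_dir=IMAGE_DIR):
    """Write the image to disk under its new name and record the path in the DB."""
    path = local_image_path(image_url, image_dir)
    os.makedirs(image_dir, exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(image_bytes)
    # Re-scraping the same product page overwrites the earlier row
    conn.execute(
        "INSERT OR REPLACE INTO products (page_url, image_url, image_path) "
        "VALUES (?, ?, ?)",
        (page_url, image_url, path),
    )
    conn.commit()
    return path
```

Here image_bytes would be the body of the downloaded image. Keying the table on the product page URL means a daily re-scrape updates existing rows instead of duplicating them.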
There can be a very basic scraping UI in which we can set the interval for new scrapes.
It should also be easy to set up new sites to scrape, but we can also pay you to add these if needed.
You should also be familiar with routing the scraper through Tor, in case this is needed.
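For reference, routing through Tor usually means sending HTTP requests via the local Tor SOCKS proxy. The sketch below assumes a Tor daemon is already running and listening on its default SOCKS port (127.0.0.1:9050), and that the third-party requests library is installed with SOCKS support (requests[socks]). The function names are illustrative only.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a proxies mapping pointing at a local Tor SOCKS port (assumed default)."""
    # The socks5h scheme makes DNS lookups go through the proxy as well
    url = "socks5h://{}:{}".format(host, port)
    return {"http": url, "https": url}


def fetch_via_tor(url, timeout=30):
    """Fetch a page through Tor; needs a running Tor daemon and requests[socks]."""
    import requests  # third-party; imported here so the helper above has no dependencies

    resp = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    resp.raise_for_status()
    return resp.text
```

The host and port are parameters so the scraper can be pointed at a different Tor instance if the default is not used.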
After the URLs are scraped, additional social media data needs to be added to each URL in the DB, for example for Facebook,
with this JSON API:
Facebook*: [url removed, login to view]
In addition to designing this scrape program, I will require ongoing support and maintenance of the code, and have
many such similar projects in mind that I would like to work on with the successful applicant.
I want to work with someone who is responsive via Skype and can get back to me very quickly regarding issues.
PLEASE NOTE: Please apply stating your experience in web scraping, your advice on whether to use
Python or Perl, and your experience of Tor.
If you cannot be available to discuss issues on Skype text chat, please do not bid.