Need a script
- Status Completed
- Budget $30 - $60 USD
- Total Bids 5
I would like a website scraping script that will do the following (all my scripts currently run on UNIX, not Windows, so no EXEs please):
1) Scrape a site and download various fields from the webpages.
2) This script needs to run as a CRON job so that only updates are downloaded on a regular basis.
3) I need these scripts to be delivered to me fully tested.
4) These scripts need to gather data and write it into CSV files.
5) There should be very good debug/error information.
6) The script should accept a list of proxies and a delay time. The proxy list should be rotated so that the same proxy is not used for an entire scraping session.
7) The script should be run from the command line with a category (e.g. 'Baby Store' or 'Grocery').
8) The script should send out an email on error.
9) Download ONLY updates and not the entire site again. In other words, if there is an error and I rerun the script, it shouldn't download the entire site again. Likewise, if I rerun the script to update the contents, it should not download the entire site again.
10) Images and CSV files should be saved separately. Image files should be renamed with a predefined variable prepended, e.g. if the original image file is called a_b_c.jpg, save it as Variable_a_b_c.jpg.
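To make the proxy rotation, update-only downloading, and image renaming requirements concrete, here is a rough Python sketch of the behaviour I have in mind. It is only an illustration under assumptions: the function names, the JSON state file, and the round-robin rotation are all hypothetical choices, and the actual fetching, CSV writing, and emailing are left out.

```python
import itertools
import json
import os


def load_seen(state_path):
    """Return the set of URLs recorded by earlier runs (empty on first run)."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, encoding="utf-8") as fh:
        return set(json.load(fh))


def save_seen(state_path, seen):
    """Persist the downloaded URLs so a rerun can skip them."""
    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)


def plan_downloads(urls, proxies, seen):
    """Pair each not-yet-downloaded URL with the next proxy, round-robin."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rotation = itertools.cycle(proxies)  # assumed rotation policy
    return [(url, next(rotation)) for url in urls if url not in seen]


def prefixed_image_name(prefix, original_name):
    """Build the saved image name, e.g. a_b_c.jpg -> Variable_a_b_c.jpg."""
    return f"{prefix}_{os.path.basename(original_name)}"
```

In a real script, each planned download would be followed by a pause of the configured delay time (for example with `time.sleep`), and a URL would be added to the state file only after its page was fetched successfully, so that a rerun after an error picks up where it stopped.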
Please apply only if you have experience with all of the above.
Thanks for reading.