I would like a website scraping script that will do the following (all my scripts right now are run on UNIX - not Windows, so no EXEs please):
1) Scrape a site and download various fields from the webpages.
2) This script needs to run as a CRON job so that only updates are downloaded on a regular basis.
3) I need these scripts to be delivered to me fully tested.
4) These scripts need to gather data and write it to CSV files.
5) There should be very good debug/error information.
6) The script should accept a list of proxies and a delay time. The proxy list should be rotated so that the same proxy is not used for an entire scraping session.
7) The script should be run from the command line with a category (e.g. 'Baby Store' or 'Grocery').
8) The script should send out an email on error.
9) Download ONLY updates and not the entire site again. In other words, if there is an error and I rerun the script, it should pick up where it left off rather than download the entire site again. Likewise, if I rerun the script later to update the contents, it should fetch only what is new or changed.
10) Images and CSV files should be saved separately. Image files should be renamed with a predefined variable prepended, e.g. if the original image file is called a_b_c.jpg, save it as Variable_a_b_c.jpg.
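To make the proxy rotation, update-only downloading, and image renaming requirements more concrete, here is a rough Python sketch of the behaviour I have in mind. It is only an illustration under my own assumptions, not a required design: the function names, the JSON state file, and the round-robin rotation are all hypothetical, and the actual fetching, delay handling, CSV writing, and email alerts are left out.

```python
import itertools
import json
import os


def proxy_rotation(proxies):
    """Cycle through the proxy list (assumed non-empty) in round-robin order,
    so no single proxy is used for an entire scraping session."""
    return itertools.cycle(proxies)


def load_seen(state_path):
    """Return the set of URLs already downloaded, or an empty set on a first run.
    The JSON state file is an assumption; a database would work as well."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_seen(state_path, seen):
    """Persist the downloaded-URL set so a rerun can skip finished pages."""
    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)


def prefixed_image_name(original_name, prefix):
    """Prepend the predefined prefix to the image's file name."""
    return f"{prefix}_{os.path.basename(original_name)}"


def plan_downloads(urls, proxies, state_path):
    """Pair each not-yet-downloaded URL with the next proxy in the rotation."""
    seen = load_seen(state_path)
    rotation = proxy_rotation(proxies)
    return [(url, next(rotation)) for url in urls if url not in seen]
```

With this shape, a rerun after an error would call plan_downloads again and get back only the pages not yet recorded in the state file, and prefixed_image_name("a_b_c.jpg", "Variable") would give Variable_a_b_c.jpg. The delay time would be applied between the planned requests.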
Please apply only if you have experience with all of the above.
Thanks for reading.
4 freelancers are bidding an average of $56 for this job
Hi, ready to start work. Let's start and finish it ASAP. I have 8 years of experience in the web development field (PHP, MySQL, AJAX, jQuery, HTML, CSS...). Thank you!
Hey bud, looks like you have some very exact project requirements. I believe we would have no trouble communicating and making sure what you get is exactly what you want. Please let me know if I can be of help.