I need to scrape data from web pages and insert it into a local database.
The source website is www.autoscout24.it.
The script shall fetch all result pages from the vehicle search engine (http://veicoli.autoscout24.it/?atype=C&cy=I&zipc=I&zipr=200&ustate=N,U&sort=price&results=80&page=1&event=pag). It shall then scrape all the data found on each vehicle details page and save it into a single comma-separated CSV file whose first line contains the column headers, so that the file can easily be imported into an RDBMS.
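A minimal sketch of the paging step, assuming Python 3 and assuming that only the `page` query parameter changes from one result page to the next. `search_page_url`, `iter_search_pages` and the empty-page stop condition are hypothetical: the real site may mark its last page differently, and `fetch` stands for whatever HTTP download function the script ends up using.

```python
from typing import Callable, Iterator, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Search URL copied from the requirements; only the "page" parameter changes.
BASE_SEARCH_URL = (
    "http://veicoli.autoscout24.it/?atype=C&cy=I&zipc=I&zipr=200"
    "&ustate=N,U&sort=price&results=80&page=1&event=pag"
)


def search_page_url(page: int, base_url: str = BASE_SEARCH_URL) -> str:
    """Return the search URL with its "page" query parameter set to `page`."""
    parts = urlsplit(base_url)
    # parse_qsl keeps the parameters in their original order as (name, value) pairs.
    params = [(k, str(page) if k == "page" else v) for k, v in parse_qsl(parts.query)]
    # safe="," keeps "N,U" readable instead of encoding it as "N%2CU".
    return urlunsplit(parts._replace(query=urlencode(params, safe=",")))


def iter_search_pages(fetch: Callable[[str], str],
                      max_pages: int = 1000) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, html) until `fetch` returns an empty page.

    `fetch` is a hypothetical callable (url -> html text), e.g. a wrapper
    around urllib.request.urlopen. Pages are yielded one at a time, so only
    one page is held in memory at once.
    """
    for page in range(1, max_pages + 1):
        html = fetch(search_page_url(page))
        if not html:  # assumed end-of-results signal
            break
        yield page, html
```

Using a generator keeps memory flat however many result pages the search returns, which fits the "no memory leak" requirement.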
The CSV must contain each vehicle AD's unique ID, to avoid duplicates in our database.
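A minimal sketch of ID-based de-duplication, assuming each scraped record is a Python dict with a hypothetical `ad_id` key holding the AD unique ID. Keeping the already-seen IDs in a set makes each check constant-time, and pre-loading that set from an earlier CSV (an assumption about how daily runs might be chained) lets a new run skip ADs that were already exported.

```python
import csv
from typing import Dict, Iterable, Iterator, Set


def unique_by_ad_id(records: Iterable[Dict[str, str]],
                    seen: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield only the records whose "ad_id" has not been seen before.

    `seen` is updated in place, so the same set can be shared across all
    result pages and pre-filled with IDs from a previous run.
    """
    for record in records:
        ad_id = record["ad_id"]
        if ad_id in seen:
            continue  # duplicate AD: skip it
        seen.add(ad_id)
        yield record


def load_seen_ids(csv_path: str) -> Set[str]:
    """Return the "ad_id" values of an existing CSV file, or an empty set."""
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            return {row["ad_id"] for row in csv.DictReader(f)}
    except FileNotFoundError:
        return set()  # first run: nothing exported yet
```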
Text fields may be delimited by single (') or double (") quotes and must be escaped accordingly.
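A sketch of the CSV output using Python's standard `csv` module, which writes the header line and escapes the chosen text delimiter by doubling it (`"` becomes `""`, or `'` becomes `''` when single quotes are selected). The `write_csv` name and the sample column names are hypothetical. Doubled double quotes are also what PostgreSQL's `COPY ... WITH (FORMAT csv)` expects by default.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO


def write_csv(rows: List[Dict[str, str]], fieldnames: Sequence[str],
              out: TextIO, quotechar: str = '"') -> None:
    """Write a header line plus one line per row, quoting every field.

    QUOTE_ALL wraps every value in `quotechar`; with the default
    doublequote=True, any `quotechar` inside a value is written twice.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, quotechar=quotechar,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Example with a title containing double quotes and a note containing an apostrophe.
buffer = io.StringIO()
write_csv([{"ad_id": "123", "title": 'Fiat 500 "Lounge"', "notes": "l'auto"}],
          ["ad_id", "title", "notes"], buffer)
```

When writing to a real file instead of a `StringIO`, open it with `newline=""` and `encoding="utf-8"`, as the `csv` module documentation recommends.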
- All other ads (Google AdSense or any other kind) and all CSS must be removed.
- All vehicle details must be retrieved.
- For each vehicle AD, all pictures must be saved into a .zip file named after the AD unique ID.
- There must be no duplicate records.
- The faster, the better, obviously. The script is meant to be run daily, and possibly several times a day in the near future. It must not leak memory.
- It must provide CLI parameters to select the destination directory of the .CSV and .ZIP files, and it must be extensible to insert data directly into a PostgreSQL RDBMS (PostgreSQL connection variables, libraries, and insert/update/delete functions must be included).
- The script MUST be fully commented, with plenty of detail in every part, in plain standard English.
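A sketch of the command-line interface and the per-AD picture archive, assuming Python's standard `argparse` and `zipfile` modules. The `--output-dir` and `--pg-dsn` option names, the `<ad_id>.zip` naming scheme and the picture file names inside the archive are hypothetical; `--pg-dsn` only reserves a place for the PostgreSQL extension and is not used here.

```python
import argparse
import os
import zipfile
from typing import Dict, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the command line (sys.argv[1:] when `argv` is None)."""
    parser = argparse.ArgumentParser(
        description="Scrape autoscout24.it vehicle ADs into CSV + ZIP files.")
    parser.add_argument("--output-dir", required=True,
                        help="directory receiving the .csv file and the .zip archives")
    parser.add_argument("--pg-dsn", default=None,
                        help="optional PostgreSQL connection string (hypothetical extension point)")
    return parser.parse_args(argv)


def zip_pictures(ad_id: str, pictures: Dict[str, bytes], output_dir: str) -> str:
    """Store all pictures of one AD in <output_dir>/<ad_id>.zip and return its path.

    `pictures` maps an original file name to the downloaded image bytes.
    """
    os.makedirs(output_dir, exist_ok=True)
    zip_path = os.path.join(output_dir, f"{ad_id}.zip")
    # JPEG files are already compressed, so ZIP_STORED skips a costly deflate pass.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for index, (name, data) in enumerate(sorted(pictures.items()), start=1):
            # Prefix every entry with the AD ID so extracted pictures stay traceable.
            zf.writestr(f"{ad_id}_{index:02d}_{name}", data)
    return zip_path
```

Storing pictures without recompression trades a slightly larger archive for less CPU time, which matters for a script meant to run several times a day.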
Note: although the project must be 100% functional (I won't pay for anything that doesn't work exactly as described), it serves an educational purpose. That is why such a large site was chosen and why complete, detailed comments are required.