We need to grab data from web pages and insert it into a local database.
The source website is www.autoscout24.it.
The script shall fetch all pages from the Vehicles Search Engine ([url removed, login to view],U&sort=price&results=80&page=1&event=pag). Then, for each vehicle details page, it has to scrape all data found and save it into a single comma-separated CSV file whose first line contains the headers, so it can be easily imported into an RDBMS.
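As a rough illustration of walking the result pages, the sketch below rewrites only the `page` query parameter of a search URL and leaves every other parameter untouched. The function name `page_url` and the host `example.invalid` are placeholders, not part of the posting; the real search URL is not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(search_url, page):
    """Return search_url with its 'page' query parameter set to `page`.

    All other query parameters keep their original order and values.
    """
    parts = urlsplit(search_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Replace the value of every existing 'page' parameter in place.
    pairs = [(key, str(page) if key == "page" else value) for key, value in pairs]
    # If the URL had no 'page' parameter at all, append one.
    if not any(key == "page" for key, _ in pairs):
        pairs.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Placeholder URL: only the query string shape is taken from the posting.
example = "https://example.invalid/search?sort=price&results=80&page=1&event=pag"
print(page_url(example, 2))
```

A real run would keep requesting pages until one comes back with no vehicle links; how to detect that depends on the site's markup, which is not described here.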
The CSV must contain the vehicle AD Unique ID, in order to avoid duplication on our database.
Text can be delimited by single (') or double (") quotes, and must be correctly escaped accordingly.
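A minimal sketch of the CSV requirements above, using Python's standard `csv` module: a header row, every field quoted with embedded double quotes escaped by doubling, and repeated AD Unique IDs skipped. The column names `ad_id` and `title` and the function name `write_ads_csv` are assumptions made for illustration only.

```python
import csv
import io


def write_ads_csv(ads, out, fieldnames):
    """Write ad dicts to `out` as CSV with a header row.

    Ads whose 'ad_id' was already written are skipped.
    Returns the number of data rows written.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        quoting=csv.QUOTE_ALL,    # wrap every field in double quotes
        extrasaction="ignore",    # silently drop keys not listed in fieldnames
    )
    writer.writeheader()
    seen = set()
    written = 0
    for ad in ads:
        ad_id = ad["ad_id"]       # hypothetical name for the AD Unique ID column
        if ad_id in seen:
            continue
        seen.add(ad_id)
        writer.writerow(ad)
        written += 1
    return written


buffer = io.StringIO()
sample = [
    {"ad_id": "A1", "title": 'Fiat "500", 1.2'},
    {"ad_id": "A1", "title": "duplicate of A1"},
    {"ad_id": "B2", "title": "L'auto"},
]
write_ads_csv(sample, buffer, ["ad_id", "title"])
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends, so that line endings are not translated twice.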
- All other ADs (Google AdSense or similar) and any CSS must be removed.
- All vehicle details data must be retrieved.
- For each vehicle AD, pictures must be saved into a .zip file, named using the AD Unique ID.
- No duplicate records may exist.
- The faster the better, obviously. The script is meant to be run on a daily basis, and possibly multiple times a day in the near future. It must not leak memory.
- It must provide CLI parameters to select destination directory of the .CSV + .ZIP files, along with the possibility to be extended to directly insert data into a PostgreSQL RDBMS (PGSql variables, libraries, insert/modify/delete functions must be included).
- The script MUST be completely commented, with plenty of detail in each and every part, in plain standard English.
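Two of the requirements in the list above, the CLI parameter for the destination directory and the pictures stored in a .zip file under the AD Unique ID, could be sketched as follows with the standard `argparse` and `zipfile` modules. The option names `--out-dir` and `--pg-dsn`, the function names, and the `<ad_id>_<n>.jpg` naming scheme are all assumptions; the posting does not specify them, and the PostgreSQL option is only a placeholder for a later extension.

```python
import argparse
import zipfile
from pathlib import Path


def parse_args(argv=None):
    """Parse the command-line options (option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Scrape vehicle ADs to CSV + ZIP.")
    parser.add_argument("--out-dir", type=Path, default=Path("."),
                        help="destination directory for the .CSV and .ZIP files")
    parser.add_argument("--pg-dsn", default=None,
                        help="optional PostgreSQL connection string (not used in this sketch)")
    return parser.parse_args(argv)


def add_pictures(zip_path, ad_id, pictures):
    """Append pictures (an iterable of bytes) to the archive at zip_path.

    Entries are named '<ad_id>_<n>.jpg' (assumed JPEG); names already in
    the archive are skipped. Returns the list of entry names added.
    """
    added = []
    # Mode "a" creates the archive if it does not exist yet.
    # ZIP_STORED: JPEG data is already compressed, so deflating gains little.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_STORED) as archive:
        existing = set(archive.namelist())
        for index, data in enumerate(pictures, start=1):
            name = f"{ad_id}_{index}.jpg"
            if name in existing:
                continue
            archive.writestr(name, data)
            added.append(name)
    return added
```

Skipping names that already exist keeps repeated daily runs from storing the same picture twice, at the cost of not refreshing a picture that changed on the site.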
Notes: although the project is absolutely meant to be 100% functional (I won't pay for anything that doesn't work exactly as described), it also serves an educational purpose; that is the reason for choosing such a big site and for requiring complete and detailed comments.
18 freelancers are bidding on average $263 for this job
Hi, I know you asked for a Python script, but I would like to offer to do the job as a desktop application in C# as an alternative. If you are interested, I could start preparing a demo for you. Thanks.