Closed

Python data grabbing from web pages

This project received 20 bids from talented freelancers with an average bid price of $271 USD.

Employer working
Project Budget
$30 - $250 USD
Total Bids
20
Project Description

We need a Python script that grabs data from web pages and inserts it into a local database.
The source website is www.autoscout24.it.
The script shall fetch every results page from the Vehicles Search Engine ([url removed, login to view],U&sort=price&results=80&page=1&event=pag). Then, for each vehicle details page, it has to scrape all the data found and save it to a single comma-separated CSV file whose first line contains the headers, so that it can be easily imported into an RDBMS.
The CSV must contain the unique ID of each vehicle ad, in order to avoid duplicates in our database.
Text can be delimited by single (') or double (") quotes and must be escaped accordingly.
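
To illustrate the CSV requirements above (header line, unique ad ID, quoting and escaping), here is a minimal sketch using Python's standard csv module. The field names (ad_id, title, price), the function name write_ads_csv and the sample records are assumptions made for illustration only; the real columns depend on what the vehicle details pages contain.

```python
import csv
import io


def write_ads_csv(ads, out_file, fieldnames):
    """Write ad records as CSV with a header row, skipping duplicate ad IDs.

    `ads` is an iterable of dicts; each dict is assumed to carry an
    "ad_id" key holding the ad's unique ID (a hypothetical field name).
    Returns the number of data rows actually written.
    """
    # QUOTE_ALL wraps every field in double quotes; the csv module escapes
    # embedded double quotes by doubling them, which most RDBMS importers accept.
    writer = csv.DictWriter(
        out_file,
        fieldnames=fieldnames,
        quoting=csv.QUOTE_ALL,
        extrasaction="ignore",  # silently drop keys not listed in fieldnames
    )
    writer.writeheader()

    seen_ids = set()
    written = 0
    for ad in ads:
        ad_id = ad["ad_id"]
        if ad_id in seen_ids:
            continue  # the same ad was already written, skip the duplicate
        seen_ids.add(ad_id)
        writer.writerow(ad)
        written += 1
    return written


# Hypothetical sample records, including one duplicate ad ID.
sample_ads = [
    {"ad_id": "A1", "title": 'Fiat 500 "Lounge"', "price": "9500"},
    {"ad_id": "A2", "title": "Alfa Romeo, Giulietta", "price": "12000"},
    {"ad_id": "A1", "title": "duplicate of the first ad", "price": "1"},
]

buffer = io.StringIO()
rows_written = write_ads_csv(sample_ads, buffer, ["ad_id", "title", "price"])
csv_text = buffer.getvalue()
```

When writing to a real file instead of an in-memory buffer, the csv documentation recommends opening it with newline="" so that line endings are not translated twice.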

Requirements:
- All other ads (Google AdSense or similar) and CSS must be removed.
- All vehicle details data must be retrieved.
- For each vehicle ad, the pictures must be saved into a .zip file, named using the ad's unique ID.
- No duplicate records may exist.
- The faster the better, obviously. The script is meant to be run daily, and possibly several times a day in the near future. It must not leak memory.
- It must provide CLI parameters to select the destination directory of the .CSV and .ZIP files, and it must be extensible to insert data directly into a PostgreSQL RDBMS (PostgreSQL variables, libraries, and insert/modify/delete functions must be included).
- The script MUST be fully commented, with plenty of detail in every part, in plain standard English.
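
As a rough sketch of two of the requirements above (a CLI parameter for the destination directory, and pictures stored in a .zip file under the ad's unique ID), the following uses only the standard argparse, zipfile and pathlib modules. The option name --output-dir, the function names and the "<ad_id>_<n>.jpg" naming scheme are assumptions for illustration, not something the posting specifies; downloading the pictures is left out.

```python
import argparse
import zipfile
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options; `--output-dir` is a hypothetical name."""
    parser = argparse.ArgumentParser(
        description="Scrape vehicle ads into a CSV file and a ZIP of pictures."
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=Path("."),
        help="directory that receives the .csv and .zip files",
    )
    return parser.parse_args(argv)


def save_pictures(zip_path, ad_id, pictures):
    """Append the pictures of one ad to the archive at `zip_path`.

    `pictures` is an iterable of raw image bytes. Each entry is stored as
    "<ad_id>_<n>.jpg" (an assumed naming scheme). Names already present in
    the archive are skipped, so re-running the script adds no duplicates.
    Returns the list of entry names that were added.
    """
    added = []
    # Mode "a" creates the archive if it is missing and appends otherwise.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as archive:
        existing = set(archive.namelist())
        for index, data in enumerate(pictures, start=1):
            name = f"{ad_id}_{index}.jpg"
            if name in existing:
                continue  # this picture was stored by an earlier run
            archive.writestr(name, data)
            added.append(name)
    return added
```

A call such as parse_args(["--output-dir", "/tmp/ads"]) yields an object whose output_dir attribute is a Path, which can then be joined with a file name like "pictures.zip" and passed to save_pictures. Opening the archive once per ad keeps the sketch short; a faster script would likely keep one archive open for the whole run.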

Notes: although the project must be 100% functional (I won't pay for anything that doesn't work exactly as described), it serves an educational purpose. That is the reason for choosing such a large site and for requiring complete and detailed comments.
