Create a scraper / script / crawler to extract product data from an online shop - go through all products - export in csv or excel

Status: IN PROGRESS
Bids: 23
Avg Bid (USD): $205
Project Budget (USD): $30 - $250

Project Description:
Dear freelancers,

We need an efficient web scraper that we can run on one of our own servers. WE ARE LOOKING FOR AN EXPERIENCED DEVELOPER - work must be flawless!

The following should be done:

The website to scrape/crawl is: [url removed, login to view]

--> It is an online shop with almost 80k products. The scraper should start with these main top-level categories:

[url removed, login to view]
[url removed, login to view]
http://www.dafiti.com.br/bolsas-e-acessorios/
[url removed, login to view]
[url removed, login to view]
http://www.dafiti.com.br/casa-feminina/
[url removed, login to view]

And should scrape EVERY SINGLE product within these categories (as said, something around 80,000 items).
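To illustrate the traversal described above, here is a minimal sketch of how the product URLs of one category could be collected. It rests on assumptions, not on the shop's actual behaviour: product links are taken to end in "-<digits>.html" (as in the example links in this posting), pagination is assumed to use a hypothetical "?page=N" parameter, and `fetch` is a placeholder for whatever HTTP client is used. The function names are illustrative only.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: product URLs end in "-<digits>.html", like the example links in the posting.
PRODUCT_URL = re.compile(r"-\d+\.html$")


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def product_links(html, base_url):
    """Return absolute product URLs found in html, de-duplicated, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    absolute = (urljoin(base_url, h) for h in parser.hrefs)
    return [u for u in dict.fromkeys(absolute) if PRODUCT_URL.search(u)]


def crawl_category(category_url, fetch, max_pages=10000):
    """Walk listing pages until a page yields no new product URLs.

    fetch is any callable that takes a URL and returns the page HTML as a string.
    """
    seen = {}  # dict used as an ordered set
    for page in range(1, max_pages + 1):
        page_url = f"{category_url}?page={page}"  # hypothetical pagination scheme
        found = product_links(fetch(page_url), page_url)
        new = [u for u in found if u not in seen]
        if not new:
            break
        seen.update(dict.fromkeys(new))
    return list(seen)
```

Passing `fetch` in as a parameter keeps the traversal logic testable without network access; the real script would plug in its own HTTP client, with whatever rate limiting and retries the target site requires.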

The following information should be exported from EACH product - please use this url to understand different items explained below: http://www.dafiti.com.br/Tenis-Puma-Axis-2-Branco-1359275.html:

1) URL (e.g. "http://www.dafiti.com.br/Tenis-Puma-Axis-2-Branco-1359275.html")

2) Breadcrumbs (e.g. "Início > Masculino > Esporte Masculino > Calçados > Tenis")

3) Brand Name - located above product name (e.g. "Puma")

4) Product Name (e.g. "Tênis Puma Axis 2 Branco")

5) Image URLs --> ALL Images in product page --> USE default resolution (not zoom image) of ~275px × 400px (e.g. "http://static.dafity.com.br/p/Puma-T%C3%[url removed, login to view] ; http://static.dafity.com.br/p/Puma-T%C3%[url removed, login to view] ........ etc etc")

6) Current Price (e.g. "99,90")

7) Old Price - if applicable (e.g. "199,90")

8) Payable rates - if applicable (e.g. "5 x 19,98")

9) Available sizes: (e.g. "38, 39, 40, 41, 42, 43")

10) ALL Available Data in the tab "Detalhes do produto" --> Data here is:

--> A) a short text description AND

--> B) a list with multiple different entries (NOTE: products do not always have all these entries --> compare http://www.dafiti.com.br/Tenis-Puma-Axis-2-Branco-1359275.html versus http://www.dafiti.com.br/Camisa-Polo-Ralph-Lauren-Brand-Preta-1170583.html):

--> List items could be:

- Description (plain text above actual list)
- SKU (e.g. "RA870APM16PQL")
- Modelo (e.g. "POLO RALPH LAUREN 89460PRL")
- Material (e.g. "Algodão")
- Composição (e.g. "100% Algodão")
- Cor (e.g. "Preto")
- Lavagem (e.g. "Lavar a mão")
- Medidas (e.g. "Ombro: 17cm/ Manga: 23cm/ Tórax: 116cm/ Comprimento: 76cm")
- Categoria (e.g. "Premium Masculino > Roupas > Pólos > Pólo Manga Curta")

--> That is all data we need for EACH product
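As an illustration of how the price fields and the optional detail entries listed above could be normalised, here is a minimal sketch. It assumes the raw strings have already been extracted from the page, that prices use the Brazilian format with a comma as decimal separator, and that the detail list arrives as label/value pairs. The function names and the `DETAIL_FIELDS` list are illustrative, with the labels taken from the list in this posting.

```python
import re
from decimal import Decimal

# Detail labels named in the posting; a product may carry only some of them.
DETAIL_FIELDS = ["SKU", "Modelo", "Material", "Composição", "Cor",
                 "Lavagem", "Medidas", "Categoria"]


def parse_price(text):
    """Convert a Brazilian-format price such as "99,90" or "R$ 1.299,90" to Decimal."""
    cleaned = re.sub(r"[^\d,.]", "", text)   # drop currency symbols and spaces
    return Decimal(cleaned.replace(".", "").replace(",", "."))


def parse_installments(text):
    """Convert "5 x 19,98" to (5, Decimal("19.98")); return None if it does not match."""
    match = re.fullmatch(r"\s*(\d+)\s*x\s*(.+)", text, flags=re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)), parse_price(match.group(2))


def details_row(pairs):
    """Map extracted label/value pairs onto the fixed column set.

    Entries missing from a product become empty strings, so every row
    has the same columns regardless of which entries the product has.
    """
    return {field: pairs.get(field, "") for field in DETAIL_FIELDS}
```

For example, `details_row({"SKU": "RA870APM16PQL", "Cor": "Preto"})` returns a dictionary with all eight labels, where only "SKU" and "Cor" are filled. Whether prices should be exported as parsed numbers or as the original strings is an open choice.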

***NOTE*** --> We will need to run the script MULTIPLE times per week. SO: The script MUST be efficient and FAST. The data should be extracted and then saved on the server (in csv or any other Excel-importable format). It must be possible to run the script on OUR server.
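For the export step, a minimal sketch of writing the extracted records to a CSV file that Excel can import. The column names are hypothetical, chosen to mirror the fields listed in this posting. Multi-valued fields (image URLs, sizes) are joined with " ; ", following the image URL example above. The "utf-8-sig" encoding is an assumption made so that Excel detects UTF-8 and displays accented characters correctly.

```python
import csv

# Hypothetical column names, one per field requested in the posting.
COLUMNS = ["url", "breadcrumbs", "brand", "name", "image_urls",
           "current_price", "old_price", "installments", "sizes",
           "description", "SKU", "Modelo", "Material", "Composição",
           "Cor", "Lavagem", "Medidas", "Categoria"]


def write_products(rows, path):
    """Write product dictionaries to path as CSV with a fixed header.

    Missing fields are written as empty cells, unknown keys are ignored,
    and list or tuple values are joined with " ; ".
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            flat = {key: " ; ".join(value) if isinstance(value, (list, tuple)) else value
                    for key, value in row.items()}
            writer.writerow(flat)
```

Because the csv module quotes any cell that contains a comma, prices such as "99,90" survive the round trip unchanged.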

***NOTE*** --> We are looking for a long-term developer. We will not need just ONE script: we will need similar scripts for 10 different online shops. SO: We are looking for somebody who can then also develop those other scripts.

Please get in touch if you have any questions.

Thank you very much,
Dan

Skills required:
PHP, Software Architecture, Web Scraping
About the employer:
Verified


$157 in 4 days
$515 in 10 days
$263 in 4 days
$147 in 3 days
$250 in 3 days
$155 in 3 days
$315 in 3 days
$244 in 7 days
$185 in 4 days
$147 in 3 days