421589 web scraper php script

In Progress Posted Jun 17, 2010 Paid on delivery

We are looking for a PHP web scraper script that scans the products on web sites, extracts their data, and inserts or updates it in a MySQL DB.

The scraper shall perform the following steps once a day, run as a cron job:

1. Scan the site and prepare a list of all its product names or SKUs.

2. Compare the list with existing DB.

3. Add new products to the DB.

4. Download all product pictures into one directory, naming each picture after the SKU number and the picture number.

5. Update the DB fields that have changed for each existing product.

6. Translate all text fields into six languages with Google Translate and add the translations to the DB.

7. Change the IP address or proxy server on a daily basis, to avoid being blocked by the scanned site's administrator.

8. Generate a category code for each new category.
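Steps 2, 3 and 5 could be sketched roughly as follows. This is only an illustration, not a required design: the function name `diffProducts` and the array layout (both lists keyed by SKU, each value an associative array of field values) are assumptions, and the scraping and MySQL code is left out.

```php
<?php
/**
 * Compare scraped products with the rows already stored in the DB.
 *
 * Hypothetical helper: both arrays are assumed to be keyed by SKU, and
 * values are assumed to be normalised to strings before comparison.
 *
 * Returns ['new' => products to insert, 'changed' => fields to update].
 */
function diffProducts(array $scraped, array $existing): array
{
    $new = [];
    $changed = [];

    foreach ($scraped as $sku => $fields) {
        // SKU not in the DB yet: the whole product is new (step 3).
        if (!array_key_exists($sku, $existing)) {
            $new[$sku] = $fields;
            continue;
        }

        // SKU already known: keep only the fields that differ (step 5).
        $delta = [];
        foreach ($fields as $name => $value) {
            if (!array_key_exists($name, $existing[$sku])
                || $existing[$sku][$name] !== $value) {
                $delta[$name] = $value;
            }
        }
        if ($delta !== []) {
            $changed[$sku] = $delta;
        }
    }

    return ['new' => $new, 'changed' => $changed];
}
```

The `new` entries would then be inserted and the `changed` entries applied as updates, for example with prepared statements.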

The DB should contain the following fields:

Category name

Category code

Category parent code

Product Title

SKU

Price (USD)

Quantity

Shipping costs (free shipping or not)

Overview / Description (text)

Specification

Dimensions

Shipping Weight

Related products list

Status (in stock, out of stock)

Delivery time (number of days to delivery, if the site provides it)

Product original web page link

Added to DB Date/Time

Last updated in the DB Date/Time

Total number of small pictures

Total number of large pictures

Small size pictures path

Large size pictures path

All pictures shall be saved to one directory, with each picture named after its SKU and picture number.
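One possible naming scheme for the pictures is sketched below. The exact pattern (`SKU_number.extension`), the function name `pictureFileName`, and the replacement of unsafe characters with underscores are assumptions, not part of the requirements.

```php
<?php
/**
 * Build a picture file name from the SKU and the picture number.
 *
 * Hypothetical helper: characters that are unsafe in file names are
 * replaced with "_" and the extension is lower-cased.
 */
function pictureFileName(string $sku, int $number, string $extension = 'jpg'): string
{
    // Keep only letters, digits, "_" and "-" from the SKU.
    $safeSku = preg_replace('/[^A-Za-z0-9_-]/', '_', $sku);

    return sprintf('%s_%d.%s', $safeSku, $number, strtolower($extension));
}
```

For example, `pictureFileName('AB/123', 2, 'JPG')` gives `AB_123_2.jpg`.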

The script will be executed on the following sites:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view] - only free shipping products

[url removed, login to view]

[url removed, login to view] - only free shipping products

Each site should have its own script file so that scanned sites are easy to add and remove.
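One way to keep each site in its own file is sketched here, assuming every site script simply returns a callable and all scripts sit in one directory. The function name `loadSiteScrapers` and this file layout are assumptions. Adding or removing a site then means adding or deleting one file.

```php
<?php
/**
 * Load every site script found in a directory.
 *
 * Hypothetical convention: each "<site>.php" file returns a callable
 * that performs the scan for that site. Files that return anything
 * else are ignored.
 *
 * Returns an array of callables keyed by file name without ".php".
 */
function loadSiteScrapers(string $dir): array
{
    $scrapers = [];

    // glob() returns false on error, so fall back to an empty list.
    foreach (glob($dir . '/*.php') ?: [] as $file) {
        $scraper = require $file;
        if (is_callable($scraper)) {
            $scrapers[basename($file, '.php')] = $scraper;
        }
    }

    return $scrapers;
}
```

The daily cron script could then loop over the returned array and call each scraper in turn.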

MySQL, Odd Jobs, PHP, Shell Script, Web Scraping

Project ID: #2167457

About the project

Remote project Active Jul 11, 2012