Store Crawler - Spider

Status: In progress
Bids: 25
Avg Bid (USD): $1067
Project Budget (USD): $750 - $1500

Project Description:
The subject of this project is to develop a program, Store Crawler, which will search the web for retail stores and record the following variables:
-Crawl date/time
-URL of the product
-Name of the store and its logo
-SKU (product code), if available and parsable from the product page
-Product EAN, if available and parsable from the product page
-Product name
-Category and subcategories
-Description
-Image
-Price
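As a starting point, the variables above could map onto one database row per crawled product page. This is a minimal sketch assuming Python's built-in sqlite3 module; the table name `products`, the column and field names, and the choice to key rows by product URL are all hypothetical. A production system would likely use a server database and store prices as integer cents rather than floats.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema: one row per crawled product URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    id             INTEGER PRIMARY KEY,
    crawled_at     TEXT NOT NULL,      -- crawl date/time, ISO 8601, UTC
    url            TEXT NOT NULL UNIQUE,
    store_name     TEXT NOT NULL,
    store_logo_url TEXT,
    sku            TEXT,               -- NULL when not parsable
    ean            TEXT,               -- NULL when not parsable
    name           TEXT NOT NULL,
    categories     TEXT,               -- e.g. "Computers > Used > Sets"
    description    TEXT,
    image_url      TEXT,
    price          REAL
)
"""

@dataclass
class ProductRecord:
    """Field names must match the column names in SCHEMA."""
    crawled_at: str
    url: str
    store_name: str
    store_logo_url: Optional[str]
    sku: Optional[str]
    ean: Optional[str]
    name: str
    categories: Optional[str]
    description: Optional[str]
    image_url: Optional[str]
    price: Optional[float]

def utc_now_iso() -> str:
    """Crawl timestamp in ISO 8601, UTC."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

def save(conn: sqlite3.Connection, rec: ProductRecord) -> None:
    """Insert a product; re-crawling the same URL replaces the old row."""
    cols = [f.name for f in fields(ProductRecord)]
    sql = (
        f"INSERT OR REPLACE INTO products ({', '.join(cols)}) "
        f"VALUES ({', '.join('?' for _ in cols)})"
    )
    conn.execute(sql, astuple(rec))
```

Usage would be `conn.execute(SCHEMA)` once, then `save(conn, record)` for every parsed page.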

Every crawled product must be grouped by SKU, EAN, or part of the product name. The grouping algorithm must be robust: when no SKU or EAN is available, it must recognize which products are similar to one another so that they can be grouped together by product name.
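One possible grouping strategy, sketched under stated assumptions: match exactly on EAN, then on SKU, and fall back to fuzzy matching of product names using token-set Jaccard similarity (a swapped-in technique, not one specified by the project). The function names, the dict keys `name`/`sku`/`ean`, and the 0.6 threshold are hypothetical and would need tuning on real data.

```python
import re
from typing import Optional

def name_tokens(name: str) -> frozenset[str]:
    """Case-fold a product name and split it into word tokens."""
    return frozenset(re.findall(r"\w+", name.casefold()))

def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of tokens two names have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_products(products: list[dict], threshold: float = 0.6) -> list[list[dict]]:
    groups: list[list[dict]] = []         # each group is a list of product dicts
    name_keys: list[frozenset[str]] = []  # tokens of each group's first product
    group_has_ean: list[bool] = []        # does the group already hold an EAN?
    by_code: dict[str, int] = {}          # "ean:<code>" / "sku:<code>" -> group index

    for p in products:
        ean = (p.get("ean") or "").strip()
        sku = (p.get("sku") or "").strip().upper()
        code: Optional[str] = f"ean:{ean}" if ean else (f"sku:{sku}" if sku else None)

        # 1) Exact identifier match.
        if code is not None and code in by_code:
            groups[by_code[code]].append(p)
            continue

        # 2) Fuzzy name match against each existing group.
        tokens = name_tokens(p["name"])
        best, best_score = None, 0.0
        for i, key in enumerate(name_keys):
            if ean and group_has_ean[i]:
                continue  # this EAN is unseen, so it differs from the group's EAN
            score = jaccard(tokens, key)
            if score > best_score:
                best, best_score = i, score

        if best is not None and best_score >= threshold:
            idx = best
            groups[idx].append(p)
        else:
            idx = len(groups)
            groups.append([p])
            name_keys.append(tokens)
            group_has_ean.append(False)

        if ean:
            group_has_ean[idx] = True
        if code is not None:
            by_code[code] = idx
    return groups
```

Names that differ only in a variant token (for example "128GB" vs "256GB" in an otherwise identical name) still share most tokens and can exceed the threshold, so in practice a stricter threshold or a rule that certain tokens must match would be needed.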
There must also be an option that tells the crawler which country to crawl.
The robot itself must decide whether a crawled URL belongs to a retail store or not.
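The robot's retail-store decision could start as a simple scoring heuristic over a page's HTML. This is a minimal sketch assuming three hypothetical signals (schema.org `Product` markup, shopping-cart wording, and several price-like strings); the keyword list, regular expressions, weights, and threshold are assumptions, and for each crawled country the keywords would need to be in the local language.

```python
import re

# Hypothetical signals; keyword list, patterns and weights are assumptions.
CART_WORDS = ("add to cart", "add to basket", "shopping cart", "checkout")
PRICE_RE = re.compile(
    r"(?:€|\$|£)\s?\d+(?:[.,]\d{2})?"          # "€ 19,99", "$20"
    r"|\d+(?:[.,]\d{2})?\s?(?:€|EUR|USD|GBP)"  # "19,99 €", "20 EUR"
)
PRODUCT_SCHEMA_RE = re.compile(r'schema\.org/Product\b|"@type"\s*:\s*"Product"')

def retail_score(page_html: str) -> int:
    """Add up simple evidence that a page belongs to an online shop."""
    score = 0
    if PRODUCT_SCHEMA_RE.search(page_html):
        score += 2  # structured product markup is treated as the strongest hint
    lowered = page_html.lower()
    if any(word in lowered for word in CART_WORDS):
        score += 1
    if len(PRICE_RE.findall(page_html)) >= 3:
        score += 1  # several prices suggest a product listing
    return score

def looks_like_retail_store(page_html: str, threshold: int = 2) -> bool:
    return retail_score(page_html) >= threshold
```

A score rather than a single yes/no rule makes it easy to add signals later (for example a checkout link found on several pages of the same domain) without rewriting the decision.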

The goal is to build a product comparison site.

Your task would be to fill the database with every product that can be found on the internet, using the method described below.

A custom spider would be made for each store.
The URLs of those stores would be scraped from directories or entered manually. The number of retail stores is approximately 3,000 or more.
Then we need to get all URLs of a specific retail store, either from its sitemap or from the Google index through an API search, using the "site:name.com" parameter.
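For the sitemap route, a store's URLs can be collected by parsing its XML sitemap and following sitemap index files recursively. This sketch assumes the standard sitemaps.org 0.9 namespace and uses only Python's standard library; `collect_store_urls`, the injectable `fetcher`, and the `limit` cap are hypothetical. The Google "site:" route is not shown, and for untrusted XML a hardened parser such as defusedxml is advisable.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def parse_sitemap(xml_bytes: bytes) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) for a <urlset> or <sitemapindex>."""
    root = ET.fromstring(xml_bytes)
    locs = [el.text.strip() for el in root.iter(f"{{{SM_NS}}}loc") if el.text]
    if root.tag == f"{{{SM_NS}}}sitemapindex":
        return [], locs
    return locs, []

def fetch(url: str) -> bytes:
    """Download one sitemap (no retries, robots.txt or gzip handling here)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

def collect_store_urls(sitemap_url: str,
                       fetcher: Callable[[str], bytes] = fetch,
                       limit: int = 100_000) -> list[str]:
    """Walk a sitemap (and any nested sitemap indexes) and gather page URLs."""
    pages: list[str] = []
    queue, seen = [sitemap_url], set()
    while queue and len(pages) < limit:
        current = queue.pop()
        if current in seen:
            continue  # guard against sitemap indexes that reference each other
        seen.add(current)
        urls, children = parse_sitemap(fetcher(current))
        pages.extend(urls)
        queue.extend(children)
    return pages[:limit]
```

Passing the downloader in as `fetcher` keeps the parsing logic testable offline and lets a real crawler swap in rate limiting or caching.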

The custom spider GUI would have a start field and a stop field for each needed attribute, which would tell the spider what to search for on a specific retail store.
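The per-store settings entered in the GUI could be saved as a small configuration object with one start/stop marker pair per attribute. This sketch is an assumption about how that might look: the class and field names are hypothetical, JSON is an assumed storage format, and the `country` field (here "SI", assumed from the Slovenian page text in the example) is one possible home for the per-country option mentioned earlier.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Markers:
    start: str  # text that comes immediately before the value
    stop: str   # text that comes immediately after the value

@dataclass
class StoreSpiderConfig:
    store_name: str
    base_url: str
    country: str  # hypothetical per-country setting, e.g. "SI"
    attributes: dict[str, Markers] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize as the GUI might save it (nested dataclasses become dicts)."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "StoreSpiderConfig":
        data = json.loads(text)
        attrs = {name: Markers(**m) for name, m in data.pop("attributes").items()}
        return cls(attributes=attrs, **data)
```

For the example store, the GUI would store `StoreSpiderConfig("pocenipc.com", "http://pocenipc.com", "SI", {"price": Markers("td><b>Cena z DDV:</b></td><td>", "</td>")})`.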

After that, the spider/bot would go through all URLs of a particular store and search for the pre-entered terms.

Example:
Go to http://pocenipc.com/rabljeni-kompleti/rabljen-racunalniski-komplet-compact
We need the price.
The page source shows that the price value (labelled "Cena z DDV", Slovenian for "price incl. VAT") begins after
td><b>Cena z DDV:</b></td><td>
and ends before
</td>
so the crawler would search for that plain text.
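The start/stop search described in this example can be written as a plain substring scan. This minimal sketch assumes the markers are matched literally and only the first occurrence on a page is used; `extract_between`, `extract_all`, and `parse_price` are hypothetical helpers, and `parse_price` assumes a European number format such as "1.234,56 €".

```python
import html
import re
from typing import Optional

def extract_between(page: str, start: str, stop: str) -> Optional[str]:
    """Text between the first `start` marker and the next `stop` marker, or None."""
    i = page.find(start)
    if i == -1:
        return None
    i += len(start)
    j = page.find(stop, i)
    if j == -1:
        return None
    return html.unescape(page[i:j]).strip()  # turn entities such as &euro; into characters

def extract_all(page: str, markers: dict[str, tuple[str, str]]) -> dict[str, Optional[str]]:
    """Apply every (start, stop) pair configured for a store to one page."""
    return {attr: extract_between(page, start, stop) for attr, (start, stop) in markers.items()}

def parse_price(raw: str) -> Optional[float]:
    """'1.234,56 €' -> 1234.56; assumes '.' thousands and ',' decimal separators."""
    number = re.sub(r"[^\d,.]", "", raw).replace(".", "").replace(",", ".")
    try:
        return float(number)
    except ValueError:
        return None
```

Because the markers are matched literally, a small change in a store's HTML breaks extraction; returning None instead of guessing makes such failures easy to detect and report per store.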


We won't work with any type of broker or middleman. You must be the actual coder; we work only with individual coders, and we work directly with our vendors.

Details of this small starter project will be shared with qualified vendors inside the PMB.


Post an offer of $666 and 6 days on this project so I know you've read this and understand English. The price will be set afterwards, once all details have been discussed. Place an offer of anything other than $666 and 6 days and you'll be ignored, I promise.

If you're a real provider with real experience, post an offer and a message, and then we'll chat in the PMB.

Skills required:
Data Mining, Data Processing, PHP, Software Architecture, Web Scraping
About the employer: Verified

Bids (amount and delivery time):
$1500 in 20 days
$750 in 15 days
$1000 in 35 days
$1500 in 6 days
$1000 in 6 days
$800 in 10 days
$766 in 6 days
$750 in 7 days
$900 in 6 days
$1666 in 6 days