Webcrawler / Spider - Data Extraction

Status: CLOSED
Bids: 9
Avg Bid (USD): $1133
Project Budget (USD): $750 - $1500

Project Description:

We need a webcrawler / spider that can collect the technical specifications of a particular product.

• In essence, we want to input the name and/or model number of a particular product, and the spider should extract its technical specifications from multiple websites (10-20). You may want to query Google first for the top 10-20 results and then crawl those sites. The number of products could range from 100 to thousands at a time, and we should be able to upload the list as a CSV or similar file.
• The next step is some level of “fuzzy logic” that compares the specification names/fields across the different results and identifies a tolerable level of similarity between them; the matched name becomes the field label for that particular feature/specification. For example, there are generally key technical specifications that are always mentioned for a particular type of product, such as megapixels for digital cameras.
• The next step is to apply similar fuzzy logic to the specification values themselves, as webmasters often don’t post data accurately or completely and leave some specs out.
• All the data should then be stored in a searchable database. The data should be presented in a tabular format.
• Where possible, the application should supply a URL to the PDFs containing the technical specifications and/or user manuals of the product. The source URLs of the data should be included as well.
• Our preference is for a web-based solution using open source technologies such as PHP and MySQL. The application must be secure and scalable.
• We will require a web-based front end to display the results to users, so integration into a CMS such as WordPress or Joomla would be preferable.
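
As a rough illustration of the field-label matching described in the list above, here is a minimal sketch in Python (chosen for brevity; the stated preference is PHP and MySQL). It uses the standard library's difflib.SequenceMatcher as a stand-in for the “fuzzy logic”. The function names normalize, similarity, group_labels and canonical_label, and the 0.8 threshold, are assumptions made for illustration and are not part of the requirements.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase a label and collapse punctuation/whitespace into single
    spaces, so 'Mega-Pixels:' and 'mega pixels' compare as the same text."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_labels(labels, threshold=0.8):
    """Greedily cluster spec labels scraped from different sites.

    Each label joins the first group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new
    group. The threshold is an assumed value and would need tuning.
    """
    groups = []
    for label in labels:
        for group in groups:
            if similarity(label, group[0]) >= threshold:
                group.append(label)
                break
        else:
            # No existing group was similar enough.
            groups.append([label])
    return groups


def canonical_label(group):
    """Use the most frequent normalized spelling as the field label."""
    counts = Counter(normalize(label) for label in group)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    scraped = ["Megapixels", "Mega Pixels", "megapixel",
               "Weight", "Weight (g)", "Battery Life"]
    for group in group_labels(scraped):
        print(canonical_label(group), "<-", group)
```

The same approach could be applied to the specification values themselves, although numeric values with units would likely need unit-aware parsing rather than plain string similarity.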

We have many ideas about the logical flow for achieving the above, as well as the bigger picture for the entire project; however, these will be shared with those shortlisted as potential suppliers. The code must belong to us, and you must be prepared to sign an NDA.

This is the initial project; based on its success, there will be ongoing enhancements and features required. Please read the above carefully and send through any questions you have, as well as constructive responses.

Skills required:
Data Mining, MySQL, PHP, Software Architecture, Web Scraping
About the employer:
Verified


$1450 in 30 days (phpXpertbd)
$1250 in 18 days
$1500 in 14 days
$1000 in 30 days
$750 in 7 days (RedCraft)
$1500 in 20 days (danielricha25)
$1000 in 10 days (defoladi)
$750 in 5 days
$1000 in 15 days