In broad terms, my ultimate goal is to create a website that compares the prices and features of products offered by multiple vendor websites, all related to a specific line of work. I’m unsure of the best way to do this; perhaps by scraping the sites into a MySQL database, but I’m open to suggestions.
I understand that the best person for the web scraping job may not be the best person for the web design job, which is why I’m splitting the project into two parts (scraping and web design). If you have both superior web scraping skills and an excellent web design portfolio, I’ll consider hiring you for the whole thing. For now, though, I really need an awesome web scraper.
I will need you to scrape the websites of multiple vendors (I will provide a list of 20-50 sites) and then build a database of all the products they sell. For each product, I’ll need as much information as possible, including manufacturer, catalog number, price, quantity, URL, product features, images, etc.
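To make the list above concrete, here is a minimal sketch of how such a product database might be laid out. It assumes Python’s built-in sqlite3 module as a stand-in for MySQL so it runs anywhere; the table names, column names, and the `open_db`/`add_product` helpers are hypothetical. Product features go in a separate attribute/value table because each product category will likely have its own set of attributes.

```python
import sqlite3

# Hypothetical schema. sqlite3 (Python standard library) stands in for MySQL
# so the sketch runs anywhere; the column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    id           INTEGER PRIMARY KEY,
    vendor       TEXT NOT NULL,   -- which vendor site the item came from
    url          TEXT NOT NULL UNIQUE,
    manufacturer TEXT,
    catalog_no   TEXT,            -- manufacturer's catalog/part number
    name         TEXT,
    price_cents  INTEGER,         -- money stored as integer cents
    quantity     TEXT,            -- pack size as listed, e.g. "box of 100"
    image_url    TEXT,
    scraped_at   TEXT NOT NULL    -- ISO date of the crawl
);
-- One row per (product, attribute), so each category can define its own attributes.
CREATE TABLE IF NOT EXISTS product_features (
    product_id INTEGER NOT NULL REFERENCES products(id),
    attribute  TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (product_id, attribute)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_product(conn, product, features):
    """Insert one scraped product plus its attribute/value pairs; return its id."""
    cur = conn.execute(
        "INSERT INTO products (vendor, url, manufacturer, catalog_no, name,"
        " price_cents, quantity, image_url, scraped_at)"
        " VALUES (:vendor, :url, :manufacturer, :catalog_no, :name,"
        " :price_cents, :quantity, :image_url, :scraped_at)",
        product,  # a dict whose keys match the named placeholders
    )
    product_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO product_features (product_id, attribute, value) VALUES (?, ?, ?)",
        [(product_id, attr, value) for attr, value in features.items()],
    )
    conn.commit()
    return product_id
```

Storing prices as integer cents avoids floating-point rounding when comparing prices across vendors.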
I’d also like to take all the products for sale on one site and match them with the same or similar products on the other sites (some products will share the exact same manufacturer’s number; others may only be similar, with matching features). You will help me determine the best way to do this, perhaps by selectively scraping the pages, or by applying smart filtering, data mining, or matching algorithms once all the pages have been added to the database. For each product category, we’ll probably need to define a set of attributes that describe the items in that group, so that similar products and substitutes can easily be identified and compared.
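One possible matching approach, sketched under stated assumptions: first look for an exact match on manufacturer plus a normalized catalog number, and fall back to comparing category attributes only when that fails. The `Product` class, the `find_matches` helper, the Jaccard similarity over attribute/value pairs, and the 0.6 threshold are all illustrative choices, not a settled design.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Product:
    # Hypothetical record type; fields mirror the scraped data described above.
    vendor: str
    manufacturer: str
    catalog_no: str
    category: str
    attributes: dict = field(default_factory=dict)


def normalize_part_no(part_no):
    """Lowercase and drop everything but letters/digits, so 'AC-100' == 'ac100'."""
    return re.sub(r"[^a-z0-9]", "", part_no.lower())


def exact_key(p):
    """Identity used for exact matches: (manufacturer, normalized catalog number)."""
    return (p.manufacturer.strip().lower(), normalize_part_no(p.catalog_no))


def attribute_similarity(a, b):
    """Jaccard similarity over (attribute, value) pairs, case-insensitive."""
    pa = {(k.lower(), str(v).lower()) for k, v in a.attributes.items()}
    pb = {(k.lower(), str(v).lower()) for k, v in b.attributes.items()}
    if not pa and not pb:
        return 0.0
    return len(pa & pb) / len(pa | pb)


def find_matches(target, candidates, threshold=0.6):
    """Return (exact matches, similar products sorted by descending score)."""
    exact, similar = [], []
    key = exact_key(target)
    for c in candidates:
        if c is target:
            continue
        if target.catalog_no and c.catalog_no and exact_key(c) == key:
            exact.append(c)
        elif c.category == target.category:
            # Only compare attributes within the same category.
            score = attribute_similarity(target, c)
            if score >= threshold:
                similar.append((score, c))
    similar.sort(key=lambda pair: pair[0], reverse=True)
    return exact, [c for _, c in similar]
```

Restricting the fuzzy comparison to products in the same category is where the per-category attribute sets come in: two items are only compared on attributes that mean the same thing for both.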
Finally, I will need to be able to re-crawl the websites and update the database a few times a year to pick up newly added products and price changes to existing ones.
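A re-crawl could then be merged into the existing data along these lines. This sketch again assumes sqlite3 in place of MySQL and treats the product URL as the identity of a listing; the `apply_crawl` helper and the `price_history` table are hypothetical. Recording a history row for every price change, rather than overwriting the old price, preserves what changed between crawls.

```python
import sqlite3


def open_db(path=":memory:"):
    """Minimal hypothetical tables: current listings plus a price history."""
    conn = sqlite3.connect(path)
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS products (
        url         TEXT PRIMARY KEY,
        name        TEXT,
        price_cents INTEGER,
        first_seen  TEXT,
        last_seen   TEXT
    );
    CREATE TABLE IF NOT EXISTS price_history (
        url         TEXT NOT NULL,
        price_cents INTEGER NOT NULL,
        seen_on     TEXT NOT NULL
    );
    """)
    return conn


def apply_crawl(conn, scraped, crawl_date):
    """Merge one crawl's results; return (new products, repriced products)."""
    new, repriced = 0, 0
    for item in scraped:
        row = conn.execute(
            "SELECT price_cents FROM products WHERE url = ?", (item["url"],)
        ).fetchone()
        if row is None:
            # First time this listing has been seen.
            conn.execute(
                "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
                (item["url"], item["name"], item["price_cents"], crawl_date, crawl_date),
            )
            conn.execute(
                "INSERT INTO price_history VALUES (?, ?, ?)",
                (item["url"], item["price_cents"], crawl_date),
            )
            new += 1
        else:
            if row[0] != item["price_cents"]:
                # Price changed since the last crawl: keep the old value in history.
                conn.execute(
                    "INSERT INTO price_history VALUES (?, ?, ?)",
                    (item["url"], item["price_cents"], crawl_date),
                )
                repriced += 1
            conn.execute(
                "UPDATE products SET name = ?, price_cents = ?, last_seen = ? WHERE url = ?",
                (item["name"], item["price_cents"], crawl_date, item["url"]),
            )
    conn.commit()
    return new, repriced
```

Any product whose `last_seen` date is older than the most recent crawl was not found again and has likely been discontinued or moved.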