I work for an online retail business. For market research purposes, we need very up-to-date product prices from the key players in the retail market so that we can continually re-price our own products.
On pixmania-pro.com (a French retail website), if you enter an 8-digit reference number such as 21273466 in the search box, you land directly on the product page for that reference number.
I need a web-based tool where I can upload a CSV list of Pixmania Pro reference numbers in column A. For each reference, if a product page is found, the tool retrieves the current price, but only if the product is “in stock”.
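Reading the uploaded list could look roughly like the sketch below. It assumes (not stated beyond the single example above) that every valid reference is exactly 8 digits, and it silently skips header rows, blank lines and anything else in column A that does not match; `read_references` is a hypothetical name.

```python
import csv
import io
import re

# Assumption: a valid Pixmania Pro reference is exactly 8 digits (e.g. 21273466).
REFERENCE_PATTERN = re.compile(r"\d{8}")


def read_references(csv_text: str) -> list[str]:
    """Return the 8-digit references found in column A of an uploaded CSV."""
    references = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # blank line: csv.reader yields an empty row
        cell = row[0].strip()  # column A only; other columns are ignored
        if REFERENCE_PATTERN.fullmatch(cell):
            references.append(cell)
    return references
```

Whether rejected cells should be reported back to the user instead of dropped is a design choice left open here.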
I then download a CSV containing, in column A, the reference numbers of products for which a product page was found and, in column B, the prices of the products that are in stock. Products not in stock have a blank cell in column B. In other words:
(1) If no product page was found, the tool excludes that reference from the results altogether.
(2) If a product page was found but the product is not in stock, the tool INCLUDES the reference in the results but leaves the price field blank.
(3) If a product page was found AND the product is in stock, the tool places the price in column B.
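Rules (1) to (3) can be sketched as a small function that turns per-reference lookup results into the downloadable CSV. `LookupResult` and `build_output_csv` are hypothetical names, and keeping the price as the text shown on the page (e.g. "129,90" in French notation) is an assumption; how the page is fetched and parsed is deliberately left out.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class LookupResult:
    """Hypothetical outcome of looking up one reference on the site."""
    reference: str
    page_found: bool
    in_stock: bool
    price: Optional[str] = None  # assumption: price kept as page text, e.g. "129,90"


def build_output_csv(results: list[LookupResult]) -> str:
    """Apply rules (1)-(3) and return the CSV text to download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for result in results:
        if not result.page_found:
            continue  # rule (1): no product page, so omit the reference entirely
        if result.in_stock and result.price is not None:
            writer.writerow([result.reference, result.price])  # rule (3)
        else:
            # rule (2): page found but not in stock (or no price read): blank column B
            writer.writerow([result.reference, ""])
    return buffer.getvalue()
```

Note that the csv module quotes a French-style price such as "129,90" automatically, so the comma does not spill into a third column.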
I am looking to run this tool INITIALLY for 30,000 references. Many of these will return no result because those products are no longer in the system. Assuming only 10,000-12,000 have a product page and only 7,000-8,000 are in stock, after I update my list accordingly the new list will be about 15,000 references, which I will run once daily (once every 24 hours). Doing 15,000 page visits in 24 hours means 625 per hour, or about 10 page visits per minute. I think this rate is low enough to go undetected.
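The rate arithmetic above can be checked with a small helper (a hypothetical name, not part of any existing tool) that spreads a daily list evenly over 24 hours:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_rate(references_per_day: int) -> dict[str, float]:
    """Average request rate needed to visit every reference once per 24 hours."""
    per_hour = references_per_day / 24
    return {
        "per_hour": per_hour,
        "per_minute": per_hour / 60,
        "seconds_between_requests": SECONDS_PER_DAY / references_per_day,
    }
```

For 15,000 references this gives 625 per hour, about 10.4 per minute, and one request every 5.76 seconds on average. For the initial 30,000-reference run, all figures double: 1,250 per hour, about 20.8 per minute, one request every 2.88 seconds.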
Note: these are just estimates, so I want the tool to be able to handle more than 15,000 references in the long run in case my estimates are wrong.
Part of the job is to install the tool on my HPcloud virtual server and set up the HTTP link for me to use.
I need the tool to work long term. Therefore, the job will be considered complete (and the final milestone released) only once the tool has worked properly for 30 days using the updated list of approximately 15,000 references. This means the job may require "optimizations" to be added to the tool, if necessary, to keep it running in the unlikely event that it gets blocked after the first few days.
Feel free to ask me questions. Thanks for bidding!