We are looking for a rockstar web scraper that is able to complete the following project.
I need product info scraped from an online toy store. The final result needs to be an Excel file with data sorted into each column.
The data to retrieve is listed below (these will also be the column heads in the final Excel sheet):
1. product title
5. discounted price (if available)
6. main category
7. sub categories (the breadcrumb titles, concatenated and comma separated)
8. HTTP Link to the product page
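To make the expected row layout concrete, here is a minimal Python sketch covering only the columns listed above. The function name `build_row`, its parameters, and the shortened "sub categories" column head are hypothetical. It also assumes that the first breadcrumb title is the main category and that the sub categories column contains every breadcrumb title; adjust if the site's breadcrumbs work differently.

```python
# Column heads as listed above (hypothetical short form for "sub categories").
COLUMNS = [
    "product title",
    "discounted price (if available)",
    "main category",
    "sub categories",
    "HTTP Link to the product page",
]


def build_row(title, breadcrumbs, url, discounted_price=None):
    """Turn one scraped product into a row keyed by column head.

    breadcrumbs: breadcrumb titles in page order, e.g. ["Games", "Puzzles"].
    Assumption: the first breadcrumb title is the main category.
    """
    # Drop empty breadcrumb entries and trim stray whitespace.
    crumbs = [c.strip() for c in breadcrumbs if c and c.strip()]
    return {
        "product title": title.strip(),
        # Leave the cell empty when the product has no discounted price.
        "discounted price (if available)": (
            discounted_price if discounted_price is not None else ""
        ),
        "main category": crumbs[0] if crumbs else "",
        # Concatenated breadcrumb titles, comma separated.
        "sub categories": ", ".join(crumbs),
        "HTTP Link to the product page": url,
    }
```

Each row can then be written to the Excel file in `COLUMNS` order with any spreadsheet library, for example openpyxl's `Worksheet.append`.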
The website is : [url removed, login to view]:Utility3:See-All-Categories:Home-Page
All products in the following categories need to be scraped and put into the Excel file.
Great Deals Store
Action Figures & Hero Play, NERF Blasters
Arts & Crafts, Educational Toys, Books
Baby, Toddler & Preschool Learning Toys
Bikes, Scooters & Ride-Ons
Building Blocks, LEGO Toys
Cooking for Kids, Play Kitchen Sets
Dolls, Dress Up, Stuffed Animals, Tween
Electronics, Tech Toys, Movies, Music
Games, Puzzles, Boutique Toys
Outdoor Play, Kids Sports, Swimming Pools
Party Supplies & Candy Shop
Vehicles, Trains, RC
Video Games
Holiday Shops
Kids' Clothes
I expect the total row count in the Excel file to be around 10k, though that's only a rough guess on my part.
I don't care whether you use Java or Python. I need the final code pushed to a GitHub repo under my control.
We expect the project to be completed in two days at most. When submitting your bid, please state 'I am the one and will be able to complete the project in two days'. Three days is acceptable, but please say so if that is the case. We might need recurring scrapes of the same categories on the online store. Therefore, please state your initial bid price for the entire project and, separately, the cost of rerunning the scrape.
Thank you for your time, we look forward to your bid.
edit: the total product count seems to be around 100k items.