Web scraping of printer toner/ink website with import of data to Magento
This project received 13 bids from talented freelancers with an average bid price of $418 AUD.
Project Budget: $250 - $750 AUD
We intend to scrape product data from a printer toner/ink website. For every product on the website we will create one SKU in our Magento Ecommerce store and import the associated data that has been scraped. The scraped data needs to be cleaned and modified to ensure it meets our Magento Ecommerce store data standards. We will use four CSV files to import the data.
This task requires that you scrape the data, clean and modify it to meet our data standards, create four CSV files for import, and import the data to our non-production Magento store.
The website we want to scrape is organised into this navigation structure:
Brand > Category > Models > Products
There are 16 brand categories currently for the following brands:
Within each brand are one or more of the following categories:
Within each category page is a list of printer models.
Within each model page is a list of products that are for use with that printer model broken into product segments such as below:
Compatible Brand Toner Value Pack
Genuine Brand Toner Value Pack
Compatible Brand Toner
Genuine Brand Toner
Compatible Brand Image Drums
Genuine Brand Image Drums
The final product page has the data that we intend to scrape. Each product page has a unique URL. The same product page will be linked from multiple printer model pages as one product is normally compatible with multiple printers.
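As a rough illustration of the point above, the sketch below collapses the product links found on several model pages into one entry per unique product URL, while remembering which models linked to it. The function name, the `model_pages` input shape and the example.com URLs are assumptions made for illustration only; they do not come from the target website.

```python
from collections import defaultdict


def collect_unique_products(model_pages):
    """Map each unique product URL to the printer models that link to it.

    model_pages: dict of model name -> list of product URLs found on that
    model's page (hypothetical structure produced by the scraper).
    """
    models_by_product = defaultdict(list)
    for model, product_urls in model_pages.items():
        for url in product_urls:
            # Record each model once per product, in first-seen order.
            if model not in models_by_product[url]:
                models_by_product[url].append(model)
    return dict(models_by_product)


# Hypothetical sample data: two model pages sharing one product.
model_pages = {
    "ModelA": ["https://example.com/p/1", "https://example.com/p/2"],
    "ModelB": ["https://example.com/p/1"],
}
unique_products = collect_unique_products(model_pages)
```

With this shape, each key becomes one product page to scrape (and one SKU), and the list of models feeds the compatibility cross reference.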
We need four CSV files created with the data that is scraped. The four CSV files we need created are:
1) [url removed, login to view]
2) [url removed, login to view]
3) [url removed, login to view]
4) [url removed, login to view]
For each unique product URL we will create one product SKU in our Magento Ecommerce store. This is [url removed, login to view] and product_type.csv.
For each printer part/model we will cross reference the product SKU(s) that are compatible. For example, these parts/models:
SKU1: PartY, ModelZ
… we will cross reference in our database like:
PartY: SKU1, SKU2
ModelZ: SKU1, SKU3
This is [url removed, login to view] and [url removed, login to view]
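The cross reference described above is an inversion of the scraped data: from "SKU to parts/models" into "part/model to SKUs". A minimal sketch follows, assuming the scraper produces a dictionary keyed by SKU. The `SKU2` and `SKU3` input rows are assumed here only so that the output matches the example above; the function name is likewise hypothetical.

```python
from collections import defaultdict


def invert_compatibility(compat_by_sku):
    """Turn {sku: [part_or_model, ...]} into {part_or_model: [sku, ...]}."""
    skus_by_ref = defaultdict(list)
    for sku, refs in compat_by_sku.items():
        for ref in refs:
            # Avoid listing the same SKU twice for one part/model.
            if sku not in skus_by_ref[ref]:
                skus_by_ref[ref].append(sku)
    return dict(skus_by_ref)


# SKU1 is taken from the brief; SKU2 and SKU3 are assumed sample rows.
compat_by_sku = {
    "SKU1": ["PartY", "ModelZ"],
    "SKU2": ["PartY"],
    "SKU3": ["ModelZ"],
}
cross_reference = invert_compatibility(compat_by_sku)
```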
*** Images need to be downloaded and will be referenced in the [url removed, login to view] import file for import to our Magento Ecommerce store ***
The scraped data needs to be cleaned and modified to ensure it meets our Magento Ecommerce store data standards.
Once complete each CSV file is to be imported to our non-production Magento Ecommerce store.
Our Magento Ecommerce store currently has products from the website we intend to scrape. We will provide a list of product URLs (The unique product URLs from the website that is being scraped) that you do not need to import. After scraping the data from the website you can remove the product URLs that we provide so the same data is not imported twice.
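One possible way to apply the exclusion list is sketched below. It assumes each scraped row is a dictionary with a `url` key and that URLs should match after trimming whitespace and a trailing slash; both the row shape and the normalisation rule are assumptions, and the example.com URLs are placeholders.

```python
def normalize_url(url):
    """Trim whitespace and a trailing slash so equivalent URLs compare equal."""
    return url.strip().rstrip("/")


def remove_existing(scraped_rows, existing_urls):
    """Drop scraped rows whose product URL is already in the store."""
    skip = {normalize_url(u) for u in existing_urls}
    return [row for row in scraped_rows if normalize_url(row["url"]) not in skip]


# Hypothetical sample data.
scraped_rows = [
    {"url": "https://example.com/p/1/", "name": "Toner A"},
    {"url": "https://example.com/p/2", "name": "Toner B"},
]
existing_urls = ["https://example.com/p/1"]
rows_to_import = remove_existing(scraped_rows, existing_urls)
```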
Data Cleaning and Modification
An example of data cleaning and modification is:
“HP LaserJet 1000” would be imported to our Magento Ecommerce store as “HP”, “LaserJet”, “1000”:
The brand and series are each placed into their own separate column. By looking at our existing store data you can clearly see that LaserJet is a series of HP and should be placed into a separate column.
The product attribute “3,500 pages at 5% coverage” would be imported to our Magento Ecommerce store as “Approx. 3,500 pages at 5% coverage”:
By looking at our existing attributes you can see that we use “Approx.” in our data.
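The two examples above could be handled along the lines of the sketch below. The `KNOWN_SERIES` lookup is an assumption: in practice the brand and series lists would be taken from our existing store data, and the function names are hypothetical.

```python
# Hypothetical lookup of series per brand; the real list would be
# built from the existing store data.
KNOWN_SERIES = {"HP": ["LaserJet"]}


def split_printer_name(name, known_series=KNOWN_SERIES):
    """Split 'HP LaserJet 1000' into (brand, series, model)."""
    brand, _, rest = name.strip().partition(" ")
    for series in known_series.get(brand, []):
        # Match case-insensitively but return the store's canonical spelling.
        if rest.lower().startswith(series.lower() + " "):
            return brand, series, rest[len(series):].strip()
    # No known series: leave the series column empty.
    return brand, "", rest


def normalise_yield(text):
    """Prefix a page-yield attribute with 'Approx.' unless already present."""
    text = text.strip()
    if text.lower().startswith("approx."):
        return text
    return "Approx. " + text


printer = split_printer_name("HP LaserJet 1000")
page_yield = normalise_yield("3,500 pages at 5% coverage")
```

Names that do not match a known series are returned with an empty series column so they can be reviewed by hand rather than guessed.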
It is your responsibility to test each file using our non-production Magento Ecommerce store to ensure all data is able to import successfully.