Web scraping of printer toner/ink website with import of data to Magento

Closed

We intend to scrape product data from a printer toner/ink website. For every product on the website we will create one SKU in our Magento Ecommerce store and import the associated data that has been scraped. The scraped data needs to be cleaned and modified to ensure it meets our Magento Ecommerce store data standards. We will use four CSV files to import the data.

This task requires that you scrape the data, clean and modify it to meet our data standards, create four CSV files for import, and import the data into our non-production Magento store.

Navigation Structure:

The website we want to scrape is organised into this navigation structure:

Brand > Category > Models > Products

Brand Page:

There are 16 brand categories currently for the following brands:

Brand

Canon

Dell

Epson

HP

Konica Minolta

Kyocera

Lanier

Lexmark

OKI

Panasonic

Ricoh

Samsung

Sharp

Toshiba

Xerox

Within each brand are one or more of the following categories:

Ink Cartridges

Toner Cartridges

Thermal Rolls

Category Page:

Within each category page is a list of printer models.

Models Page:

Each model page lists the products for use with that printer model, broken into product segments such as:

Compatible Brand Toner Value Pack

Genuine Brand Toner Value Pack

Compatible Brand Toner

Genuine Brand Toner

Compatible Brand Image Drums

Genuine Brand Image Drums

Product Page:

The final product page has the data that we intend to scrape. Each product page has a unique URL. The same product page will be linked from multiple printer model pages as one product is normally compatible with multiple printers.
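Because the same product page is linked from several printer model pages, a scraper would usually de-duplicate on the product URL while remembering which models linked to it. The following is a minimal sketch only, assuming the crawl yields (model name, product URL) pairs; the function name, the example.com URLs and the model-to-product pairings are hypothetical.

```python
from collections import defaultdict


def group_models_by_product(links):
    """Map each unique product URL to the printer models that link to it.

    `links` is an iterable of (model_name, product_url) pairs, as a crawler
    might emit while walking Brand > Category > Models > Products.
    """
    models_by_url = defaultdict(list)
    for model_name, product_url in links:
        # Assumption: a trailing slash does not change which page is meant.
        key = product_url.strip().rstrip("/")
        if model_name not in models_by_url[key]:
            models_by_url[key].append(model_name)
    return dict(models_by_url)


# Hypothetical crawl output: two model pages link to the same product.
links = [
    ("HP LaserJet 1000", "https://example.com/products/toner-a"),
    ("HP LaserJet 1200", "https://example.com/products/toner-a/"),
    ("HP LaserJet 1200", "https://example.com/products/toner-b"),
]
products = group_models_by_product(links)
print(len(products), "unique product URLs")
```

Each key of `products` would then become one SKU, and the list of models attached to it feeds the cross reference described further down.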

We need four CSV files created with the data that is scraped. The four CSV files we need created are:

1) [url removed, login to view]

2) [url removed, login to view]

3) [url removed, login to view]

4) [url removed, login to view]

For each unique product URL we will create one product SKU in our Magento Ecommerce store. This is [url removed, login to view] and product_type.csv.

For each printer part/model we will cross reference the product SKU(s) that are compatible. For example, these parts/models:

SKU1: PartY, ModelZ

SKU2: PartY

SKU3: ModelZ

… we will cross reference in our database like:

PartY: SKU1, SKU2

ModelZ: SKU1, SKU3

This is [url removed, login to view] and [url removed, login to view]
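The cross reference above is an inversion of the SKU-to-part/model mapping. A minimal sketch using the example values from this brief; the function name is hypothetical and the real import file layout is not shown here.

```python
from collections import defaultdict


def invert_compatibility(sku_to_targets):
    """Turn {sku: [part_or_model, ...]} into {part_or_model: [sku, ...]}."""
    target_to_skus = defaultdict(list)
    for sku, targets in sku_to_targets.items():
        for target in targets:
            target_to_skus[target].append(sku)
    return dict(target_to_skus)


# The example from the brief: each SKU lists the parts/models it fits.
sku_to_targets = {
    "SKU1": ["PartY", "ModelZ"],
    "SKU2": ["PartY"],
    "SKU3": ["ModelZ"],
}
cross_reference = invert_compatibility(sku_to_targets)
for target, skus in cross_reference.items():
    print(f"{target}: {', '.join(skus)}")
# PartY: SKU1, SKU2
# ModelZ: SKU1, SKU3
```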

*** Images need to be downloaded and will be referenced in the [url removed, login to view] import file for import to our Magento Ecommerce store ***

The scraped data needs to be cleaned and modified to ensure it meets our Magento Ecommerce store data standards.

Once complete each CSV file is to be imported to our non-production Magento Ecommerce store.
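As an illustration of producing an import file, the sketch below writes rows with Python's standard `csv` module, which quotes values containing commas (such as page yields) automatically. The column names and the image file name are hypothetical; the actual columns are dictated by our store's data standards.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical columns and values, for illustration only.
fieldnames = ["sku", "name", "page_yield", "image"]
rows = [
    {
        "sku": "SKU1",
        "name": "Compatible Brand Toner",
        "page_yield": "Approx. 3,500 pages at 5% coverage",
        "image": "sku1.jpg",
    }
]
csv_text = rows_to_csv(rows, fieldnames)
print(csv_text)
```

Writing to an in-memory buffer first makes it easy to inspect the output before saving it to a file for a test import.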

Existing Products

Our Magento Ecommerce store already contains some products from the website we intend to scrape. We will provide a list of the unique product URLs (from the website being scraped) that you do not need to import. After scraping the data, remove the products with these URLs so the same data is not imported twice.
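Removing the existing products amounts to filtering the scraped rows against the supplied URL list. A minimal sketch, assuming each scraped row is a dict with a `url` key and that URLs differing only in surrounding whitespace or a trailing slash refer to the same page; the names and example.com URLs are hypothetical.

```python
def normalise_url(url):
    """Assumed normalisation: trim whitespace and any trailing slash."""
    return url.strip().rstrip("/")


def remove_existing(scraped_rows, existing_urls):
    """Return only the scraped rows whose URL is not in the exclusion list."""
    existing = {normalise_url(u) for u in existing_urls}
    return [row for row in scraped_rows if normalise_url(row["url"]) not in existing]


# Hypothetical data: the second product is already in the store.
scraped_rows = [
    {"url": "https://example.com/products/toner-a", "name": "Toner A"},
    {"url": "https://example.com/products/toner-b/", "name": "Toner B"},
]
existing_urls = ["https://example.com/products/toner-b"]
to_import = remove_existing(scraped_rows, existing_urls)
print([row["name"] for row in to_import])
```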

Data Cleaning and Modification

An example of data cleaning and modification is:

“HP LaserJet 1000” would be imported to our Magento Ecommerce store as “HP”, “LaserJet”, “1000”:

The brand and series are each placed into their own column. By looking at our existing store data you can see that LaserJet is a series of HP and should be placed into a separate column.

The product attribute “3,500 pages at 5% coverage” would be imported to our Magento Ecommerce store as “Approx. 3,500 pages at 5% coverage”:

By looking at our existing attributes you can see that we use “Approx.” in our data.
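The two cleaning examples above could be sketched as follows. This is an illustration only: the brand list is taken from this brief, while the series lookup is a hypothetical stand-in for the series that would be read from our existing store data.

```python
import re

# Brands as listed in this brief.
BRANDS = [
    "Canon", "Dell", "Epson", "HP", "Konica Minolta", "Kyocera", "Lanier",
    "Lexmark", "OKI", "Panasonic", "Ricoh", "Samsung", "Sharp", "Toshiba",
    "Xerox",
]
# Hypothetical lookup; real series would come from the existing store data.
SERIES_BY_BRAND = {"HP": ["LaserJet"]}


def split_model_name(name):
    """Split e.g. 'HP LaserJet 1000' into (brand, series, model)."""
    name = " ".join(name.split())  # collapse repeated whitespace
    lowered = name.lower()
    brand = next((b for b in BRANDS if lowered.startswith(b.lower() + " ")), "")
    rest = name[len(brand):].strip()
    series = next(
        (s for s in SERIES_BY_BRAND.get(brand, [])
         if rest.lower().startswith(s.lower() + " ")),
        "",
    )
    model = rest[len(series):].strip()
    return brand, series, model


def normalise_yield(text):
    """Prefix 'Approx.' to a page-yield attribute unless already present."""
    text = text.strip()
    if re.match(r"approx\.?", text, re.IGNORECASE):
        return text
    return f"Approx. {text}"


print(split_model_name("HP LaserJet 1000"))
print(normalise_yield("3,500 pages at 5% coverage"))
```

Names with no recognised brand or series are returned with those fields empty, so they can be reviewed by hand rather than guessed.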

Important:

It is your responsibility to test each file using our non-production Magento Ecommerce store to ensure all data is able to import successfully.

Skills: Data Entry, Excel, Web Scraping, Web Search

Project ID: #5819420

13 freelancers are bidding on average $418 for this job

diamond247

Hello Sir, We are a well built set up with excellent skilled operator with lot of experience in this segment/skill,have complete more than 200 similar job, i have gone through your project description, its really a…

$260 AUD in 10 days
(121 Reviews)
6.7
SigmaVisual

Dear Client, I can help in your project. We have already experience of working on similar projects. Please see below to get idea of our experience: Amazon/Ebay Bots: http://sigma-dns.sigmavirtual.com/PDemo1/Am…

$263 AUD in 5 days
(52 Reviews)
6.5
mantislin

Hi sir, I am scraping expert, I have did too many similar projects, please check my feedback then you will know. Can you tell me more details? then I will provide demo data for you. Thanks, Kimi

$555 AUD in 6 days
(104 Reviews)
6.5
pandey2008

Hello sir,I have 8 member team and good experience and we can start the work right now and also all communication and work will be high quality. all work will be on my office without any delay.thanx

$263 AUD in 4 days
(136 Reviews)
6.0
dpune

Hi, I have more than 14 years of exp and I am expert in this kind of work. I have completed more than 200 projects. Please look at the feedback left by my employer to know more about my work. Waiting for your positive…

$700 AUD in 25 days
(91 Reviews)
5.7
shajijohnc

Hi, I have experience in data extraction as you can see in my profile. I have previously done data extraction + import into magento before. Happy to discuss further and I can also get few sample entries before s…

$277 AUD in 10 days
(32 Reviews)
4.9
ganesharena

Dear, I am interested in your project and ready to start working on it. I've done similar project before expect I did not make an upload test nor uploading scraped data. Kindly provide more details. Thank you…

$390 AUD in 10 days
(1 Review)
4.6
SuiGenSolutions

Hello Sir, We've done a number of web scraping projects for our clients. We have scraped many directory websites including yellowpages, yelp and e-commerce websites including amazon, walmart etc and many more. We c…

$250 AUD in 5 days
(2 Reviews)
1.8
dennisochei

Hi! I'm a Duke University graduate with degrees in Computer Science and Neuroscience. PM me so we can talk. Thanks for your consideration!

$250 AUD in 3 days
(0 Reviews)
0.0
hottmex

Appears to be a repost of the same project that you awarded to another bidder. I assume things didnt work out? Let me know, we can wrap this up in 1 day. Regards Brandt

$555 AUD in 10 days
(0 Reviews)
2.9
jadesmail86

Hi there, I am available to start immediately and will always perform each and every job to the highest standard possible.

$555 AUD in 10 days
(0 Reviews)
0.0
Reliancewebsol

A proposal has not yet been provided

$555 AUD in 10 days
(0 Reviews)
0.0
ankityadav712

I am expert in web scraping and collecting data from web, I can scrape any website having JavaScript/AJAX, using proxies, solving CAPTCHA's, and containing millions of records. I can provide scraped data in whatever fo…

$555 AUD in 10 days
(0 Reviews)
0.0