RapidMiner Ninja wanted / Web scraping using RapidMiner

** Your knowledge/skills


- You are an experienced user of RapidMiner 5.2

- You have already completed successful web scraping work with RapidMiner 5.2

** Your work habits


- You respect deadlines and proactively report any hurdles

- You will answer emails within 24 hours

- You will not outsource the job, in whole or in part

** Your personality

- You don’t hesitate to provide input/ideas that could bring added value to the project

- You are interested in a long-term collaboration on further web scraping projects

** Your task will be

Your mission is to create a web scraping process in RapidMiner that takes a set of keywords as input and produces a single Excel spreadsheet (.xls or .xlsx) as output.

- Example set of keywords: US “trade balance” (trade balance is in quotation marks)

- The process will search the following 9 websites for these keywords:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

- For each website, the process will retrieve the 3 most recent articles (default value). This number must be configurable per website, e.g. we may configure 5 articles for the NY Times but only 2 for the WSJ.

- The process will save the content of each article (only the article, not the full webpage) in an Excel spreadsheet whose columns are ordered as follows:

+ Column 1: publishing date of the article

The date format differs from one website to another. For example:

On Reuters : Tue Sep 20, 2011 11:40pm EDT

On Bloomberg : Sep 18, 2011 9:00 PM GMT+0200

On Businessweek : August 04, 2011, 4:45 PM EDT

On WSJ : September 27, 2011, 7:30 PM IST

On FT : September 11, 2011 4:24 pm


+ Column 2: direct link to the article on the website (the source webpage that has been processed)

+ Column 3: title of the article (without html tags)

+ Column 4: content of the article (without html tags)
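To make the expectation for Column 1 concrete, here is a minimal Python sketch of how the five example date strings above could be normalised. It is only an illustration written outside RapidMiner, not the required implementation. The name `parse_article_date`, the list of formats and the choice to return the timezone label as plain text are assumptions made for this example, and English month and weekday names are assumed.

```python
import re
from datetime import datetime

# Candidate layouts for the date/time part once any trailing timezone
# label has been split off. Assumes English month and weekday names.
DATE_FORMATS = (
    "%a %b %d, %Y %I:%M%p",   # Reuters: Tue Sep 20, 2011 11:40pm
    "%b %d, %Y %I:%M %p",     # Bloomberg: Sep 18, 2011 9:00 PM
    "%B %d, %Y, %I:%M %p",    # Businessweek, WSJ: August 04, 2011, 4:45 PM
    "%B %d, %Y %I:%M %p",     # FT: September 11, 2011 4:24 pm
)

# Trailing timezone label such as EDT, IST or GMT+0200 (but not AM/PM).
TZ_SUFFIX = re.compile(r"\s+((?!(?:AM|PM)$)[A-Z]{2,5}(?:[+-]\d{4})?)$")


def parse_article_date(raw):
    """Return (naive datetime, timezone label or None) for one date string.

    Hypothetical helper: the timezone label is kept as text and is not
    converted, because abbreviations like IST or EDT are ambiguous.
    """
    text = raw.strip()
    tz_label = None
    match = TZ_SUFFIX.search(text)
    if match:
        tz_label = match.group(1)
        text = text[:match.start()]
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt), tz_label
        except ValueError:
            continue  # try the next known layout
    raise ValueError("Unrecognised date format: %r" % raw)
```

Storing the parsed value rather than the raw string should allow Column 1 to be sorted chronologically in Excel. Whether the timezone label is also written to the spreadsheet is left open here.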

- The file “[url removed, login to view]” will be saved under c:\rapidminer\
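The requirements above (a default of 3 articles with per-website overrides, tag-free title and content, and a fixed column order) can be sketched in Python as follows. This is an illustration of the expected behaviour only, not a RapidMiner process. The site keys `nytimes` and `wsj`, the dictionary-based article records and all function names are hypothetical and chosen for this example.

```python
from html.parser import HTMLParser

DEFAULT_ARTICLE_COUNT = 3

# Hypothetical per-website overrides; any site not listed uses the default.
ARTICLE_COUNT_OVERRIDES = {"nytimes": 5, "wsj": 2}

# Column order required for the spreadsheet.
COLUMNS = ("published", "url", "title", "content")


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Remove HTML tags and collapse runs of whitespace.

    Note: text inside <script> or <style> elements is not filtered out
    by this simple sketch.
    """
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


def article_count_for(site):
    """Number of articles to keep for a site (default 3)."""
    return ARTICLE_COUNT_OVERRIDES.get(site, DEFAULT_ARTICLE_COUNT)


def most_recent(articles, site):
    """Keep the newest articles for a site, newest first.

    Each article is assumed to be a dict with the keys in COLUMNS,
    where 'published' is a datetime.
    """
    ranked = sorted(articles, key=lambda a: a["published"], reverse=True)
    return ranked[:article_count_for(site)]


def to_row(article):
    """Build one spreadsheet row in the required column order."""
    return [
        article["published"],
        article["url"],
        strip_tags(article["title"]),
        strip_tags(article["content"]),
    ]
```

Writing the rows to an .xls or .xlsx file is deliberately left out of this sketch.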

** You will deliver


- You will test the process before delivery in order to ensure it works as described

- You will provide the .RMP file.

Skills: Web Scraping


About the Employer:
( 0 reviews ) Meilen, Switzerland

Project ID: #1457542

2 freelancers are bidding on average $49 for this job

