Make an Elastic MapReduce application to search a large dataset

Closed

Description

We are conducting early-stage research for a business idea relating to WordPress.

We have identified a pre-crawled corpus of web documents here:

[url removed, login to view]

We are seeking someone who can build an Elastic MapReduce application to search the corpus using the techniques explained here: [url removed, login to view]

We want to grep for the URLs of WordPress sites (perhaps by searching for the text "wp-content" in the HTML code, or by some similar technique).

Once we have the full list of WordPress sites, further jobs will be available, but the first step is producing this list.
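As a rough illustration of the filtering step described above, the following Python sketch shows one way the map and reduce logic might look. It assumes each input record is already available as a (url, html) pair; the actual corpus format and the Elastic MapReduce job wiring are not specified here, and the function names are hypothetical. The only marker checked is the "wp-content" text suggested in the description.

```python
from urllib.parse import urlsplit

# Marker suggested in the description; other markers would be assumptions.
WP_MARKER = "wp-content"


def map_record(url, html):
    """Map step: emit (host, 1) when the page HTML contains the marker."""
    if WP_MARKER in html.lower():
        host = urlsplit(url).netloc.lower()
        if host:
            yield host, 1


def reduce_counts(pairs):
    """Reduce step: sum the number of matching pages per host."""
    totals = {}
    for host, count in pairs:
        totals[host] = totals.get(host, 0) + count
    return totals


def find_wordpress_hosts(records):
    """Run the map and reduce steps in-process over (url, html) records."""
    pairs = (kv for url, html in records for kv in map_record(url, html))
    return reduce_counts(pairs)
```

Grouping by host rather than by full URL keeps the output to one entry per site, and the per-host page count gives a crude signal for discarding sites where the marker appears only once by coincidence.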

Skills: Amazon Web Services, Big Data, Data Mining, Software Architecture

Project ID: #4849737

9 freelancers are bidding on average $609 for this job

bradaric

Hi, I've reviewed all the information I could find on the Common Crawl dataset and I can create the script needed to filter out all WordPress sites. The script itself is not a problem. I'm just not sure how long wil...

$700 USD in 2 days
(83 Reviews)
6.7
alphaedge999

Interested in this project.

$1000 USD in 3 days
(1 Review)
4.6
tcly315

I am currently working on some EMR projects. Let me help you :)

$666 USD in 3 days
(1 Review)
4.0
shomratkutub

The genie of Aladdin's magic lamp is here as a developer to handle your project. Check PM.

$250 USD in 3 days
(4 Reviews)
3.6
gauravkumar37

I am a professional Big Data developer/scientist. I have completed many projects on Freelancer. I can complete the job for you easily since I have handled very large scale Big Data projects spanning billions of rows ou...

$765 USD in 5 days
(20 Reviews)
3.5
helmot

Let's start!

$499 USD in 3 days
(11 Reviews)
3.5
topmoose

Please see PM.

$650 USD in 3 days
(0 Reviews)
0.0
JuventusMaximus

Please see private message.

$700 USD in 7 days
(0 Reviews)
0.0
jeremtank

I can do this for you. PM

$250 USD in 5 days
(0 Reviews)
0.0
computerpsycho

Hello Sir/Madam, I read your whole description but didn't understand what exactly you want to extract. If you can explain it to me in PMB, I can do all kinds of web scraping; I have 3+ years of web scraping experience. Thanks and regards

$250 USD in 1 day
(0 Reviews)
0.0