Create a MapReduce program and run it on Amazon AWS Linux nodes

Build your own Hadoop AMI, starting from the Amazon Linux AMI. You must use the latest stable Hadoop release. You are required to store this AMI in S3, and its name must include your last name. This AMI will be tested with the application built for task 2. However, if your AMI doesn’t work, you are allowed to use one of the pre-built Hadoop AMIs for task 2.

Write a Hadoop/YARN MapReduce application that takes as input the 50 Wikipedia web pages dedicated to the US states (we will provide these files for consistency) and:

Computes how many times the words “education”, “politics”, “sports”, and “agriculture” appear in each file. Then, the program outputs the number of states for which each of these words is dominant (i.e., appears more times than the other three words).
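The counting and "dominant word" step can be sketched in plain Java, independent of the Hadoop API. This is only an illustration of the per-file logic a mapper and reducer might share, not the required implementation. The class name `KeywordCounts` and its methods are hypothetical, and two details are assumptions the brief does not settle: matching is case-insensitive on whole words (so "educational" is not counted as "education"), and a state with a tie for first place has no dominant word.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: per-file keyword counting and dominant-word tally.
final class KeywordCounts {
    static final String[] KEYWORDS = {"education", "politics", "sports", "agriculture"};
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Counts whole-word, case-insensitive occurrences of each keyword (assumption).
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String k : KEYWORDS) {
            counts.put(k, 0);
        }
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            // Only tokens that are one of the four keywords are incremented.
            counts.computeIfPresent(m.group(), (k, v) -> v + 1);
        }
        return counts;
    }

    // Returns the keyword that appears strictly more often than the other three,
    // or empty when the highest count is shared (assumption: ties have no winner).
    static Optional<String> dominant(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        boolean tie = false;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            if (c > bestCount) {
                best = e.getKey();
                bestCount = c;
                tie = false;
            } else if (c == bestCount) {
                tie = true;
            }
        }
        return tie ? Optional.empty() : Optional.ofNullable(best);
    }

    // For each keyword, the number of states (pages) in which it is dominant.
    // The input maps a state name to the text of its page.
    static Map<String, Integer> dominantTally(Map<String, String> pages) {
        Map<String, Integer> tally = new LinkedHashMap<>();
        for (String k : KEYWORDS) {
            tally.put(k, 0);
        }
        for (String text : pages.values()) {
            dominant(count(text)).ifPresent(k -> tally.merge(k, 1, Integer::sum));
        }
        return tally;
    }
}
```

In a real job, `count` would typically run in the map phase over each input file, and the tally would be produced by a reducer keyed on the dominant word.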

Identifies all states that have the same ranking of these four words. For example, NY, NJ, and PA may have the ranking 1. Politics; 2. Sports; 3. Agriculture; 4. Education (meaning “politics” appears more times than “sports” in the Wikipedia file of the state, “sports” appears more times than “agriculture”, etc.)
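One way to group states by ranking is to turn each state's four counts into a ranking key and collect the states that share a key. The sketch below is plain Java with no Hadoop wiring. The class name `RankingGroups`, the key format `politics>sports>agriculture>education`, and the alphabetical tie-break for equal counts are all assumptions made for illustration; the brief does not say how ties should be ranked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups states whose keyword rankings are identical.
final class RankingGroups {

    // Builds a key listing the keywords from most to least frequent.
    // Equal counts are ordered alphabetically (assumption, not from the brief).
    static String rankingKey(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        StringBuilder key = new StringBuilder();
        for (Map.Entry<String, Integer> e : entries) {
            if (key.length() > 0) {
                key.append('>');
            }
            key.append(e.getKey());
        }
        return key.toString();
    }

    // Maps each ranking key to the alphabetically sorted list of states that share it.
    // The input maps a state name to its keyword counts.
    static Map<String, List<String>> groupByRanking(Map<String, Map<String, Integer>> countsByState) {
        Map<String, List<String>> groups = new TreeMap<>();
        // Iterating over a TreeMap copy visits states in alphabetical order.
        for (Map.Entry<String, Map<String, Integer>> e : new TreeMap<>(countsByState).entrySet()) {
            groups.computeIfAbsent(rankingKey(e.getValue()), k -> new ArrayList<>())
                  .add(e.getKey());
        }
        return groups;
    }
}
```

In a MapReduce setting, the ranking key would be a natural reducer key: the mapper emits (ranking key, state), and each reducer call receives all states with that ranking.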

The input file is given.

Skills: Amazon Web Services, Big Data Sales, Hadoop, Java, Linux

About the Employer:
( 0 reviews ) United States

Project ID: #16354070

12 freelancers are bidding on average $297 for this job

$882 USD in 3 days
(13 Reviews)

Dear Customer, my name is Yuriy Tumakha. I am interested in your AWS Hadoop project. I am a Senior Scala/Java Developer with 14 years of experience. You can see my code examples on GitHub.

$350 USD in 7 days
(16 Reviews)

Hello there, we are a team of expert Big Data developers with more than 10 years of rich industry experience and have successfully delivered multiple projects in the past, such as a) recipe recommendation b) movie recom…

$277 USD in 3 days
(7 Reviews)

Hello, I have worked extensively on MapReduce programs in Python, Scala, and Java. Can we talk directly in the chat? Thanks!

$200 USD in 3 days
(18 Reviews)

Hi, I am an IT specialist and data scientist and thus have the skills required for this job. I have experience building AWS EC2 machines from AMIs as well as installing and configuring Hadoop. Finally, I have experi…

$222 USD in 5 days
(5 Reviews)

NOTE: Most of the requirements of your project scope are already completed by us, and we have a demo for you as well. We are Amazon MWS / eBay API experts and have completed many projects using their APIs. I have ready to use…

$155 USD in 3 days
(1 Review)

Hi, I have more than 3 years of experience in Hadoop technologies like MapReduce, HDFS, Spark, Sqoop, etc. I can write the MapReduce program according to your requirements and deploy it on Amazon AWS. Contact me fo…

$250 USD in 3 days
(7 Reviews)

Hi, we are a team of Amazon certified Solutions Architects. We have more than 3 years of experience with Amazon AWS and more than 5 years as Linux SysOps. We can help you with this. Please let me know if you need…

$250 USD in 5 days
(1 Review)

Hey, can we start the project ASAP? We will complete it within 3 working days, so kindly suggest a time to connect with you.

$222 USD in 3 days
(0 Reviews)

Languages: Java. Java/J2EE: Core Java, JavaFX, Advanced Java, Servlet, JSP, JSTL, EJB, JDBC, JUnit, Web Services, XML, XSD, JAX-RS, DOM, SAX, Multithreading, JTA, Custom Tags, JPA APIs. Web Technologies: HTML, DHTML…

$255 USD in 7 days
(0 Reviews)

A proposal has not yet been provided

$277 USD in 3 days
(0 Reviews)

I've been working at a big data company for 2 years. I'm very good at Hadoop/Spark. I know how to optimize difficult MapReduce jobs. What's more, I'm good at system-level optimization. I know how to analyze IO/network…

$222 USD in 1 day
(0 Reviews)