Building a Prototype Environment for Talend and HDP + Crawler

Status: In progress
Bids: 4
Average bid: $727 (USD)
Project budget: $300 - $800 (USD)

Project Description:
We are looking for a company, group, or individual with experience in Talend and HDP (Hortonworks Data Platform).
This is a prototype project to verify technical issues before the real project begins.
If the results of this project are good, we will also hire you for the real project.
If you have experience with Talend, HDP, and Java, this project is NOT hard to carry out.
Only experienced candidates are welcome.


1. Using user-defined keywords, extract specific data (mainly, area) from a specified website and store it on my server as a text file or in a database.
2. Using a scheduled job, load the extracted data into HDP. The job must be designed in Talend Open Studio.
- The processing conditions for the job will be defined through discussion as you design it.
- Create 3-5 tables in a MySQL database for the results.
- The transfer from HDP to the MySQL database is performed by Talend.
- The table design will be discussed when you reach that step.
** Reference: The configuration of the prototype system is attached.


1. First, install all the components on my servers remotely (TeamViewer will be used).
2. Work on my servers from the beginning.
3. Every installation step must be handed over to my staff, following your procedure (Skype will be used for this communication).
: This is the key condition for payment.


1. Linux Web Crawler: prepare a suitable crawler that meets the needs below.
: This crawler must be delivered to us, and we may request additional customization for the project's purpose (discussion needed).
- Search the specified website by keywords (including its subdirectories)
- Extraction should repeat on a configurable schedule.
- Items to extract:
a. URL
b. Meta tags (title, description, keywords)
c. Plain text between tags
d. Page size
e. Last-modified date
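To make the extraction items above concrete, here is a minimal Java sketch of pulling them out of a single page that has already been downloaded. It assumes the fetch step supplies the raw HTML and the value of the HTTP `Last-Modified` header; the names `PageExtractor`, `PageRecord`, `extract`, and `matchesKeywords` are hypothetical, regular expressions stand in for a proper HTML parser, and "page size" is taken here as the UTF-8 byte length of the HTML body.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: extracts the crawler items (URL, meta tags, plain text,
 * page size, last-modified date) from one already-downloaded page.
 * Regex parsing is a simplification; a real crawler should use an HTML parser.
 */
public class PageExtractor {

    /** One crawled page; field names are illustrative, not an agreed table schema. */
    public record PageRecord(String url, String title, String description,
                             String keywords, String plainText,
                             int pageSizeBytes, String lastModified) {}

    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Reads <meta name="X" content="Y">; assumes name appears before content.
    private static String meta(String html, String name) {
        Pattern p = Pattern.compile(
                "<meta\\s+[^>]*name\\s*=\\s*[\"']" + Pattern.quote(name)
                        + "[\"'][^>]*content\\s*=\\s*[\"']([^\"']*)[\"']",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Drops script/style blocks, replaces remaining tags with spaces, collapses whitespace.
    static String plainText(String html) {
        String noScripts = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ");
        return noScripts.replaceAll("(?s)<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Builds a record from the URL, raw HTML, and Last-Modified header (may be null). */
    public static PageRecord extract(String url, String html, String lastModifiedHeader) {
        Matcher t = TITLE.matcher(html);
        String title = t.find() ? t.group(1).trim() : "";
        return new PageRecord(url, title,
                meta(html, "description"), meta(html, "keywords"),
                plainText(html),
                html.getBytes(StandardCharsets.UTF_8).length,
                lastModifiedHeader == null ? "" : lastModifiedHeader);
    }

    /** True if any user-defined keyword occurs in the page text (case-insensitive). */
    public static boolean matchesKeywords(PageRecord page, List<String> keywords) {
        String text = page.plainText().toLowerCase(Locale.ROOT);
        return keywords.stream().anyMatch(k -> text.contains(k.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title>"
                + "<meta name=\"description\" content=\"A test page\">"
                + "<meta name=\"keywords\" content=\"talend,hdp\"></head>"
                + "<body><p>Hello <b>Hadoop</b> world</p></body></html>";
        PageRecord page = extract("http://example.com/", html, "Tue, 21 Jun 2016 14:00:00 GMT");
        if (matchesKeywords(page, List.of("hadoop"))) {
            System.out.println(page);
        }
    }
}
```

Each `PageRecord` could then be written out as one line of a text file, or one row in the database, for the scheduled Talend job to pick up. The "repeat by time setting" requirement could be met by wrapping the fetch and `extract` calls in a `ScheduledExecutorService` or a cron entry; which one fits is left open here.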

2. Talend Open Studio for Big Data (free version)
- Already installed on my machine, but you can use your own if you need to.
- However, the resulting project files (Talend project files) must be delivered to us.

3. HDP (free version)
- Already installed on my machine, and you should work on it during this project.
- You can reinstall it or change its configuration if needed.
- All changes and modifications must be documented and handed over to us.


** This project will start this Wednesday (22 June) at 9:30 am (GMT+9).
** Bidding closes this Tuesday (21 June) at 11 pm (GMT+9).
** The desired completion date is this Sunday at 2 pm (GMT+9).

Skills required:
Engineering, Hadoop, Java, MySQL
Additional Files: Talend+Project.pdf
About the employer:
Verified


Bids:
$800 in 5 days (reachusSP)
$777 in 3 days
$666 in 20 days
$666 in 11 days