Closed

Big data entity resolution in NoSQL database

We have several collections of documents containing information about company entities. Each collection records a different aspect of a company, such as its executives, accountants, products, or investments. Depending on the kind of information stored, the collections differ in field names and structure, but they also overlap in certain fields, such as company name, geographic location, contact information, industry keywords, and official website. We now need to link documents that describe the same entity across all of these collections. The obstacles we have encountered are:

1. Because the data come from different sources, the same company's name appears differently across collections. Some records use the full name, some an abbreviation, some Pinyin, and some only the initials of the English name, so it is hard to match all documents about the same entity.

2. Different collections contain different fields, and not every collection has contact information or a website. Company name may be the only field that all collections share, which makes it hard to establish a unified matching rule.
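Both obstacles can be made concrete as a pairwise scoring function. The sketch below is plain Python with no Spark dependency so the logic is easy to test; the assumption is that it would be wrapped in a Spark UDF. It also assumes Chinese names have already been romanized to Pinyin upstream (for example with a transliteration library such as pypinyin). The suffix list, the field names `website` and `phone`, and the weights are hypothetical placeholders, not a recommended configuration.

```python
import re
from typing import Optional

# Hypothetical, deliberately short list of legal-form words to ignore.
LEGAL_SUFFIXES = {"co", "company", "corp", "corporation", "inc", "ltd",
                  "limited", "group", "holdings", "llc", "plc"}


def name_tokens(name: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop legal-form words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [t for t in tokens if t not in LEGAL_SUFFIXES]


def is_acronym(short: list[str], full: list[str]) -> bool:
    """True if `short` spells the initials of a multi-word `full` name."""
    return len(full) >= 2 and "".join(short) == "".join(t[0] for t in full)


def name_similarity(a: str, b: str) -> float:
    """Score in [0, 1] for two company names that are already romanized."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    if ta == tb:
        return 1.0
    if is_acronym(ta, tb) or is_acronym(tb, ta):
        return 0.9  # e.g. "IBM" vs "International Business Machines"
    sa, sb = set(ta), set(tb)
    return len(sa & sb) / len(sa | sb)  # token-set Jaccard similarity


def normalize_value(value: str) -> str:
    """Normalize a website or phone number for exact comparison."""
    v = re.sub(r"^(https?://)?(www\.)?", "", value.strip().lower())
    return re.sub(r"[^a-z0-9]", "", v)


def record_score(r1: dict, r2: dict,
                 weights: Optional[dict] = None) -> float:
    """Weighted score over the fields that BOTH documents actually contain."""
    # Hypothetical field names and weights; only "name" is assumed shared.
    weights = weights or {"name": 0.5, "website": 0.3, "phone": 0.2}
    total = weight_sum = 0.0
    for field, w in weights.items():
        v1, v2 = r1.get(field), r2.get(field)
        if not v1 or not v2:
            continue  # missing on one side: skip instead of penalizing
        if field == "name":
            s = name_similarity(v1, v2)
        else:
            s = 1.0 if normalize_value(v1) == normalize_value(v2) else 0.0
        total += w * s
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

Ignoring a field that is missing on either side, instead of counting it as a mismatch, is one way to handle collections whose schemas overlap only on the company name. The 0.9 acronym score and any match threshold would need tuning against labeled pairs.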

If we use Apache Spark to solve this entity resolution problem, which algorithms offer the best balance of precision and feasibility? The largest collection holds around 20,000,000 documents.
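At 20,000,000 documents, comparing every pair is infeasible (roughly 2 × 10^14 comparisons for the largest collection alone), so entity resolution pipelines commonly use blocking: only records that share a cheap key are compared, and accepted matches are then merged into clusters with a connected-components step. The sketch below shows that pipeline in plain Python under stated assumptions: the blocking keys (first name token, plus initials) and the `max_block_size` cutoff are hypothetical choices, and `similar` is any pairwise scorer supplied by the caller. In Spark the same shape would map onto exploding the keys, a self-join or `groupBy` on the key, a filter on the score, and a connected-components computation such as the one offered by GraphFrames.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Callable, Iterable


def blocking_keys(name: str) -> set[str]:
    """Hypothetical blocking keys: the first name token, plus the initials
    when the name has at least two tokens (so "IBM" and
    "International Business Machines" share the key "ibm")."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    if not tokens:
        return set()
    keys = {tokens[0]}
    if len(tokens) >= 2:
        keys.add("".join(t[0] for t in tokens))
    return keys


def candidate_pairs(records: dict[str, str],
                    max_block_size: int = 1000) -> set[tuple[str, str]]:
    """Group record IDs by blocking key and pair up IDs within each block.

    Blocks larger than max_block_size are skipped (losing some recall),
    because a very common key would otherwise reintroduce quadratic work.
    """
    blocks: dict[str, list[str]] = defaultdict(list)
    for rid, name in records.items():
        for key in blocking_keys(name):
            blocks[key].append(rid)
    pairs: set[tuple[str, str]] = set()
    for ids in blocks.values():
        if len(ids) > max_block_size:
            continue
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def connected_components(ids: Iterable[str],
                         edges: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Union-find: map every ID to the smallest ID in its component."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return {i: find(i) for i in parent}


def resolve(records: dict[str, str],
            similar: Callable[[str, str], float],
            threshold: float = 0.8,
            max_block_size: int = 1000) -> dict[str, str]:
    """Block, score candidate pairs, and cluster matches into entities."""
    matched = [(a, b) for a, b in candidate_pairs(records, max_block_size)
               if similar(records[a], records[b]) >= threshold]
    return connected_components(records, matched)
```

For example, with a scorer that accepts every candidate pair, "IBM" and "International Business Machines" end up in one cluster because both produce the key `ibm`, while a name with no shared key stays in its own cluster. The block-size cutoff is the main lever for trading recall against cost.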

We are looking for an outsourced specialist with project experience in:

1. Big data entity resolution in NoSQL databases

2. Apache Spark and MongoDB (over two years of experience)

Skills: Big Data Sales, NoSQL Couch & Mongo, Spark

See more: database access ready data, extract images stored ms access database long binary data format, extract data big files, database testing conversion data, database rss feed data, database building freelance data entry, database function extract data xml file, software write mq4, software write chip epson, useful software write book, php excel file load database update insert data, mdb database password extract data, software write web specs, database twitter follower data, joomla database forms collect data, database companies looking data entry workers, free software write user guide, software write edid, free software write company profile, software write websites idea, software write book images, software write books, software write protection, online database free fill data, free software write book

About the Employer:
(0 reviews) China

Project ID: #14290277

5 freelancers are bidding on average ¥1195 for this job

langlangFan

Hello. Good to see another serious posting. I don't usually look for new clients, but I happened to see your job post and wanted to contact you. I’ve read your brief and I can absolutely help you with your goal.

¥1244 CNY in 3 days (1 review, rating 3.2)

¥240 CNY in 3 days (0 reviews, rating 0.0)

¥1888 CNY in 3 days (0 reviews, rating 0.0)
MetaoriginLab

Hey, we are a team of technical developers with expertise in this kind of work. Ping me if you are looking for a quick resolution.

¥1248 CNY in 7 days (0 reviews, rating 0.0)

¥1244 CNY in 3 days (0 reviews, rating 0.0)
fatimayes

Hi, this is Fatima. I have been researching and have found two native Spark solutions for your problem, plus Duke. It will work. Best regards.

Relevant Skills and Experience: I have been working with Spark and Scala fo…

¥2000 CNY in 14 days (0 reviews, rating 0.0)