Algorithm to match short text strings (equivalent product names from disparate sources)

  • Status Closed
  • Budget N/A
  • Total Bids 23

Project Description

We have an existing MS SQL database which is populated with products retrieved from various sources (portals).

The various portals usually have slightly different names for the products.

For example:

Portal 1:

PortalProduct 1: CocaCola

PortalProduct 2: Manchester United

PortalProduct 3: Maplin Electronics

Portal 2:

PortalProduct 1: Coca-Cola

PortalProduct 2: Man Utd

PortalProduct 3: Maplin Elec Ltd.
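Many of the differences in the examples above are purely cosmetic (case, punctuation, company suffixes), so a normalization pass before any fuzzy comparison may remove a good share of the mismatches. The following is a minimal sketch in Python for brevity (the deliverable itself would be C#). The function name `normalize` and the `COMPANY_SUFFIXES` list are illustrative assumptions, not part of the brief.

```python
import re

# Hypothetical list of legal suffixes to ignore; an assumption, not from the brief.
COMPANY_SUFFIXES = {"ltd", "limited", "plc", "inc"}


def normalize(name: str) -> str:
    """Return a canonical form of a product name for comparison purposes."""
    lowered = name.lower()
    # Remove everything except ASCII letters, digits and whitespace.
    # Note: this also drops accented letters, which may not be desirable.
    cleaned = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Drop company suffixes and collapse runs of whitespace.
    tokens = [t for t in cleaned.split() if t not in COMPANY_SUFFIXES]
    return " ".join(tokens)


if __name__ == "__main__":
    for raw in ["CocaCola", "Coca-Cola", "Maplin Elec Ltd."]:
        print(raw, "->", normalize(raw))
```

With this pass, "CocaCola" and "Coca-Cola" become identical strings, while abbreviations such as "Man Utd" or "Maplin Elec" still need a fuzzy comparison.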

- We need to correlate these PortalProducts to a central list of Products.

- One Product can have one or more PortalProducts

- The database is populated with around 10,000 PortalProducts

- The database is not populated with any Products. Products need to be added for each new PortalProduct that doesn't match an existing Product.
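The requirements above amount to a loop of "find the best existing Product, otherwise create one". The sketch below illustrates that flow in Python for brevity (the deliverable itself would be C#). Several things are assumptions for illustration only: the in-memory dictionary stands in for the MS SQL tables, `difflib.SequenceMatcher` is just a placeholder similarity measure, and the 0.85 threshold is arbitrary and would need tuning against real data.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # assumed cut-off; would need tuning on real data


def score(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1]; swap in the chosen algorithm here."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correlate(portal_product_names, threshold=MATCH_THRESHOLD):
    """Map each Product name to the list of PortalProduct names assigned to it."""
    products = {}  # stands in for the Products table
    for name in portal_product_names:
        best_product, best_score = None, 0.0
        for product in products:
            s = score(name, product)
            if s > best_score:
                best_product, best_score = product, s
        if best_product is not None and best_score >= threshold:
            # Close enough: attach to the existing Product.
            products[best_product].append(name)
        else:
            # No acceptable match: create a new Product from this name.
            products[name] = [name]
    return products


if __name__ == "__main__":
    result = correlate(["CocaCola", "Maplin Electronics", "Coca-Cola"])
    for product, members in result.items():
        print(product, "<-", members)
```

Comparing every PortalProduct against every Product is quadratic in the worst case, which should be acceptable for around 10,000 names given that speed is a secondary concern.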

This must be implemented as a C# console application so that (a) we can see the results when the correlation is run and (b) we can ultimately run it as a scheduled task.

The developer must develop a text matching algorithm to correlate PortalProduct names with Product names (and create a Product where there is no match). The emphasis is on the accuracy of the text matching algorithm. Speed is less important than accuracy. The developer should have a mathematical mind and preferably have experience in developing mathematical text matching algorithms.

Examples of mathematical text matching algorithms that could be used (and potentially refined and combined) to meet our requirements are:

String metric

Locality-sensitive hashing

Needleman–Wunsch algorithm

Smith–Waterman algorithm

Levenshtein distance

Concept Search

Approximate matching with added regular expression support

Regular expressions for non-fuzzy (exact) matching

Metaphone

Soundex

Agrep

Plagiarism detection

Ideally the developer will already understand and have used some of these before and be able to pick the best algorithm(s) for this application.
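As an illustration of one of the listed techniques, here is a minimal sketch of Levenshtein distance with a length-normalized similarity score, written in Python for brevity (the deliverable itself would be C#). The `similarity` helper and its normalization by the longer string are assumptions for illustration, not a prescribed scoring rule.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("cocacola", "coca-cola"))
    print(similarity("man utd", "manchester united"))
```

Edit distance handles small spelling differences such as "cocacola" versus "coca-cola" well (distance 1), but heavy abbreviations such as "man utd" versus "manchester united" score poorly (distance 10), which is one reason a combination of techniques, for example token-level or alignment-based matching, may be needed.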
