We have two datasets of text strings. Entries in one dataset are similar to entries in the other but not exactly the same. A human can tell which pairs are correct matches.
The project is to build a system that can learn from the matches we approve and the ones we decline.
For each string, we can narrow the possible matches down to about 10 products, of which only a few are correct. We need a system that can be taught by example which candidates are correct matches and which are not.
Aveeno Products, Shampoo or Conditioner, Hair Styling or Skincare
1.) Aveeno Facial Care Product, excludes trial sizes and cleansing bars
2.) Aveeno Facial Care Product, excludes trial sizes and cleansing bars
3.) Aveeno Positively Ageless Skin Strengthening Body Lotion or Hand Cream (IE)
4.) Aveeno Active Naturals, sunscreen product ets
5.) Aveeno Sun Care Item, TARGET coupon
6.) Aveeno Hand, Body, or Baby Lotion, excludes trial and 2.5 oz
7.) Aveeno Body Wash or Haircare Prodcut, excludes trial
8.) Aveeno Facial Care or Suncare Product, excludes trial and cleansing bars
9.) Aveeno Hair Care, Videos Available to Watch until 9/1/2012
In this example, 1, 2, 3, 6, 7, and 8 are matches.
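To make "taught through example" concrete, here is a minimal sketch of one way the approve/decline decisions could be turned into training data. It is not a proposed final design. It assumes two simple word-overlap features and a small logistic regression written in plain Python; the names `features`, `train`, `predict`, `QUERY`, `CANDIDATES`, and `APPROVED` are made up for illustration. The strings and labels are copied from the example above.

```python
import math
import re

QUERY = "Aveeno Products, Shampoo or Conditioner, Hair Styling or Skincare"
CANDIDATES = [
    "Aveeno Facial Care Product, excludes trial sizes and cleansing bars",
    "Aveeno Facial Care Product, excludes trial sizes and cleansing bars",
    "Aveeno Positively Ageless Skin Strengthening Body Lotion or Hand Cream (IE)",
    "Aveeno Active Naturals, sunscreen product ets",
    "Aveeno Sun Care Item, TARGET coupon",
    "Aveeno Hand, Body, or Baby Lotion, excludes trial and 2.5 oz",
    "Aveeno Body Wash or Haircare Prodcut, excludes trial",
    "Aveeno Facial Care or Suncare Product, excludes trial and cleansing bars",
    "Aveeno Hair Care, Videos Available to Watch until 9/1/2012",
]
APPROVED = {1, 2, 3, 6, 7, 8}  # candidate numbers a human marked as matches


def tokens(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def features(query, candidate):
    """Assumed feature set: constant bias, token Jaccard, candidate coverage."""
    q, c = tokens(query), tokens(candidate)
    shared = len(q & c)
    jaccard = shared / len(q | c) if (q | c) else 0.0
    coverage = shared / len(c) if c else 0.0  # share of candidate tokens seen in query
    return [1.0, jaccard, coverage]


def predict(weights, x):
    """Logistic model: estimated probability that the pair is a match."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, rows, labels):
    """Mean cross-entropy of the model on the labeled pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, labels):
        p = predict(weights, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(rows)


def train(rows, labels, lr=0.5, epochs=500):
    """Full-batch gradient descent on the logistic loss, starting from zeros."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


rows = [features(QUERY, c) for c in CANDIDATES]
labels = [1 if i in APPROVED else 0 for i in range(1, len(CANDIDATES) + 1)]
weights = train(rows, labels)

for i, (cand, x) in enumerate(zip(CANDIDATES, rows), start=1):
    print(f"{i}: p(match)={predict(weights, x):.2f}  {cand}")
```

Nine labeled pairs from a single string are far too few to learn anything reliable, so the printed probabilities only show the mechanics. A real system would train on every approved and declined pair collected so far and would probably need richer features than word overlap.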
9 freelancers are bidding on average $1300 for this job
I am a newly minted PhD in Industrial Engineering with experience in all areas of Data Mining, Statistics, and Data Analysis. Let's start a dialog about your project...
The main problem is finding the right similarity measures between two text strings. After that, virtually any machine learning algorithm could give good results. I can help you with this task.
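As a rough illustration of what "similarity measures between two text strings" could look like, the sketch below computes three common ones using only the Python standard library: word-level Jaccard overlap, character-trigram Dice overlap, and the `difflib.SequenceMatcher` ratio. The function names and the `normalize` step are assumptions made for this example; the bid does not say which measures would actually be used.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def token_jaccard(a, b):
    """Shared words divided by all distinct words across both strings."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ngrams(text, n=3):
    """Set of character n-grams of the normalized string."""
    s = normalize(text)
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def trigram_dice(a, b):
    """Dice coefficient over character trigrams; tolerant of small typos."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def sequence_ratio(a, b):
    """difflib's ratio: 2 * matched characters / total characters."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    left = "Aveeno Body Wash or Haircare Prodcut, excludes trial"
    right = "Aveeno Products, Shampoo or Conditioner, Hair Styling or Skincare"
    for measure in (token_jaccard, trigram_dice, sequence_ratio):
        print(f"{measure.__name__}: {measure(left, right):.3f}")
```

Character-level measures are worth having next to word-level ones because the data contains misspellings such as "Prodcut": a word comparison treats it as a completely different word, while trigram overlap still gives partial credit. Each score could be fed as one feature into whatever learning algorithm is chosen.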