Develop a scoring algorithm to compare company name and domain.
Factor: company name token commonness or rarity, determined by first ranking company tokens across the population in the input file.
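One possible way to rank token rarity is a smoothed inverse document frequency computed over the company names in the input file. This is a minimal sketch, not a prescribed method: the function names (tokenize, token_rarity), the smoothing formula, and the sample names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(name):
    # Lowercase and split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]

def token_rarity(names):
    # Document frequency: in how many company names each token appears.
    df = Counter()
    for name in names:
        df.update(set(tokenize(name)))
    n = len(names)
    # Smoothed inverse document frequency (an assumed formula):
    # tokens seen in fewer names receive larger weights.
    return {tok: math.log((1 + n) / (1 + count)) + 1.0
            for tok, count in df.items()}

# Made-up sample population standing in for the input file.
names = ["Acme Hospital", "Presbyterian Hospital", "General Hospital", "Acme Tools"]
weights = token_rarity(names)
```

With this sample, "hospital" appears in three names and "presbyterian" in one, so "presbyterian" ends up with the larger weight.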
Acronym test, e.g. Procter & Gamble -> [login to view URL]
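The acronym test could be sketched as follows: build the acronym from the first letter of each word and compare it with the first label of the domain. The helper names are hypothetical, "pg.example" is a made-up placeholder domain rather than the redacted URL, and taking the text before the first dot is a simplification that would fail on a leading "www." label.

```python
import re

def acronym(name):
    # First letter of each alphanumeric word; "&" and punctuation are dropped.
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "".join(w[0] for w in words).lower()

def domain_label(domain):
    # Assumed simplification: take the text before the first dot.
    return domain.lower().split(".")[0]

def acronym_matches(name, domain):
    # Require at least two letters so one-word names do not match trivially.
    acr = acronym(name)
    return len(acr) >= 2 and acr == domain_label(domain)

# Placeholder domains, for illustration only.
hit = acronym_matches("Procter & Gamble", "pg.example")
miss = acronym_matches("Procter & Gamble", "gamble.example")
```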
Test string comparisons with company suffixes removed ("LLC", "Corp", "LTD", etc.).
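A minimal sketch of suffix removal before comparison is shown here. The suffix set is an assumed, incomplete list that would need extending for real input data, and strip_suffixes is a hypothetical helper name.

```python
import re

# Assumed, incomplete suffix list; extend it for the actual input data.
SUFFIXES = {"llc", "corp", "corporation", "ltd", "inc", "co"}

def strip_suffixes(name):
    # Lowercase and split on punctuation and spaces, keeping "&" inside tokens.
    tokens = [t for t in re.split(r"[^a-z0-9&]+", name.lower()) if t]
    # Drop trailing suffix tokens only, always keeping at least one token.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

cleaned = [strip_suffixes(n) for n in ["Acme Tools, LLC", "Acme Corp. Ltd", "AT&T Inc."]]
```

Only trailing tokens are stripped, so a name that begins with or consists solely of a suffix-like word is left intact.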
Factor string length into the equation: AT&T has a low edit score to [login to view URL], but [login to view URL] is not the correct domain. Presbyterian Hospital has a much higher edit score to [login to view URL], but it is correct (in theory). Matching tokens that are longer (and rarer) should carry more weight than matching tokens that are shorter and more common.
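One way to account for string length is to scale the edit distance by the longer string and to weight each matching token by its length times its rarity. The sketch below assumes Levenshtein distance as the edit score and substring containment as the token match. All function names, the weighting formula, and the sample strings are illustrative assumptions, not the redacted domains.

```python
def levenshtein(a, b):
    # Dynamic-programming edit distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def normalized_similarity(a, b):
    # 1.0 means identical; the distance is scaled by the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def weighted_token_score(name_tokens, label, rarity):
    # Weight = token length * rarity (rarity defaults to 1.0 when unknown).
    # Score = weighted share of name tokens found inside the domain label.
    total = matched = 0.0
    for tok in name_tokens:
        w = len(tok) * rarity.get(tok, 1.0)
        total += w
        if tok in label:
            matched += w
    return matched / total if total else 0.0

# Made-up strings: the short pair is only one edit apart but less similar
# once length is taken into account.
short_sim = normalized_similarity("att", "abt")
long_sim = normalized_similarity("presbyterianhospital", "presbyterianhosp")
score = weighted_token_score(["presbyterian", "hospital"], "presbyterianhosp",
                             {"presbyterian": 2.0, "hospital": 1.0})
```

In this example the short pair has a raw distance of 1 and the long pair a raw distance of 4, yet the normalized similarity of the long pair is higher. Substring containment can match very short tokens by accident, so a minimum token length may be worth adding.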