I need a PHP script that calculates the "edit distance" between the contents of two text files, where "edit distance" is defined as the minimal number of word insertions, word deletions, word substitutions and word transpositions (transpositions weighted x0.5) needed to transform the first file into the second.
Examples of program input (text 1, text 2, i.e. the content of the two text files) and output (edit distance):
text 1: "THIS APPLE IS RED"
text 2: "THIS IS RED"
edit distance = 1
(1 word deletion)
text 1: "THIS IS RED"
text 2: "THIS APPLE IS RED"
edit distance = 1
(1 word insertion)
text 1: "THIS APPLE IS RED"
text 2: "THIS CHERRY IS RED"
edit distance = 1
(1 word substitution)
text 1: "THIS APPLE IS RED"
text 2: "THIS RED IS APPLE"
edit distance = 0.5
(1 word transposition)
text 1: "THIS APPLE IS RED"
text 2: "APPLE RED"
edit distance = 2
(2 word deletions)
text 1: "THIS APPLE IS RED"
text 2: "THIS RED APPLE IS GOOD"
edit distance = 2
(1 word insertion + 1 word substitution)
Requirements:
- it must be fast (<1 minute to calculate the edit distance between two completely different 100KB text files)
- it must work with text files of any length (<200KB)
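To put the speed requirement in perspective: a dynamic-programming edit distance (an assumption about the approach; nothing above prescribes one) fills one table cell per pair of word positions, so the work grows with the product of the two word counts. The sketch below is in Python purely for illustration, and `dp_table_size` is a hypothetical helper name.

```python
def dp_table_size(text1: str, text2: str) -> tuple[int, int, int]:
    """Return (words in text1, words in text2, DP cells to fill)."""
    n = len(text1.split())
    m = len(text2.split())
    # A word-level DP table has one extra row and column for the empty prefix.
    return n, m, (n + 1) * (m + 1)


print(dp_table_size("THIS APPLE IS RED", "THIS IS RED"))  # (4, 3, 20)
```

For two files of about 17,000 words each (assuming roughly six bytes per word, spaces included, in a 100KB file) this comes to roughly 290 million cells. Whether that fits in one minute depends on the PHP version and the hardware.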
Escrow offered.
Demo appreciated.
_____
Note: the "edit distance" I want is similar to the Damerau-Levenshtein distance, whose algorithm (in C) is reported here: [login to view URL]. The main difference is that Damerau-Levenshtein counts differing characters, while I want to count differing words.
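As a starting point, here is a minimal sketch of the closest standard technique: the restricted Damerau-Levenshtein distance (also called optimal string alignment) applied to words instead of characters, with a transposition costing 0.5. It is written in Python for illustration only, not as the requested PHP deliverable. The function name `word_edit_distance`, whitespace tokenization and the three rolling rows are assumptions of this sketch. It treats only swaps of adjacent words as transpositions, so it reproduces the examples above except "THIS RED IS APPLE", which swaps two non-adjacent words and scores 2 here (two substitutions) instead of 0.5.

```python
def word_edit_distance(text1: str, text2: str,
                       transposition_cost: float = 0.5) -> float:
    """Word-level optimal string alignment distance (adjacent swaps only)."""
    # Map each distinct word to an integer so comparisons are cheap.
    ids: dict[str, int] = {}
    a = [ids.setdefault(w, len(ids)) for w in text1.split()]
    b = [ids.setdefault(w, len(ids)) for w in text2.split()]
    n, m = len(a), len(b)

    prev2: list[float] = []                   # row i-2 (unused until i == 2)
    prev = [float(j) for j in range(m + 1)]   # row 0: j insertions
    for i in range(1, n + 1):
        cur = [float(i)] + [0.0] * m          # column 0: i deletions
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0
            best = min(prev[j] + 1.0,         # delete a[i-1]
                       cur[j - 1] + 1.0,      # insert b[j-1]
                       prev[j - 1] + sub)     # match or substitute
            # Swap of two adjacent words.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                best = min(best, prev2[j - 2] + transposition_cost)
            cur[j] = best
        prev2, prev = prev, cur               # keep only the last two rows
    return prev[m]


print(word_edit_distance("THIS APPLE IS RED", "THIS IS RED"))             # 1.0
print(word_edit_distance("THIS APPLE IS RED", "THIS CHERRY IS RED"))      # 1.0
print(word_edit_distance("THIS APPLE IS RED", "THIS IS APPLE RED"))       # 0.5
print(word_edit_distance("THIS APPLE IS RED", "THIS RED APPLE IS GOOD"))  # 2.0
```

Keeping only three rows holds memory proportional to the word count of the second text, while the running time stays proportional to the product of the two word counts. Scoring a swap of arbitrary, non-adjacent words as 0.5 would need a different algorithm from this one.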
Hi.
This is a very interesting project for me, and I would be glad to do it.
I have more than 8 years of experience in PHP.
I will provide a clearly written, fast script.
Thank you.