NOTE - This posting is to determine the feasibility of using FreeLancer; the project will not commence for several months.
We are building a system to answer typed-in questions in the health insurance area.
To do that, we need to correct all the pathologies of typed text - simple spelling mistakes, homonyms, malapropisms, spoonerisms.
A database will need to be developed for straight correction of simple mistakes, such as "ameanable", or "terns" for "terms" where "terns" is not in the system's dictionary (about 25,000 words).
Other simple errors also need to be handled: "eroors" for "errors", "thepayment" for "the payment", and "ter4ms" for "terms", taking into account errors caused by striking nearby keys.
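As a rough illustration of the simple, unambiguous cases above, the following sketch generates candidate corrections for a word that is not in the dictionary. It is written in Python for brevity (the project itself calls for Delphi), and the tiny DICTIONARY and the function names are assumptions made for the example, not part of the actual system. It covers single-character slips and run-together words; it does not weight errors by keyboard adjacency.

```python
import string

# Hypothetical miniature dictionary; the real one holds about 25,000 words.
DICTIONARY = {"the", "payment", "terms", "errors", "amenable"}
LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in LETTERS}
    inserts = {l + c + r for l, r in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def split_candidates(word):
    """Two-word readings of a run-together token, e.g. 'thepayment'."""
    return {
        word[:i] + " " + word[i:]
        for i in range(1, len(word))
        if word[:i] in DICTIONARY and word[i:] in DICTIONARY
    }


def candidates(word):
    """Sorted list of dictionary-backed corrections for a typed word."""
    word = word.lower()
    if word in DICTIONARY:
        return [word]
    found = {w for w in edits1(word) if w in DICTIONARY}
    return sorted(found | split_candidates(word))


if __name__ == "__main__":
    for typed in ["eroors", "thepayment", "ter4ms", "ameanable"]:
        print(typed, "->", candidates(typed))
```

A stray digit such as the "4" in "ter4ms" is handled by the deletion case, so no separate rule is needed for it in this sketch.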
Where there is ambiguity, say "adressing", which could be "a dressing" or "addressing", the alternatives need to be reported so that the error can be resolved at the semantic level.
There are also about 10,000 codes in use, such as E8842 - an ICD code. These can have errors similar to those in ordinary words: they can be run together, or have additional or missing characters. A listing of all valid codes would be provided.
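The reporting of alternatives might look something like the sketch below, which separates tokens that can be corrected outright from those that must be passed on for a semantic decision. It is in Python for brevity, and the CANDIDATES table, the Resolution record and the status names are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field

# Hypothetical data: a small dictionary and a hand-built table of candidates.
DICTIONARY = {"a", "dressing", "addressing", "the", "claim", "is"}
CANDIDATES = {
    "adressing": ["a dressing", "addressing"],
    "cliam": ["claim"],
}


@dataclass
class Resolution:
    token: str
    status: str  # "ok", "corrected", "ambiguous" or "unknown"
    alternatives: list = field(default_factory=list)


def resolve(token):
    """Classify one typed token without choosing between alternatives."""
    word = token.lower()
    if word in DICTIONARY:
        return Resolution(token, "ok", [word])
    options = list(CANDIDATES.get(word, []))
    if len(options) == 1:
        return Resolution(token, "corrected", options)
    if len(options) > 1:
        return Resolution(token, "ambiguous", options)
    return Resolution(token, "unknown", [])


def report(text):
    """Tokens that need a decision from a later (semantic) stage."""
    results = [resolve(t) for t in text.split()]
    return [r for r in results if r.status in ("ambiguous", "unknown")]


if __name__ == "__main__":
    for item in report("the cliam is adressing"):
        print(item.token, item.status, item.alternatives)
```

Keeping the ambiguous cases as data, rather than guessing, leaves the choice to whichever stage has access to the meaning of the whole question.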
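Checking a typed code against the list of valid codes could be sketched as follows. This is Python for brevity, the function names are assumptions, and apart from E8842 the entries in VALID_CODES are placeholders that have not been checked against any real ICD listing.

```python
# Placeholder list; the real listing of about 10,000 codes would be supplied.
# Only E8842 comes from the brief; the others are made up for illustration.
VALID_CODES = {"E8842", "E8843", "V5861"}


def within_one_edit(a, b):
    """True if a and b differ by at most one added, missing or wrong character."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one wrong character
    return a[i:] == b[i + 1:]  # one extra character in b


def split_run_together(token):
    """Split a token into consecutive valid codes, or return None."""
    if token == "":
        return []
    for code in sorted(VALID_CODES):
        if token.startswith(code):
            rest = split_run_together(token[len(code):])
            if rest is not None:
                return [code] + rest
    return None


def correct_code(token):
    """Return the possible readings of a typed code (several if ambiguous)."""
    token = token.strip().upper()
    if token in VALID_CODES:
        return [token]
    parts = split_run_together(token)
    if parts:
        return [" ".join(parts)]
    return sorted(c for c in VALID_CODES if within_one_edit(token, c))


if __name__ == "__main__":
    for typed in ["e8842", "E8842V5861", "E88442", "E884"]:
        print(typed, "->", correct_code(typed))
```

Because valid codes often differ by a single character, a truncated entry such as "E884" matches more than one code here and would need to be reported as ambiguous rather than corrected automatically.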
There are three possible levels of involvement:
1. The person has facility with words, is provided with a test bed for checking words and phrases, is given samples of questions with errors, and builds a lookup table used by the test bed for error substitution. They advise on, but do not involve themselves in, error correction where ambiguity exists.
2. As in 1, except also provides code (in Delphi) for correction of simple errors where no ambiguity exists.
3. As in 2, but also involves themselves in methods for cases where ambiguity exists, using patterns or hypothesising to recognise the correct word.
"Facility with words" means that the person should be good at spotting word or phrase errors, and not need to have them pointed out.
We require very high accuracy of output (above 99%), which is above what current spellcheckers provide, without correction by the user and without the introduction of new errors.
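One possible way to check a corrector against this target is sketched below in Python. It assumes accuracy is scored per token against a list of (typed, expected) sample pairs, which the brief does not specify, and the names evaluate and table_corrector are hypothetical.

```python
def evaluate(corrector, samples):
    """Return (accuracy, introduced_errors) over (typed, expected) pairs.

    An introduced error is a token that was typed correctly but was
    changed to something wrong by the corrector.
    """
    correct = 0
    introduced = 0
    for typed, expected in samples:
        output = corrector(typed)
        if output == expected:
            correct += 1
        elif typed == expected:
            introduced += 1
    return correct / len(samples), introduced


def table_corrector(table):
    """Build a corrector from a lookup table; unknown tokens pass through."""
    return lambda token: table.get(token, token)


if __name__ == "__main__":
    samples = [
        ("eroors", "errors"),
        ("terms", "terms"),
        ("ter4ms", "terms"),
        ("claim", "claim"),
    ]
    good = table_corrector({"eroors": "errors", "ter4ms": "terms"})
    accuracy, introduced = evaluate(good, samples)
    print(f"accuracy={accuracy:.2%} introduced={introduced}")
```

Counting introduced errors separately matters because a corrector can reach a high overall score while still damaging text that was typed correctly.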