I would like to order a text normalizer that works on 2 languages at the same time. I attach sample files. Each file stores the very same data in a different language (translations), and the files are aligned by line. The most important thing is that the program normalizes both files at the same time, so that the output is also 2 line-aligned files. In the output each line should start with an uppercase letter and end with ".\n".
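As a rough illustration of the alignment requirement, here is a minimal Python sketch (Python is only an example, any language is fine). The names `finalize_line` and `normalize_aligned` are hypothetical, and the per-language normalizers are placeholders. It is also an assumption that an empty input line should still produce an output line, so that line numbers stay in sync.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel used to detect files of different length


def finalize_line(text):
    """Return text starting with an uppercase letter and ending with a period and newline."""
    core = text.strip()
    if core:
        core = core[0].upper() + core[1:]
    if not core.endswith("."):
        core += "."
    return core + "\n"


def normalize_aligned(lines_a, lines_b, normalize_a=str, normalize_b=str):
    """Yield one output pair per input pair so both outputs stay aligned by line.

    normalize_a / normalize_b are placeholders for the real per-language
    normalizers; by default they leave the text unchanged.
    """
    pairs = zip_longest(lines_a, lines_b, fillvalue=_MISSING)
    for number, (a, b) in enumerate(pairs, start=1):
        if a is _MISSING or b is _MISSING:
            raise ValueError(f"inputs are not aligned: one file ends before line {number}")
        yield finalize_line(normalize_a(a)), finalize_line(normalize_b(b))
```

Because exactly one output line is written for every input line in both files, the two outputs cannot drift apart, and a length mismatch in the inputs is reported instead of being silently ignored.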
Before normalization, the text should be cleaned: remove sequences such as .\/'/.\./\]--- etc., and change, for example, ? into a. I will provide those rules.
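The cleaning step could be driven by a simple rule table, sketched below in Python. The patterns shown are placeholders invented for illustration only; the real rules are the ones that will be provided. `CLEANING_RULES` and `clean` are hypothetical names.

```python
import re

# Placeholder rules (assumptions, NOT the real rule set): each entry is
# (compiled pattern, replacement). Rules are applied in order.
CLEANING_RULES = [
    (re.compile(r"[\\/\[\]]+"), " "),  # slashes, backslashes, square brackets
    (re.compile(r"-{2,}"), " "),       # runs of two or more dashes
    (re.compile(r"\s+"), " "),         # collapse whitespace left behind
]


def clean(text, rules=CLEANING_RULES):
    """Apply every cleaning rule in order and trim the result."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()
```

Keeping the rules in a table rather than in code means a new rule set can be loaded per language without changing the program.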
For normalization I will provide a dictionary for Polish etc.; for English, however, you should handle it on your own. You can use and modify open-source solutions like https://github.com/soshial/text-normalization/blob/master/README (this one needs a better dictionary), but remember that it has to cope with both files at the same time so that the output files stay aligned.
If the program cannot normalize something because it is unknown, it should leave it untouched and generate a log file for an external manual normalizer. The external normalizer should be able to read the log file and let the user manually correct each problem; after the user corrects it, the program should apply the correction at the correct place in the output file.
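One possible shape for the log and the correction step is sketched below in Python. Everything here is an assumption made for illustration: the log format (tab-separated: line number, token index in the output line, original token, empty column for the manual correction), the rule that a token containing a digit "needs normalizing", and the names `normalize_lines` and `apply_corrections`.

```python
import csv
import io
import re

# Assumption: a token that still contains a digit needs normalizing.
NEEDS_REVIEW = re.compile(r"\d")


def normalize_lines(lines, dictionary, log_writer):
    """Replace known tokens; leave unknown ones untouched and log them."""
    for line_no, line in enumerate(lines, start=1):
        out = []
        for token in line.split():
            if token in dictionary:
                out.extend(dictionary[token].split())
            else:
                if NEEDS_REVIEW.search(token):
                    # len(out) is the token's index in the OUTPUT line.
                    log_writer.writerow([line_no, len(out), token, ""])
                out.append(token)
        yield " ".join(out)


def apply_corrections(lines, log_rows):
    """Apply manually filled-in log rows to the already normalized output."""
    fixes = {}
    for line_no, index, original, corrected in log_rows:
        if corrected:  # rows the user left empty are skipped
            fixes[(int(line_no), int(index))] = (original, corrected)
    for line_no, line in enumerate(lines, start=1):
        tokens = line.split()
        for index, token in enumerate(tokens):
            fix = fixes.get((line_no, index))
            if fix and fix[0] == token:  # only replace if the token still matches
                tokens[index] = fix[1]
        yield " ".join(tokens)
```

A short usage example: the dictionary knows "2" but not "37", so "37" is logged, corrected by hand, and then written back at the logged position.

```python
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
output = list(normalize_lines(["I have 2 cats and 37 dogs"], {"2": "two"}, writer))

rows = list(csv.reader(io.StringIO(buffer.getvalue()), delimiter="\t"))
rows[0][3] = "thirty seven"  # the manual correction
fixed = list(apply_corrections(output, rows))
```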
It should be UTF-8 compatible and able to work with big files, even 100 MB. The program should also be able to run in a mode that normalizes only one language.
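For big files, reading line by line instead of loading everything into memory is probably enough. The Python sketch below shows one way to do that while supporting both the two-language and the single-language mode. `process_files` is a hypothetical name and `normalize` is a placeholder for the real normalizer.

```python
from contextlib import ExitStack


def process_files(pairs, normalize=lambda text: text):
    """Stream UTF-8 files line by line and write one output line per input line.

    pairs is a list of (input_path, output_path); passing a single pair
    gives the single-language mode. Returns the number of lines written
    per file. Note: zip stops at the shortest input, so inputs of unequal
    length are truncated to the common length in this sketch.
    """
    with ExitStack() as stack:
        readers = [stack.enter_context(open(src, encoding="utf-8"))
                   for src, _ in pairs]
        writers = [stack.enter_context(open(dst, "w", encoding="utf-8", newline="\n"))
                   for _, dst in pairs]
        count = 0
        for rows in zip(*readers):  # one line from every input at a time
            for writer, row in zip(writers, rows):
                writer.write(normalize(row.rstrip("\n")) + "\n")
            count += 1
    return count
```

Only one line per file is held in memory at any time, so a 100 MB input should not be a problem.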
Programming language or platform does not matter.
Sample files can be found here: https://wit3.fbk.eu/mt.php?release=2012-03-test. You have to click the link with PL in the table.