In Progress

Text Normalizer for 2 Languages Needed

I would like to order a text normalizer, but one that handles 2 languages at the same time. I attach sample files. Each file stores the very same data in a different language (translations), and the files are aligned by line. The most important thing is that the program normalizes both files at the same time, so that the output is also 2 aligned files. In the output, each line should start with an upper-case letter and end with ".\n".
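A minimal sketch of the alignment requirement, assuming Python and two hypothetical per-language callables (`normalize_a`, `normalize_b`) that stand in for the real normalizers. The names `finalize_line` and `normalize_pair` are illustrative only. Both inputs are read in lockstep, so line N of one output always corresponds to line N of the other, and a length mismatch is reported instead of being silently ignored.

```python
from itertools import zip_longest
from typing import Callable, Iterable, Iterator, Tuple

_MISSING = object()  # sentinel used to detect files of different length


def finalize_line(text: str) -> str:
    """Upper-case the first character and end the line with '.\\n'."""
    text = text.strip()
    if not text:
        # Assumption: an empty line stays empty so alignment is preserved.
        return "\n"
    text = text[0].upper() + text[1:]
    if not text.endswith("."):
        text += "."
    return text + "\n"


def normalize_pair(
    lines_a: Iterable[str],
    lines_b: Iterable[str],
    normalize_a: Callable[[str], str],
    normalize_b: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield one finished (line_a, line_b) pair per aligned input pair."""
    pairs = zip_longest(lines_a, lines_b, fillvalue=_MISSING)
    for line_no, (a, b) in enumerate(pairs, start=1):
        if a is _MISSING or b is _MISSING:
            raise ValueError(f"inputs are not aligned at line {line_no}")
        yield finalize_line(normalize_a(a)), finalize_line(normalize_b(b))
```

Because the function is a generator, it can be fed open file objects directly and never holds more than one pair of lines in memory.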

Before normalization, the text should be cleaned: remove characters such as .\/\'/.\./\[]]--- [url removed, login to view] and change, for example, ? into -. I will provide those rules.
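The cleaning step could be driven by a small rule table, sketched here in Python. The concrete rules below (`REMOVE_CHARS`, `REPLACEMENTS`) are placeholders made up for illustration, since the real rules are to be supplied later. Collapsing repeated whitespace is an extra assumption, not something the description asks for.

```python
import re

# Hypothetical rules: the real lists are to be provided by the employer.
REMOVE_CHARS = "\\/'[]"        # characters deleted outright
REPLACEMENTS = {"?": "-"}      # characters swapped for another string

# One translation table handles both deletions (None) and replacements.
_TABLE = str.maketrans({**{c: None for c in REMOVE_CHARS}, **REPLACEMENTS})


def clean_line(text: str) -> str:
    """Apply the character rules, then collapse runs of whitespace."""
    text = text.translate(_TABLE)
    return re.sub(r"\s+", " ", text).strip()
```

The same `clean_line` would be applied to the corresponding line of each language file, so cleaning never changes the number of lines.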

For normalization I will provide a dictionary for Polish etc.; for English, however, you should build it on your own. You can use and modify open-source solutions like [url removed, login to view] (this one needs a better dictionary), but remember that it should cope with both files at the same time so that both outputs stay aligned.

If the program cannot normalize something because it is unknown, it should leave it untouched and generate a log file for an external manual normalizer. The external normalizer should be able to read the log file and let the user manually correct each problem; after the user corrects it, the program should apply the correction at the correct place in the output file.
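One possible shape for the log and the correction step, sketched in Python. Everything here is an assumption for illustration: the log format (line number, word position, token), the rule that a token is "unknown" when it is not purely alphabetic and not in the dictionary, and the function names. The sketch works on the normalized text of one line before the final capital letter and full stop are added.

```python
import json
from typing import Dict, List


def normalize_tokens(line_no: int, text: str,
                     dictionary: Dict[str, str], log: List[dict]) -> str:
    """Replace known tokens; leave unknown ones untouched and log them."""
    out: List[str] = []
    for token in text.split():
        key = token.lower()
        if key in dictionary:
            out.extend(dictionary[key].split())
        else:
            if not token.isalpha():
                # Assumed rule: non-alphabetic tokens need normalization.
                log.append({"line": line_no, "pos": len(out), "token": token})
            out.append(token)
    return " ".join(out)


def dump_log(log: List[dict]) -> str:
    """Serialize the log as one JSON object per line (UTF-8 friendly)."""
    return "".join(json.dumps(e, ensure_ascii=False) + "\n" for e in log)


def apply_corrections(text: str, corrections: List[dict]) -> str:
    """Apply manual fixes for one line; each entry carries a 'fix' string."""
    words = text.split()
    # Work from right to left so multi-word fixes do not shift positions.
    for c in sorted(corrections, key=lambda c: c["pos"], reverse=True):
        pos = c["pos"]
        if pos < len(words) and words[pos] == c["token"]:
            words[pos:pos + 1] = c["fix"].split()
    return " ".join(words)
```

Storing the line number and word position lets the corrections be applied later in a single streaming pass over the output file, without searching for the token.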

It should be UTF-8 compatible and able to work with big files, even 100 MB. The program should also be able to run in a mode that normalizes only one language.
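A rough Python sketch of how large UTF-8 files and the single-language mode might be handled together. The function name `stream_normalize` and the `jobs` structure are hypothetical. Files are read line by line, so memory use does not grow with file size, and passing one job instead of two gives the one-language mode.

```python
import contextlib
from itertools import zip_longest
from typing import Callable, List, Tuple

Job = Tuple[str, str, Callable[[str], str]]  # (source path, output path, normalizer)


def stream_normalize(jobs: List[Job]) -> int:
    """Normalize one or more aligned files in lockstep; return the line count."""
    with contextlib.ExitStack() as stack:
        readers = [stack.enter_context(open(src, encoding="utf-8"))
                   for src, _, _ in jobs]
        writers = [stack.enter_context(open(dst, "w", encoding="utf-8", newline="\n"))
                   for _, dst, _ in jobs]
        count = 0
        # zip_longest pads shorter files with None, which exposes misalignment.
        for rows in zip_longest(*readers):
            if any(row is None for row in rows):
                raise ValueError(f"inputs are not aligned at line {count + 1}")
            for row, writer, (_, _, normalize) in zip(rows, writers, jobs):
                writer.write(normalize(row.rstrip("\n")) + "\n")
            count += 1
        return count
```

For example, `stream_normalize([("pl.txt", "pl.out", norm_pl), ("en.txt", "en.out", norm_en)])` would process both languages together, while a list with a single tuple would process one file alone; the file names and `norm_pl`/`norm_en` are placeholders.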

Programming language or platform does not matter.

Sample files can be found here: [url removed, login to view]. In the table, click the link marked PL.

Skills: C Programming, C++ Programming, Perl, Python, Software Architecture

About the Employer:
( 138 reviews ) Czestochowa, Poland

Project ID: #4309424

Awarded to:


Hi, I might do this with C++ and MySQL installed on your computer, since the dictionary database is too big for anything less powerful. I will instruct it how

$400 USD in 28 days
(2 Reviews)

4 freelancers are bidding on average $356 for this job


I am very proficient in C and C++. I have 15 years of C++ development experience now, and I have worked for 5 years; please let an expert help you.

$325 USD in 7 days
(19 Reviews)

I have done a lot of projects like this; check PM.

$450 USD in 6 days
(9 Reviews)

Hello, I study NLP, so I think I can help you.

$250 USD in 2 days
(0 Reviews)