TMX Cleaner


The application, which will run on Ubuntu Linux, should apply a sequence of regular expressions designed to "clean up" our TMX files stored in a specific path. It must be possible to add, change, and reorder these regexes by editing the sequence.
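To illustrate the "editable sequence" requirement, here is a minimal sketch in Python. The rule format (one rule per line, pattern and replacement separated by a tab) and the names `RULES_TEXT`, `load_rules`, and `apply_rules` are assumptions made for this example, not part of the brief. Rules run in the order they are listed, so reordering the lines reorders the processing.

```python
import re

# Hypothetical rule list: "pattern<TAB>replacement", one rule per line.
# In practice this text would live in an editable file next to the program.
RULES_TEXT = "\n".join([
    "# pattern<TAB>replacement",
    r"\{[^{}]*\}" + "\t",        # braces and their content -> removed
    r"https?://\S+" + "\t",      # URLs -> removed
    r" {2,}" + "\t" + " ",       # runs of spaces -> single space
])

def load_rules(text):
    """Parse rule lines into (compiled_pattern, replacement) pairs, keeping order."""
    rules = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, _, replacement = line.partition("\t")
        rules.append((re.compile(pattern), replacement))
    return rules

def apply_rules(segment, rules):
    """Apply every rule to the segment, in list order."""
    for pattern, replacement in rules:
        segment = pattern.sub(replacement, segment)
    return segment.strip()
```

With this layout, adding a rule means adding a line, and changing the processing order means moving a line.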

I would like the program to be delivered with a first set of regexes that performs these tasks:


· Getting started:

Check the archive folder for any files whose name contains _clean

If there is some file clean
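The check for _clean files could look roughly like the sketch below. It only detects and separates the files; what should happen to files that are already cleaned is not stated here, so the sketch does not decide that. The function name `split_by_clean_marker` and the assumption that the files use the `.tmx` extension are mine.

```python
from pathlib import Path

def split_by_clean_marker(folder):
    """Return (already_cleaned, pending) lists of .tmx paths found in `folder`."""
    cleaned, pending = [], []
    for path in sorted(Path(folder).glob("*.tmx")):
        # Assumption: a cleaned file is one whose name (without extension) ends in "_clean".
        if path.stem.endswith("_clean"):
            cleaned.append(path)
        else:
            pending.append(path)
    return cleaned, pending
```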

· Segment

Remove the braces and any content within them

Remove symbols and any content within them

Remove numbers and periods and commas within each area

Remove symbols: % & $ £ " ^ ° #

Remove any URL

Remove abbreviations ("sigle"; the list must be updatable):

Srl, spa, sas, spa, Ltd., sas, ltd., (I), (ii), (iii), (a), (b), (c)

Replace "(" with ","

Replace ")" with ","

Replace "(" with ","

Replace "-" with ","
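The segment rules above might be expressed as an ordered list of regexes, as in this sketch. Several points are assumptions: the order (URLs and abbreviations are handled before symbols and parentheses, so that "(a)" and URL characters are still intact when their rule runs), the reading of the numbers rule as "digits together with the periods and commas inside them", and case-insensitive whole-word matching for the abbreviations. The rule about "symbols and any content within them" is left out because the symbols are not named.

```python
import re

# Editable abbreviation list, taken from the brief (duplicates dropped).
ABBREVIATIONS = ["Srl", "spa", "sas", "Ltd.", "(i)", "(ii)", "(iii)", "(a)", "(b)", "(c)"]

def abbreviation_rule(items):
    """Build one regex that removes any listed abbreviation as a whole token."""
    alternatives = "|".join(
        re.escape(item) for item in sorted(items, key=len, reverse=True)
    )
    # The lookarounds prevent matches inside longer words (e.g. "spa" in "spazio").
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    return pattern, ""

SEGMENT_RULES = [
    (re.compile(r"\{[^{}]*\}"), ""),              # braces and their content
    (re.compile(r"(?:https?://|www\.)\S+"), ""),  # URLs
    abbreviation_rule(ABBREVIATIONS),             # Srl, spa, (a), ...
    (re.compile(r'[%&$£"^°#]'), ""),              # listed symbols
    (re.compile(r"\d+(?:[.,]\d+)*"), ""),         # numbers with inner . and ,
    (re.compile(r"[()\-]"), ","),                 # ( ) - become commas
]

def clean_segment(text):
    """Run all segment rules in order; spacing is left for the final step."""
    for pattern, replacement in SEGMENT_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Note that removing tokens leaves double spaces behind, which is why a space check is still needed at the end.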

· Translation Unit

Delete the entire unit if its two segments differ in word count by more than 50% (this check should only be applied to segments longer than 3 words)

Eliminate the TUs that have one of the two segments empty
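The two translation-unit checks can be sketched as one predicate. The exact formula is an assumption: here the difference is measured relative to the longer segment, and the length check is skipped when the longer segment has 3 words or fewer. The name `keep_unit` and its parameters are hypothetical.

```python
def keep_unit(source, target, max_diff=0.5, min_words=3):
    """Return True if the translation unit should be kept."""
    src_words = len(source.split())
    tgt_words = len(target.split())
    # Drop units where either segment is empty (or whitespace only).
    if src_words == 0 or tgt_words == 0:
        return False
    longer = max(src_words, tgt_words)
    shorter = min(src_words, tgt_words)
    # Assumption: short units are exempt from the length comparison.
    if longer <= min_words:
        return True
    # Relative difference in word count, measured against the longer segment.
    return (longer - shorter) / longer <= max_diff
```

For example, an 8-word segment paired with a 3-word segment differs by 5/8 = 62.5% and would be dropped under this reading.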

· Final Steps:

Check for double spaces

Rename the TMX file by appending _clean to the file name
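A small sketch of the final steps follows. It assumes that the space check means collapsing runs of spaces into one, and that _clean goes before the extension (for example `memory.tmx` becomes `memory_clean.tmx`). The helper names are hypothetical.

```python
import re
from pathlib import Path

def collapse_spaces(text):
    """Replace runs of two or more spaces with one and trim the ends."""
    return re.sub(r" {2,}", " ", text).strip()

def clean_name(path):
    """Return the output path: same folder, '_clean' inserted before the extension."""
    path = Path(path)
    return path.with_name(path.stem + "_clean" + path.suffix)
```

Writing the result to the path from `clean_name` rather than renaming in place would keep the original file available, though the brief does not say which is preferred.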

//////////// TMX file example (real files normally contain a very large number of TUs) ///////////





Xxxxxxx Reports Third Quarter 2012 Financial Results

Xxxxxxx pubblica i risultati finanziari del terzo trimestre 2012



3Q 2012 Net Operating Income of $128.2 million, $[url removed, login to view] per diluted share
3Q 2012 Net Income of $126.3 million, $[url removed, login to view] per diluted share

Utile operativo netto T3 2012 = $ 128,2 milioni, $ 1,55 per azione diluito
Utile netto T3 2012 = $ 126,3 milioni, $ 1,52 per azione diluito



Net income increased to $126.3 million, or $[url removed, login to view] per diluted share, compared to third quarter 2011 net income of $74.0 million, or $[url removed, login to view] per diluted share.

L'utile netto è aumentato fino a 126,3 milioni di dollari, pari a 1,52 dollari per azione diluiti, rispetto all'utile netto del terzo trimestre 2011, che si collocava a 74,0 milioni di dollari, pari a 0,77 dollari per azione diluiti.
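Since the markup of the example did not survive in this posting, the sketch below shows how translation units could be read from a TMX file using the standard `tu` / `tuv` / `seg` elements. The `SAMPLE` string is a reduced stand-in built from the first unit above (a real TMX header carries required attributes that are omitted here), and `read_units` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for a TMX file; not a complete, valid TMX header.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu>
<tuv xml:lang="en"><seg>Xxxxxxx Reports Third Quarter 2012 Financial Results</seg></tuv>
<tuv xml:lang="it"><seg>Xxxxxxx pubblica i risultati finanziari del terzo trimestre 2012</seg></tuv>
</tu>
</body></tmx>"""

def read_units(tmx_text):
    """Return one list of segment texts per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        # itertext() also collects text nested inside inline elements of a segment.
        units.append(["".join(seg.itertext()) for seg in tu.iter("seg")])
    return units
```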

Skills: Linux


About the Employer:
( 1 review ) milano, Italy

Project ID: #4369777

3 freelancers are bidding on average $84 for this job


Hello. I can help you.

$66 USD in 8 days
(57 Reviews)

I can do this easily

$88 USD in 2 days
(1 Review)

I have more than 20 years of programming experience (C, C++, Java, ...) and more than 15 years of experience with Linux, shell scripting and regular expressions. Depending on the size of your files and your requirements

$99 USD in 5 days
(0 Reviews)