We need to extract text from image-only PDF files, so OCR is necessary. The PDF files are produced from .xps images using a virtual printer (CutePDF). As you will see in the example, each page has two columns. OCR tools such as ABBYY sometimes mix the columns together. We need the text extracted without mixing the columns: the left column in full, then the right column. We will need an automated process and a long-term collaboration, because we receive about 30-40 of these documents daily.
In short, we need the text extracted correctly from these PDFs. The language is Romanian. The extracted text from each PDF can be saved in a single file/document.
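To illustrate what we mean by not mixing the columns, here is a minimal Python sketch of the reading order we expect. It is not a required implementation. It assumes the OCR engine can return each word with its position on the page, and that the gap between the two columns sits at the middle of the page; the function name `read_two_columns`, the `(left, top, text)` word format, and the `line_tol` parameter are our own hypothetical choices for the example.

```python
from typing import List, Tuple

# Hypothetical word format: (left, top, text), positions in page units.
Word = Tuple[float, float, str]


def read_two_columns(words: List[Word], page_width: float,
                     line_tol: float = 5.0) -> str:
    """Return the text of the left column followed by the right column.

    Assumes the column gap is at the horizontal middle of the page.
    """
    middle = page_width / 2.0
    columns: Tuple[list, list] = ([], [])
    for left, top, text in words:
        # Assign each word to a column by its left edge.
        columns[0 if left < middle else 1].append((left, top, text))

    lines_out = []
    for column in columns:
        # Read top to bottom, then left to right within the column.
        column.sort(key=lambda w: (w[1], w[0]))
        current, current_top = [], None
        for left, top, text in column:
            # Start a new line when the word is clearly lower on the page.
            if current_top is not None and abs(top - current_top) > line_tol:
                lines_out.append(" ".join(t for _, t in sorted(current)))
                current, current_top = [], None
            if current_top is None:
                current_top = top
            current.append((left, text))
        if current:
            lines_out.append(" ".join(t for _, t in sorted(current)))
    return "\n".join(lines_out)
```

With words given in any order, the result lists every line of the left column before any line of the right column. Real pages may need the column gap detected rather than assumed.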
Please find attached a sample PDF for testing.