I want a PDF sorting program written in Java that will:
1. Read PDF files from a configured source directory (SOURCE_DIRECTORY).
2. Perform OCR on the PDF to convert it into a PDF with embedded text.
3. Rename & move the PDF to a new location based on configured text matching rules.
An example rule might be:
The PDF contains text that matches '<BANK_NAME>' where BANK_NAME = 'National Australia Bank'.
The PDF contains text that matches 'Statement Date: <STATEMENT_DATE>' where STATEMENT_DATE is a formatted date.
The PDF contains text that matches 'Account number: <ACC_NUMBER>' where ACC_NUMBER is any string between 5 and 12 characters long.
If all three match, place the PDF into c:/banking/<BANK_NAME>/<ACC_NUMBER>_<STATEMENT_DATE>.pdf.
I would think that implementing the rule matching using regular expressions would work well, but I'm open to other ideas.
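As a rough illustration of the regular-expression approach, the example rule could be sketched in Java along these lines. The class name RuleMatcher, the method resolveTarget, the named group "value" and the dd/MM/yyyy date format are all assumptions made for this sketch; the posting does not specify them. Slashes in a captured value are replaced with dashes so the date does not create extra folders in the target path.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleMatcher {
    // Hypothetical rule: every field must match for the rule to apply.
    // Each pattern captures its value in a named group called "value".
    private static final Map<String, Pattern> FIELDS = new LinkedHashMap<>();
    static {
        FIELDS.put("BANK_NAME", Pattern.compile("(?<value>National Australia Bank)"));
        // Assumed date format dd/MM/yyyy; the posting only says "a formatted date".
        FIELDS.put("STATEMENT_DATE",
                Pattern.compile("Statement Date:\\s*(?<value>\\d{2}/\\d{2}/\\d{4})"));
        // 5 to 12 non-whitespace characters, not followed by further non-whitespace.
        FIELDS.put("ACC_NUMBER",
                Pattern.compile("Account number:\\s*(?<value>\\S{5,12})(?!\\S)"));
    }

    private static final String TARGET =
            "c:/banking/<BANK_NAME>/<ACC_NUMBER>_<STATEMENT_DATE>.pdf";

    // Returns the target path for the OCR'd text, or empty if any field fails to match.
    static Optional<String> resolveTarget(String pdfText) {
        String target = TARGET;
        for (Map.Entry<String, Pattern> field : FIELDS.entrySet()) {
            Matcher m = field.getValue().matcher(pdfText);
            if (!m.find()) {
                return Optional.empty();
            }
            // Slashes in a captured value would add directory levels, so use dashes.
            String value = m.group("value").replace('/', '-');
            target = target.replace("<" + field.getKey() + ">", value);
        }
        return Optional.of(target);
    }

    public static void main(String[] args) {
        String text = "National Australia Bank\n"
                + "Statement Date: 01/02/2023\n"
                + "Account number: 12345678\n";
        System.out.println(resolveTarget(text).orElse("no rule matched"));
    }
}
```

The actual move could then be done with java.nio.file.Files.move after creating the parent folder with Files.createDirectories.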
The OCR would need to be done by something freely available. Perhaps [url removed, login to view]
A GUI for managing the rules would be nice to have, but initially I'd be happy to edit a configuration file manually.
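One possible shape for a hand-edited configuration file is a standard Java properties file, sketched below. All key names (source.directory, poll.interval.seconds, rule.nab.*) and the c:/scans folder are hypothetical and chosen only for illustration. Note that backslashes must be doubled inside a properties file, which matters for regular expressions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleConfig {
    // Contents of a hypothetical rules.properties file, inlined here so the
    // sketch is self-contained. In the file itself a regex \s is written \\s.
    static final String SAMPLE = String.join("\n",
            "source.directory=c:/scans",
            "poll.interval.seconds=60",
            "rule.nab.target=c:/banking/<BANK_NAME>/<ACC_NUMBER>_<STATEMENT_DATE>.pdf",
            "rule.nab.field.BANK_NAME=(National Australia Bank)",
            "rule.nab.field.ACC_NUMBER=Account number:\\\\s*(\\\\S{5,12})");

    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(SAMPLE);
        int interval = Integer.parseInt(props.getProperty("poll.interval.seconds"));
        Pattern account = Pattern.compile(props.getProperty("rule.nab.field.ACC_NUMBER"));
        Matcher m = account.matcher("Account number: 12345678");
        System.out.println("interval=" + interval + " matched=" + m.find());
    }
}
```

Reading from a real file would only require swapping the StringReader for a reader opened on the file.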
The program should run as a service and check at configured intervals for any files in the SOURCE_DIRECTORY.
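The interval check could be sketched with a ScheduledExecutorService that scans the source folder for PDFs. The class name DirectoryPoller, the default folder "incoming" and the 60-second interval are assumptions made for this sketch. It only lists the files it finds; OCR, rule matching and installing the program as an operating-system service are left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DirectoryPoller {
    // Returns the regular files in the directory whose names end in .pdf or .PDF.
    static List<Path> listPdfs(Path sourceDirectory) throws IOException {
        List<Path> pdfs = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(sourceDirectory, "*.{pdf,PDF}")) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    pdfs.add(entry);
                }
            }
        }
        return pdfs;
    }

    public static void main(String[] args) {
        // Hypothetical defaults; a real program would read these from its configuration.
        Path source = Paths.get(args.length > 0 ? args[0] : "incoming");
        long intervalSeconds = 60;

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Fixed delay: the next scan starts only after the previous one has finished.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                for (Path pdf : listPdfs(source)) {
                    System.out.println("Found " + pdf);
                }
            } catch (IOException e) {
                System.err.println("Scan failed: " + e.getMessage());
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
    }
}
```

A fixed delay is used rather than a fixed rate so that a slow OCR run cannot cause scans to pile up.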
15 freelancers are bidding on average $174 for this job
I can do it for you. I have immense experience with image processing and OCR implementations in C#, Java and Python. Let me know if you're interested. Thanks.
I have been developing an application that scans football player details and stores them in a database. I believe my previous experience will let me complete this work on time and to a good standard.