We need Optical Character Recognition (OCR) software.
The software should be able to recognize text from images and scanned documents.
The file format we initially need support for is:
PDF (with embedded images)
On top of the OCR functionality, we will also need some other features.
Invoice parsing with learned patterns, plus the ability for the user to make manual corrections or enter values by hand when some fields of the invoice can't be parsed.
If possible, this manual input should also be "learned" by the software and applied to later scans.
There should be support for adding customers in the program, so that each scanned document is assigned to the customer it belongs to. Scanning multiple documents at a time (batch jobs) should also be possible.
The data should then be saved for the specific customer, perhaps in XML or some other suitable, easily parsed format.
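To give an idea of the kind of per-customer output we have in mind, here is a rough sketch, not a required design. The class name CustomerXmlExport, the element names "invoice" and "field", and the "customer" attribute are all made up for this example; the real layout is open for discussion.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: turns the parsed fields of one invoice into an
// XML document tagged with the customer it belongs to.
public static class CustomerXmlExport
{
    public static XDocument Build(string customerId, IReadOnlyDictionary<string, string> fields)
    {
        // Root element carries the customer id (assumed attribute name).
        var invoice = new XElement("invoice", new XAttribute("customer", customerId));

        // One <field name="..."> element per parsed value.
        foreach (var pair in fields)
        {
            invoice.Add(new XElement("field", new XAttribute("name", pair.Key), pair.Value));
        }

        return new XDocument(invoice);
    }
}
```

The returned XDocument can be written to disk with its Save method, for example one file per scanned document inside a folder for that customer.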
We will also need support for sending the parsed data through a website (the website is developed and maintained by us, so you won't have to develop a specific website for this).
The application should be written in C# .NET, and we need the source code after completion.
The code needs to be very modular so that we can develop our own parsing functions afterwards and use the OCR on its own to scan documents beforehand.
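As an illustration of the modularity we mean, the OCR step and the parsing step could sit behind separate interfaces, so that we can plug in our own parsers later. This is only a sketch under our own assumptions: the names IOcrEngine, IDocumentParser, ScanPipeline, FixedTextOcr and TotalParser are hypothetical, and the two small implementations are stand-ins that do no real OCR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract for the OCR step: document bytes in, plain text out.
public interface IOcrEngine
{
    string Recognize(byte[] document);
}

// Hypothetical contract for a parsing function that can be added later.
public interface IDocumentParser
{
    IDictionary<string, string> Parse(string ocrText);
}

// Runs OCR first, then hands the text to whichever parser was supplied.
public sealed class ScanPipeline
{
    private readonly IOcrEngine _ocr;
    private readonly IDocumentParser _parser;

    public ScanPipeline(IOcrEngine ocr, IDocumentParser parser)
    {
        _ocr = ocr ?? throw new ArgumentNullException(nameof(ocr));
        _parser = parser ?? throw new ArgumentNullException(nameof(parser));
    }

    public IDictionary<string, string> Run(byte[] document)
    {
        return _parser.Parse(_ocr.Recognize(document));
    }
}

// Stand-in OCR engine that always returns the same text (no real recognition).
public sealed class FixedTextOcr : IOcrEngine
{
    public string Recognize(byte[] document) => "Total: 100.00";
}

// Stand-in parser that extracts whatever follows the label "Total:".
public sealed class TotalParser : IDocumentParser
{
    public IDictionary<string, string> Parse(string ocrText)
    {
        var fields = new Dictionary<string, string>();
        const string label = "Total:";
        int index = ocrText.IndexOf(label, StringComparison.Ordinal);
        if (index >= 0)
        {
            fields["total"] = ocrText.Substring(index + label.Length).Trim();
        }
        return fields;
    }
}
```

With a split like this, a new parser only has to implement IDocumentParser, and the OCR engine can be used on its own by calling Recognize directly.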
See examples of similar programs
Only serious bids, please, and only from people who have worked with reading PDFs before. If you do not understand what we want, please send a PM before you bid.