Write a PDF parser which extracts position and text
€8-600 EUR
Completed
Posted almost 9 years ago
Paid on delivery
I need to index a large number of PDF files and extract various parts of the documents as specific fields.
For this I have previously used pdf2text and then regex to extract parts of the text.
With unstructured text this is a tedious process, and I would much rather base it on the actual position of the text.
Therefore I need a command-line tool that runs on Linux, parses PDF files, and extracts not only the text but also the position of the text.
I need the exact location of every word and/or phrase in the text.
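To illustrate why positions help, here is a minimal Python sketch of picking a field by page region instead of by regex. It assumes the parser has already produced one record per word; the `Word` tuple, the `words_in_region` helper, the sample coordinates, and the convention that y grows downward are all hypothetical and not taken from any particular PDF library.

```python
from collections import namedtuple

# Hypothetical record for one extracted word: position plus text.
Word = namedtuple("Word", ["x", "y", "text"])


def words_in_region(words, x_min, y_min, x_max, y_max):
    """Join the text of all words whose position lies inside the box."""
    hits = [w for w in words
            if x_min <= w.x <= x_max and y_min <= w.y <= y_max]
    # Reading order: top to bottom, then left to right.
    # Assumption: y grows downward; flip the sort for bottom-left origins.
    hits.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in hits)


# Made-up sample data standing in for real parser output.
sample = [
    Word(72.0, 100.0, "Invoice"),
    Word(130.0, 100.0, "No."),
    Word(400.0, 100.0, "12345"),
    Word(72.0, 140.0, "Date"),
    Word(400.0, 140.0, "2015-03-01"),
]

# The field is defined by where it sits on the page, not by a pattern.
invoice_number = words_in_region(sample, 380, 90, 500, 110)
```

With records like these, each field becomes a fixed rectangle per document layout, which is usually easier to maintain than a regex over flattened text.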
There are some libraries in C++ here:
[login to view URL]
And Java libraries:
[login to view URL]
Also Python:
[login to view URL]
Output should be plain text written to stdout, either as CSV (x-position, y-position, word/phrase) or as JSON.
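As a rough sketch of the two output formats, the following Python snippet serializes a list of (x, y, text) tuples either as CSV rows or as a JSON array. The `format_words` function and the JSON key names `x`, `y`, and `text` are assumptions made for illustration; the PDF parsing step itself is not shown. The snippet is written for a current Python 3 and would need adjusting for an older interpreter.

```python
import csv
import io
import json
import sys


def format_words(words, fmt="csv"):
    """Serialize (x, y, text) tuples as CSV rows or as a JSON array."""
    if fmt == "json":
        # Key names are an assumption; the posting only lists the fields.
        return json.dumps([{"x": x, "y": y, "text": t} for x, y, t in words])
    buf = io.StringIO()
    # csv.writer quotes any phrase that itself contains a comma.
    writer = csv.writer(buf, lineterminator="\n")
    for x, y, t in words:
        writer.writerow([x, y, t])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample records standing in for real parser output.
    demo = [(72.0, 100.0, "Invoice"), (130.5, 100.0, "No, 1")]
    sys.stdout.write(format_words(demo, "csv"))
```

Using the `csv` module rather than joining strings by hand keeps phrases with embedded commas or quotes parseable on the consuming side.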
The tool should be able to run in a Linux terminal with the following setup:
Linux x86_64 x86_64 x86_64 GNU/Linux
PHP 5.3.29 (cli) (built: Aug 20 2014 16:41:34)
Python 2.6.9
OpenJDK Runtime Environment (amzn-2.5.4.0.53.amzn1-x86_64 u75-b13)
I have attached a PDF-file, typical for my scenario.
Hi,
I specialize in creating custom-made tools for PDF files, and I have already developed various tools that do similar things to what you've described in Java. I'm sure that adjusting these tools to achieve your goal should not be a problem.
I'm available to start working on this project right away and quickly deliver to you a professional tool that will do the job.
A little bit about me: I'm an Expert on both the Adobe and AcrobatAnswers forums and have a website dedicated to my custom-made tools for PDF files that you're welcome to check out (Google my handle-name to find it).
You're also welcome to check out my work history on this site and see some of the PDF-related projects I've worked on in the past.
Regards, Gilad (try67)
€500 EUR in 5 days
4.9 (54 reviews)
6.0
5 freelancers are bidding on average €440 EUR for this job
How much data do you have in PDF, and do you want to extract the position of the text?
I am an expert and very interested in this project. Could we discuss further details?
The bid budget can be renegotiated.