
Write a PDF parser which extracts position and text

€8-600 EUR

Completed
Posted almost 9 years ago

Paid on delivery
I need to index a lot of PDF files and extract various parts of the documents as specific fields. For this I have previously used pdf2text and used regex to extract parts of the text. With unstructured text this is a tedious process, and I would much rather base it on the actual position of the text.

Therefore I need a command-line tool, to be run on Linux, which will parse PDF files and extract not only the text but also the text position. I need the exact location of every word and/or phrase in the text.

There are some libraries in C++ here: [login to view URL]
And Java libraries: [login to view URL]
Also Python: [login to view URL]

Output should be plain text written to stdout, either as CSV (x-position, y-position, word/phrase) or as JSON.

The tool should be able to run in a Linux terminal with the following setup:
Linux x86_64 x86_64 x86_64 GNU/Linux
PHP 5.3.29 (cli) (built: Aug 20 2014 16:41:34)
Python 2.6.9
OpenJDK Runtime Environment (amzn-2.5.4.0.53.amzn1-x86_64 u75-b13)

I have attached a PDF file that is typical for my scenario.
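To make the requested output concrete, here is a minimal sketch of the last stage only: turning a list of positioned words into CSV or JSON, plus a small position-based filter of the kind that could replace the regex step. It does not parse a PDF. The sample coordinates, the column names (x, y, text), and the helper names to_csv, to_json and words_in_box are all assumptions made for illustration, and the code targets current Python 3 rather than the Python 2.6.9 listed above, so it would need adapting for that setup.

```python
import csv
import io
import json
import sys

# Hypothetical input: one (x, y, text) tuple per word, in the shape a PDF
# library might report it. These coordinates are made up for illustration.
SAMPLE_WORDS = [
    (72.0, 720.5, "Invoice"),
    (130.25, 720.5, "No."),
    (160.0, 720.5, "12345"),
    (72.0, 700.0, "Total, EUR"),
]


def to_csv(words):
    """Return the words as CSV text with the columns x, y, text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y", "text"])
    for x, y, text in words:
        # The csv module quotes fields that contain commas or quotes.
        writer.writerow([x, y, text])
    return buf.getvalue()


def to_json(words):
    """Return the words as a JSON array of {x, y, text} objects."""
    return json.dumps([{"x": x, "y": y, "text": t} for x, y, t in words])


def words_in_box(words, x_min, y_min, x_max, y_max):
    """Select words whose origin lies inside a rectangle (bounds inclusive)."""
    return [
        w for w in words
        if x_min <= w[0] <= x_max and y_min <= w[1] <= y_max
    ]


if __name__ == "__main__":
    # Write the CSV form to stdout, as the posting asks.
    sys.stdout.write(to_csv(SAMPLE_WORDS))
```

With output in this shape, a field can be picked by where it sits on the page instead of by pattern; for example, words_in_box(SAMPLE_WORDS, 100, 710, 200, 730) selects the two words to the right of "Invoice" on the first line of the sample.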
Project ID: 7783748

About the project

5 proposals
Remote project
Active 9 yrs ago

Awarded to:
Hi, I specialize in creating custom-made tools for PDF files, and I have already developed various tools that do similar things to what you've described in Java. I'm sure that adjusting these tools to achieve your goal should not be a problem. I'm available to start working on this project right away and quickly deliver to you a professional tool that will do the job. A little bit about me: I'm an Expert on both the Adobe and AcrobatAnswers forums and have a website dedicated to my custom-made tools for PDF files that you're welcome to check out (Google my handle-name to find it). You're also welcome to check out my work history on this site and see some of the PDF-related projects I've worked on in the past. Regards, Gilad (try67)
€500 EUR in 5 days
4.9 (54 reviews)
6.0
5 freelancers are bidding on average €440 EUR for this job
How much data do you have in the PDFs, and do you want to extract the position of the text? I am an expert and very interested in this project. Could we discuss further details? The bid budget can be renegotiated.
€500 EUR in 30 days
4.9 (8 reviews)
2.1

About the client

Stockholm, Sweden
5.0
27
Payment method verified
Member since Dec 2, 2014
