In Progress

Information extraction from bookmarked PDF using Java

For this small development project the following skills are absolutely essential:

- Java programming skills

- Good understanding of PDF file structure

- Experience with manipulating PDF files using Java

- Experience with using an open-source PDF Java library such as PDF Clown, iText, PDFTextStream or others

The objective is to create a core component of an automated solution which takes bookmarked PDF files and extracts the numbered, itemized paragraphs, together with the outline (bookmark) text as a classifier, into a machine-readable format (e.g. CSV, XML, MS Access table).
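A minimal sketch of the post-extraction step may help illustrate the objective. It assumes the text of each bookmarked section has already been pulled out of the PDF by one of the libraries named above (that part is not shown), and that paragraphs are numbered like "1.", "2.3" or "4.1.2." at the start of a line; the actual numbering scheme of the sample files is not described here. The class name NumberedParagraphCsv and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberedParagraphCsv {

    // Assumed numbering style: "1.", "2.3", "4.1.2." at the start of a line.
    // The real documents may use a different scheme; adjust the pattern accordingly.
    private static final Pattern NUMBERED =
            Pattern.compile("^(\\d+(?:\\.\\d+)*)\\.?\\s+(.*)$");

    /**
     * Splits the plain text of one bookmarked section into rows of
     * {bookmark title, paragraph number, paragraph text}.
     * Lines that do not start with a number are appended to the current paragraph;
     * text before the first numbered paragraph is ignored.
     */
    public static List<String[]> extract(String bookmarkTitle, String sectionText) {
        List<String[]> rows = new ArrayList<>();
        String number = null;
        StringBuilder body = new StringBuilder();
        for (String rawLine : sectionText.split("\\r?\\n")) {
            String line = rawLine.trim();
            Matcher m = NUMBERED.matcher(line);
            if (m.matches()) {
                // A new numbered paragraph starts: flush the previous one.
                if (number != null) {
                    rows.add(new String[] {bookmarkTitle, number, body.toString()});
                }
                number = m.group(1);
                body = new StringBuilder(m.group(2));
            } else if (number != null && !line.isEmpty()) {
                // Continuation line of the current paragraph.
                body.append(' ').append(line);
            }
        }
        if (number != null) {
            rows.add(new String[] {bookmarkTitle, number, body.toString()});
        }
        return rows;
    }

    /** Formats one row as a CSV line, quoting every field and doubling embedded quotes. */
    public static String toCsvLine(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(fields[i].replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for one extracted section.
        String text = "1. First item\ncontinued here\n2.1 Second \"quoted\" item";
        for (String[] row : extract("Scope", text)) {
            System.out.println(toCsvLine(row));
        }
    }
}
```

Using the bookmark title as the first column is one way to carry the outline along as a classifier. A line that merely begins with a number (a year, for instance) would be misread as a new paragraph, so a real solution would probably need a stricter pattern tuned to the sample files.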

The solution needs to work only on a specific set of PDF files, all of which use the same document structure and are bookmarked. Two sets of files are attached. The first pack ("Requirements and [url removed, login to view]") contains a file called "Explanation and [url removed, login to view]" which outlines the requirements and explains the context and objectives. It also contains sample Java code (NOT WORKING as expected yet) for further illustration, plus the input PDF file used in the code. The second pack ("Sample PDF [url removed, login to view]") contains a number of PDF files which can be used for testing the solution. Further test files can be made available on request.

Skills: Java, PDF, Software Architecture


About the Employer:
( 4 reviews ) Doha, Switzerland

Project ID: #4753506

Awarded to:


I already have Java desktop code that uses the iText library to manipulate PDF [url removed, login to view] check PMB

$166 USD in 3 days
(6 Reviews)

13 freelancers are bidding on average $225 for this job


Hi, I specialize in creating custom-made tools that process PDF files in various ways, including tools created in Java (usually using PDFBox). I read the specs and I think it's feasible, but not for the price range you…

$500 USD in 7 days
(17 Reviews)

Hello, Please check private message.

$250 USD in 10 days
(35 Reviews)

Hello, we are experts in Java and can complete this task in a week. Kindly check PMB.

$155 USD in 3 days
(18 Reviews)

Hello, I'm a Java and JEE developer with over 10 years' experience. I'm glad to work for you. Thanks.

$252 USD in 10 days
(4 Reviews)

Hi, I'm a Java/.NET developer with about 10 years' experience. Please check PM for details. Regards, Dmitry

$250 USD in 10 days
(3 Reviews)

More than 3 years' experience in Java development.

$100 USD in 7 days
(2 Reviews)

I can work on this project; please check PM.

$200 USD in 10 days
(3 Reviews)

I have 3 years' experience in text mining. I will finish within 3 days.

$244 USD in 3 days
(0 Reviews)

I can do that.

$250 USD in 10 days
(0 Reviews)

Hi, I'm ready to take up the project immediately.

$177 USD in 3 days
(0 Reviews)

Expert in Java and iText. Worked on projects reading and stamping both PDF forms and PDF graphic images.

$155 USD in 5 days
(0 Reviews)

I have 2 years of experience in developing Java-based desktop and web apps. I have worked on Java Swing, JavaFX, iText, JSP, Servlet, Struts, Hibernate. If you think that I can make your job easy then please reply.…

$222 USD in 3 days
(0 Reviews)