Information extraction from bookmarked PDF using Java

Status: IN PROGRESS
Bids: 13
Avg Bid (USD): $225
Project Budget (USD): $30 - $250

Project Description:
For this small development project, the following skills are absolutely essential:

- Java programming skills
- Good understanding of PDF file structure
- Experience with manipulating PDF files using Java
- Experience with using an open-source Java PDF library such as PDF Clown, iText, PDFTextStream or others

The objective is to create a core component of an automated solution that takes bookmarked PDF files and extracts the numbered, itemized paragraphs, together with the outline (bookmark) text as a classifier, into a machine-readable format (e.g. CSV, XML, MS Access table).
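To illustrate the kind of output meant here, the following is a minimal sketch of the CSV variant only. It assumes the bookmark titles and paragraph text have already been read from the PDF by whichever library is chosen. The names `ExtractedItem`, `CsvExport` and the column headers are hypothetical and are not taken from the attached requirements.

```java
import java.util.List;

// Hypothetical holder for one extracted paragraph (illustrative names only).
final class ExtractedItem {
    final String outlinePath; // bookmark titles from root to leaf, used as the classifier
    final String number;      // paragraph number as printed in the document, e.g. "1.1"
    final String text;        // paragraph body

    ExtractedItem(String outlinePath, String number, String text) {
        this.outlinePath = outlinePath;
        this.number = number;
        this.text = text;
    }
}

final class CsvExport {
    // Quote a field in the RFC 4180 style: wrap it in double quotes when it
    // contains a comma, a quote or a line break, and double any embedded quotes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Build the whole CSV document: one header row, then one row per item.
    static String toCsv(List<ExtractedItem> items) {
        StringBuilder sb = new StringBuilder("outline_path,number,text\r\n");
        for (ExtractedItem item : items) {
            sb.append(escape(item.outlinePath)).append(',')
              .append(escape(item.number)).append(',')
              .append(escape(item.text)).append("\r\n");
        }
        return sb.toString();
    }
}
```

Joining the bookmark hierarchy into a single path column is only one possible design; one column per outline level would work equally well if the depth of the bookmark tree is fixed.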

The solution only needs to work on a specific set of PDF files, all of which use the same document structure and are bookmarked. Two sets of files are attached. The first pack ("Requirements and Sample.rar") contains a file called "Explanation and requirements.pdf", which outlines the requirements and explains the context and objectives. It also contains sample Java code (NOT WORKING as expected yet) for further illustration, plus the input PDF file used in the code. The second pack ("Sample PDF documents.rar") contains a number of PDF files that can be used for testing the solution. Further test files can be made available on request.
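Because all files share one document structure, a single pattern for recognising numbered paragraphs may be enough. The sketch below is not the sample code from the attached pack. It works on plain text that a PDF library has already extracted, and it assumes a numbering style of the form "1", "1.", "2.3" or "2.3.1." at the start of a line; the real style must be checked against the sample PDFs. The class name `NumberedParagraphSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberedParagraphSplitter {
    // Assumed numbering style: digits separated by dots, an optional trailing
    // dot, then whitespace and the first words of the paragraph.
    private static final Pattern NUMBERED =
            Pattern.compile("^(\\d+(?:\\.\\d+)*)\\.?\\s+(.*)$");

    // Splits extracted text into {number, text} pairs. Lines that do not start
    // with a number are appended to the paragraph currently being collected;
    // text before the first numbered paragraph is skipped.
    static List<String[]> split(String extractedText) {
        List<String[]> result = new ArrayList<>();
        String number = null;
        StringBuilder body = new StringBuilder();
        for (String rawLine : extractedText.split("\\R")) {
            String line = rawLine.trim();
            Matcher m = NUMBERED.matcher(line);
            if (m.matches()) {
                if (number != null) {
                    result.add(new String[] {number, body.toString()});
                }
                number = m.group(1);
                body = new StringBuilder(m.group(2));
            } else if (number != null && !line.isEmpty()) {
                body.append(' ').append(line);
            }
        }
        if (number != null) {
            result.add(new String[] {number, body.toString()});
        }
        return result;
    }
}
```

A pattern this simple will also treat a continuation line that happens to begin with a number (a year, for example) as a new paragraph, so a real solution would probably need extra cues such as indentation or font information from the PDF library.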

Skills required:
Java, PDF, Software Architecture
Additional Files: Requirements and Sample Pack.rar, Sample PDF documents.rar
About the employer:
Verified


$500 in 7 days (bdlions)
$250 in 10 days (swteam4)
$155 in 3 days
$252 in 10 days (dlisin)
$250 in 10 days (tikumishra)
$166 in 3 days
$100 in 7 days (vinchd)
$200 in 10 days
$222 in 3 days (keyankarthik)
$244 in 3 days