Java software to extract specific data from a PDF file

  • Status Closed
  • Budget €30 - €250 EUR
  • Total Bids 12

Project Description

A program written in Java that uses Maven for dependency management and the Hibernate implementation of JPA, with HSQLDB as the RDBMS.

Proper testing in JUnit.

The program should be able to receive a PDF file as input.

Extract the following data:


JPA Entity: Customer

Name(First line below "FACTUUR ADRES", String),

BillingAddress(2nd and 3rd line below "FACTUUR ADRES", combined into one String with a line break between them),

Klant NR(Customer ID, long)



The BillingAddress and Kenteken can be empty. The rest should never be empty.
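
As a rough illustration of the "lines below FACTUUR ADRES" rule, the sketch below works on text that has already been extracted from the PDF by whatever library is chosen. The class and method names (`CustomerBlockParser`, `CustomerBlock`, `parse`) are hypothetical, and it assumes that each address line arrives as its own text line and that missing address lines show up as blank lines; real invoices may need a different stop condition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pulls the customer block out of already-extracted PDF text. */
final class CustomerBlockParser {

    /** Plain value holder; in the project this data would end up in the Customer entity. */
    static final class CustomerBlock {
        final String name;
        final String billingAddress; // may be empty

        CustomerBlock(String name, String billingAddress) {
            this.name = name;
            this.billingAddress = billingAddress;
        }
    }

    static CustomerBlock parse(String pageText) {
        String[] lines = pageText.split("\\R"); // split on any line break
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].trim().equals("FACTUUR ADRES")) {
                // First line below the marker is the name and must not be empty.
                String name = lineAt(lines, i + 1);
                if (name.isEmpty()) {
                    throw new IllegalArgumentException("Name must not be empty");
                }
                // Second and third line below the marker form the billing address.
                List<String> address = new ArrayList<>();
                for (int j = i + 2; j <= i + 3; j++) {
                    String line = lineAt(lines, j);
                    if (!line.isEmpty()) {
                        address.add(line);
                    }
                }
                return new CustomerBlock(name, String.join("\n", address));
            }
        }
        throw new IllegalArgumentException("Marker 'FACTUUR ADRES' not found");
    }

    /** Returns the trimmed line at the index, or an empty string past the end. */
    private static String lineAt(String[] lines, int index) {
        return index < lines.length ? lines[index].trim() : "";
    }
}
```

Keeping the parsing separate from the PDF library makes it easy to unit test with plain strings.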


JPA Entity: Invoice

Customer(see above, customer entity)

Datum(Date is written out in Dutch text, should be parsed to DD-MM-YYYY)
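
One possible way to handle the Dutch date is `java.time` with a Dutch locale. The exact layout of the date in the PDF is not given here, so the pattern below assumes "day, full month name, year" (for example "12 maart 2015"); the class name `DutchDates` is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

/** Hypothetical helper for dates written out in Dutch. */
final class DutchDates {
    private static final Locale DUTCH = Locale.forLanguageTag("nl-NL");

    // Assumed input layout: day, full Dutch month name, four-digit year.
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()          // accept "Maart" as well as "maart"
            .appendPattern("d MMMM uuuu")
            .toFormatter(DUTCH);

    private static final DateTimeFormatter OUTPUT = DateTimeFormatter.ofPattern("dd-MM-uuuu");

    /** Parses the Dutch text into a LocalDate, which is what an entity would normally store. */
    static LocalDate parse(String dutchText) {
        return LocalDate.parse(dutchText.trim(), INPUT);
    }

    /** Formats the parsed date as DD-MM-YYYY. */
    static String toDdMmYyyy(String dutchText) {
        return parse(dutchText).format(OUTPUT);
    }
}
```

Storing a `LocalDate` and formatting only for display avoids keeping dates as strings in the database.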

Factuur NR(Invoice number, long)

List of Services(ArrayList of Service)

List of products(ArrayList of Product)

Subtotaal(Subtotal, double)

BTW-tarief(VAT percentage as a double, between 0% and 100%)

The field below "BTW-tarief" should also be extracted.



Customer can never be empty.

Date may never be empty.

Factuur NR may never be empty.

The rest of the fields can sometimes be empty.


JPA Entity: InvoiceEntry

Kenmerk(details, String)

Aantal(quantity, double)

Eenheid prijs(unit Price, double)

Korting(discount, percentage as a double, between 0% and 100%)



JPA Entity: Product extends InvoiceEntry

Onderdelen(Name, String)


JPA Entity: Service extends InvoiceEntry

Diensten (Name, String)



All entities should be saved to an HSQLDB database.


Version Control:

Local Git.


All data should be extracted correctly:

You should not create any values by calculation from the extracted data, e.g. totals.

However, you should test whether the extracted data is correct by applying the business logic to the existing data.

For example, some interesting unit tests:

1.) Aantal * (Eenheid prijs * ((100 - Korting) / 100)) = Bedrag (small rounding differences are ok)

2.) Subtotaal = total sum of 'Bedrag' of the 'Diensten' list and the 'Onderdelen' list (small rounding differences are ok)

3.) "Bedrag" can't be filled in without "Aantal" or "Eenheid prijs"

4.) Subtotaal * BTW-tarief percentage = the double value below BTW-tarief.

5.) Subtotaal + the value below BTW-tarief = Total

6.) All InvoiceEntry fields can be empty, or all can be full.
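
The arithmetic checks in the list above could be expressed as small helper methods that JUnit tests then call. This is only a sketch: the class name `InvoiceChecks` and its method names are hypothetical, the allowed rounding margin is not given here so `0.01` is an assumed placeholder, and both Korting and BTW-tarief are assumed to be on a 0 to 100 scale.

```java
/** Hypothetical consistency checks for extracted invoice values. */
final class InvoiceChecks {
    // Assumed rounding margin; adjust to whatever the project actually allows.
    static final double TOLERANCE = 0.01;

    static boolean nearlyEqual(double a, double b) {
        return Math.abs(a - b) <= TOLERANCE;
    }

    /** Rule 1: Aantal * (Eenheid prijs * ((100 - Korting) / 100)) = Bedrag. */
    static boolean lineAmountMatches(double aantal, double eenheidPrijs,
                                     double kortingPercent, double bedrag) {
        double expected = aantal * (eenheidPrijs * ((100.0 - kortingPercent) / 100.0));
        return nearlyEqual(expected, bedrag);
    }

    /** Rule 2: Subtotaal = sum of Bedrag over the Diensten and Onderdelen lists. */
    static boolean subtotalMatches(double[] dienstenBedragen, double[] onderdelenBedragen,
                                   double subtotaal) {
        double sum = 0.0;
        for (double bedrag : dienstenBedragen) {
            sum += bedrag;
        }
        for (double bedrag : onderdelenBedragen) {
            sum += bedrag;
        }
        return nearlyEqual(sum, subtotaal);
    }

    /** Rule 4: Subtotaal * BTW percentage = the value below BTW-tarief. */
    static boolean vatMatches(double subtotaal, double btwPercent, double btwBedrag) {
        return nearlyEqual(subtotaal * btwPercent / 100.0, btwBedrag);
    }

    /** Rule 5: Subtotaal + the value below BTW-tarief = Total. */
    static boolean totalMatches(double subtotaal, double btwBedrag, double totaal) {
        return nearlyEqual(subtotaal + btwBedrag, totaal);
    }
}
```

The helpers only compare extracted values against each other; they never produce a value that gets stored, which keeps them in line with the "no calculated values" requirement.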

Doubles use a comma as the decimal separator in the PDF.
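
Because `Double.parseDouble` expects a decimal point, comma-formatted values need explicit handling. A minimal sketch using `java.text.DecimalFormat` follows; the class name `DutchNumbers` is hypothetical, and treating "." as a thousands separator is an assumption about the invoices, not something stated above.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

/** Hypothetical helper for numbers that use a comma as the decimal separator. */
final class DutchNumbers {

    static double parse(String text) {
        String trimmed = text.trim();

        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(',');
        symbols.setGroupingSeparator('.'); // assumption: "1.234,56" means 1234.56

        // A new DecimalFormat per call, because DecimalFormat is not thread-safe.
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);

        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(trimmed, position);

        // Reject input that failed to parse or was only partially consumed.
        if (number == null || position.getIndex() != trimmed.length()) {
            throw new NumberFormatException("Not a comma-formatted number: " + text);
        }
        return number.doubleValue();
    }
}
```

Checking the parse position matters because `DecimalFormat` otherwise silently stops at the first character it does not understand.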

Some test input files are provided in the attachment.

The extraction should work on all of them.
