PDF document structure extraction

This project is now closed with a project budget of $30 - $100 USD.

Project Description

From a PDF file, build an XML or HTML file that extracts all text sequentially and wraps tags around the title, heading levels, paragraphs, headers and footers, side notes, text boxes and, ideally, tables.
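To make the tagging step concrete, here is a minimal sketch in Java. It assumes the text lines and their font sizes have already been read from the PDF by some library, which is not shown. The `StructureTagger` class, the `TextLine` type and the font-size heuristic (most common size is body text, the largest size is the title, sizes in between are heading levels) are illustrative assumptions, not part of the posting. Headers, footers, side notes, text boxes and tables would need position information and are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class StructureTagger {

    /** Hypothetical input: one extracted text line and its font size. */
    static final class TextLine {
        final String text;
        final double fontSize;
        TextLine(String text, double fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /** Escapes the characters that are unsafe inside XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Assumption: the font size that carries the most characters is body text. */
    static double bodyFontSize(List<TextLine> lines) {
        Map<Double, Integer> chars = new HashMap<>();
        for (TextLine l : lines) {
            chars.merge(l.fontSize, l.text.length(), Integer::sum);
        }
        double best = 0;
        int bestCount = -1;
        for (Map.Entry<Double, Integer> e : chars.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Closes the paragraph being collected, if any, and writes it out. */
    private static void flush(StringBuilder para, StringBuilder xml) {
        if (para.length() == 0) return;
        xml.append("  <p>").append(escape(para.toString())).append("</p>\n");
        para.setLength(0);
    }

    /** Emits text in input order, tagging title, heading levels and paragraphs. */
    static String toXml(List<TextLine> lines) {
        double body = bodyFontSize(lines);
        // Distinct sizes above body size, largest first: rank 0 = title, 1 = h1, ...
        TreeSet<Double> bigger = new TreeSet<>();
        for (TextLine l : lines) {
            if (l.fontSize > body) bigger.add(l.fontSize);
        }
        List<Double> ranks = new ArrayList<>(bigger.descendingSet());

        StringBuilder xml = new StringBuilder("<document>\n");
        StringBuilder para = new StringBuilder();
        for (TextLine l : lines) {
            int rank = ranks.indexOf(l.fontSize);
            if (rank < 0) {
                // Body-size (or smaller) text: merge consecutive lines into one paragraph.
                if (para.length() > 0) para.append(' ');
                para.append(l.text);
                continue;
            }
            flush(para, xml);
            String tag = (rank == 0) ? "title" : "h" + rank;
            xml.append("  <").append(tag).append('>')
               .append(escape(l.text))
               .append("</").append(tag).append(">\n");
        }
        flush(para, xml);
        return xml.append("</document>\n").toString();
    }

    public static void main(String[] args) {
        List<TextLine> lines = Arrays.asList(
            new TextLine("Offering Memorandum", 20),
            new TextLine("1. Summary", 14),
            new TextLine("The notes are issued", 10),
            new TextLine("by the bank.", 10),
            new TextLine("1.1 Risk & Return", 12),
            new TextLine("Investors may lose capital.", 10));
        System.out.print(toXml(lines));
    }
}
```

With the sample input in `main`, the 20 pt line becomes `<title>`, the 14 pt and 12 pt lines become `<h1>` and `<h2>`, and the consecutive 10 pt lines are merged into `<p>` elements. Real offering documents will often break the font-size assumption (bold headings at body size, for example), so this is only a starting point.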

The documents to parse are mostly offering documents from banks. They contain mostly text, sometimes tables, and are mostly in portrait orientation.

Scientific papers found on the internet are easy to obtain and possibly simpler, so they make a good first set of test documents.

Development should be done in Perl or Java, and the result must run on Windows.
