PDF document structure extraction
- Status Closed
- Budget $30 - $100 USD
From a PDF file, build an XML/HTML file that extracts all text sequentially AND wraps tags around the title, heading levels, paragraphs, headers/footers, side notes, text boxes and, ideally, tables.
The documents to parse are mostly offering documents from banks. They contain mostly text, sometimes tables, and are mostly in portrait orientation.
Scientific papers found on the internet are an easy-to-find and possibly simpler first set of documents for your testing.
Development should be done in Perl or Java, and the result must run on Windows.
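To make the tagging requirement more concrete, here is a minimal Java sketch of one possible approach, not a finished solution. It assumes a PDF library has already extracted each text line together with its font size and vertical position on the page; the `Line` type, the `StructureTagger` class, and the header/footer thresholds are all hypothetical names and values chosen for illustration. The heuristic treats the most common font size as body text, ranks larger sizes as title and heading levels, and treats lines near the top or bottom edge of the page as header or footer. Side notes, text boxes and tables are not handled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: tags already-extracted PDF text lines by font size and position. */
class StructureTagger {

    /** One text line as a PDF library might report it (illustrative type, not a real API). */
    static final class Line {
        final String text;
        final double fontSize;  // in points
        final double yFraction; // 0.0 = top of page, 1.0 = bottom of page

        Line(String text, double fontSize, double yFraction) {
            this.text = text;
            this.fontSize = fontSize;
            this.yFraction = yFraction;
        }
    }

    // Assumed page zones; real documents would need tuning.
    static final double HEADER_ZONE = 0.05;
    static final double FOOTER_ZONE = 0.95;

    static boolean inBodyZone(Line line) {
        return line.yFraction >= HEADER_ZONE && line.yFraction <= FOOTER_ZONE;
    }

    /** The most frequent font size in the body zone is assumed to be paragraph text. */
    static double mostCommonSize(List<Line> lines) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (Line line : lines) {
            if (inBodyZone(line)) counts.merge(line.fontSize, 1, Integer::sum);
        }
        double best = 0;
        int bestCount = 0;
        for (Map.Entry<Double, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        }
        return best;
    }

    /** Distinct font sizes above body size, largest first: index 0 = title, 1 = h1, ... */
    static List<Double> largerSizesDescending(List<Line> lines, double bodySize) {
        TreeSet<Double> sizes = new TreeSet<>(Collections.reverseOrder());
        for (Line line : lines) {
            if (inBodyZone(line) && line.fontSize > bodySize) sizes.add(line.fontSize);
        }
        return new ArrayList<>(sizes);
    }

    static String classify(Line line, List<Double> bigSizes) {
        if (line.yFraction < HEADER_ZONE) return "header";
        if (line.yFraction > FOOTER_ZONE) return "footer";
        int rank = bigSizes.indexOf(line.fontSize);
        if (rank < 0) return "p"; // body size or smaller
        return rank == 0 ? "title" : "h" + rank;
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static void appendElement(StringBuilder xml, String tag, String text) {
        xml.append("  <").append(tag).append('>').append(escape(text))
           .append("</").append(tag).append(">\n");
    }

    /** Emits tags in the original line order; consecutive body lines merge into one <p>. */
    static String toXml(List<Line> lines) {
        List<Double> bigSizes = largerSizesDescending(lines, mostCommonSize(lines));
        StringBuilder xml = new StringBuilder("<document>\n");
        StringBuilder paragraph = new StringBuilder();
        for (Line line : lines) {
            String tag = classify(line, bigSizes);
            if (tag.equals("p")) {
                if (paragraph.length() > 0) paragraph.append(' ');
                paragraph.append(line.text.trim());
                continue;
            }
            if (paragraph.length() > 0) {
                appendElement(xml, "p", paragraph.toString());
                paragraph.setLength(0);
            }
            appendElement(xml, tag, line.text.trim());
        }
        if (paragraph.length() > 0) appendElement(xml, "p", paragraph.toString());
        return xml.append("</document>\n").toString();
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for real PDF extraction output.
        List<Line> lines = Arrays.asList(
            new Line("Example Bank", 9, 0.02),
            new Line("Offering Circular", 24, 0.10),
            new Line("1. Summary", 16, 0.20),
            new Line("The notes are issued", 10, 0.25),
            new Line("by the bank & its affiliates.", 10, 0.27),
            new Line("Page 1", 9, 0.97));
        System.out.print(toXml(lines));
    }
}
```

In a real implementation the `Line` values would come from a PDF library's text extraction, and font sizes would probably need rounding before comparison, since extracted sizes are rarely exactly equal. Font size alone will also misclassify bold run-in headings and captions, so it is only a starting point.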