Extract Data from PDF Files to XML

In Progress Posted 7 years ago Paid on delivery

Dear Freelancers!

Looking for massive help here!

I need help building some kind of software or system that can capture certain information from PDF files or directly from a scanner.

We receive information on hard-copy paper, and some of the information listed on the paper should be extracted and stored in a database as text.

I have attached a couple of example PDFs to this project to show what the layout looks like.

What I am interested in extracting from the PDFs is:

- AWB (11 or 12 digits, in the format xxxxxxxxxxx, xxx-xxxxxxxx or xxx-xxxx xxxx)

- MRN (18 mixed digits and letters, always beginning with the year and DK00, e.g. 16DK00xxxxxxxxxxxx)

That is basically all the data needed from the various PDF files.
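The two identifier formats listed above could be captured with regular expressions once the PDF or scan has been turned into plain text (the OCR or text-extraction step is not shown). This is only a sketch under assumptions: every listed AWB pattern contains 11 digits, so that is what is matched here; MRN letters are assumed to be uppercase; and the names `AWB_RE`, `MRN_RE`, `normalize_awb` and `extract_ids` are made up for illustration.

```python
import re

# AWB: the three layouts from the posting (11 digits, optionally split
# as xxx-xxxxxxxx or xxx-xxxx xxxx). Assumption: always 11 digits.
AWB_RE = re.compile(r"\b(?:\d{11}|\d{3}-\d{8}|\d{3}-\d{4} \d{4})\b")

# MRN: 18 characters, two-digit year + "DK00" + 12 letters/digits.
# Assumption: letters are uppercase in the extracted text.
MRN_RE = re.compile(r"\b\d{2}DK00[0-9A-Z]{12}\b")


def normalize_awb(raw):
    """Strip the hyphen and space so every AWB is stored as 11 digits."""
    return re.sub(r"[- ]", "", raw)


def extract_ids(text):
    """Return (list of normalized AWBs, list of MRNs) found in text."""
    awbs = [normalize_awb(m) for m in AWB_RE.findall(text)]
    mrns = MRN_RE.findall(text)
    return awbs, mrns


if __name__ == "__main__":
    sample = "AWB 235-2649 1776 MRN 16DK0056002CEE7F10"
    print(extract_ids(sample))
```

Scanned pages often come back from OCR with misread characters, so anything that does not match either pattern would probably need manual review rather than being dropped silently.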

The data should be stored in a software or browser-based system, and if possible like this:

AWB          MRN                 STATUS
23526491776  16DK0056002CEE7F10  OK
61522544077  16DK00560034234FD2  OK
11755688874  16DK005600JGFKFG7   OK
23565658794  16DK005600SJDGH45   OK
21746464646  16DK00560045345DSF  ERROR
81045454570  16DK00560034254DFS  OK
23554545788  16DK005600DSFLJHL3  OK
23526491776  16DK0056005354DSFD  OK

Please note that each AWB can have more than one MRN on each PDF page!
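Since one AWB can carry several MRNs, the storage needs one row per AWB/MRN pair rather than one row per AWB. A minimal sketch of such a table follows, using Python's built-in `sqlite3` as a stand-in for MySQL. The table name `awb_mrn` and the helper functions are hypothetical, and the status is stored exactly as supplied because the posting does not say what makes a row OK or ERROR.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS awb_mrn (
            awb    TEXT NOT NULL,
            mrn    TEXT NOT NULL,
            status TEXT NOT NULL CHECK (status IN ('OK', 'ERROR')),
            PRIMARY KEY (awb, mrn)  -- one AWB may appear with many MRNs
        )
        """
    )
    return conn


def save_pair(conn, awb, mrn, status):
    """Insert one AWB/MRN pair, overwriting the status if it already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO awb_mrn (awb, mrn, status) VALUES (?, ?, ?)",
        (awb, mrn, status),
    )
    conn.commit()


def mrns_for(conn, awb):
    """Return every (mrn, status) stored for the given AWB, sorted by MRN."""
    rows = conn.execute(
        "SELECT mrn, status FROM awb_mrn WHERE awb = ? ORDER BY mrn", (awb,)
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_store()
    # AWB 23526491776 appears twice in the example table, with two MRNs.
    save_pair(conn, "23526491776", "16DK0056002CEE7F10", "OK")
    save_pair(conn, "23526491776", "16DK0056005354DSFD", "OK")
    print(mrns_for(conn, "23526491776"))
```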

The goal is that, once this project is finished, we can keep working with the data in this software or browser-based system. The plan is to be able to export the data again as an .xml file.
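The XML export could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The posting does not specify an XML layout, so the element and attribute names (`shipments`, `shipment`, `mrn`, `awb`, `status`) and the function `rows_to_xml` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows):
    """Build an XML string from (awb, mrn, status) tuples.

    MRNs that share an AWB are grouped under one <shipment> element.
    """
    root = ET.Element("shipments")
    by_awb = {}  # awb -> its <shipment> element, in first-seen order
    for awb, mrn, status in rows:
        node = by_awb.get(awb)
        if node is None:
            node = ET.SubElement(root, "shipment", awb=awb)
            by_awb[awb] = node
        ET.SubElement(node, "mrn", status=status).text = mrn
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rows = [
        ("23526491776", "16DK0056002CEE7F10", "OK"),
        ("21746464646", "16DK00560045345DSF", "ERROR"),
        ("23526491776", "16DK0056005354DSFD", "OK"),
    ]
    print(rows_to_xml(rows))
```

Grouping by AWB keeps the file compact when one AWB has many MRNs; a flat list of rows would work just as well if the receiving system prefers it.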

I have no idea if anyone is able to assist with designing this piece of software. I know that we would have to design it "on the fly" and that it could require a lot of communication both ways to achieve the final result.

Please let me know if you are able to do this project for me, and do not hesitate to ask any questions you might have.

Thank you!

Martin Brandt.

MySQL, PDF, Software Development, SQL

Project ID: #11416500

About the project

9 proposals Remote project Active 7 years ago

Awarded to:

itgold

Sir, I've worked specifically on batch optical character recognition, and can use very advanced libraries to recognize the text from the scanned PDFs. Many thanks. ITGold.

$13 USD / hour
(3 Reviews)
1.1

9 freelancers are bidding on average $14/hour for this job

shemer

Hey, this is pretty much the same project I did two projects ago for some guy on Freelancer. He needed a system built where he could drop PDFs into a folder and then my script would kick into action and OCR text fr...

$22 USD / hour
(1 Review)
3.8

rigelmg

I can do this if you provide more specific information. I have worked on data scraper software coding.

$14 USD / hour
(0 Reviews)
0.0

faizjalil

Hello sir, thanks for the opportunity and for taking the time to review our bid. I have read your job offer and am very interested in doing the job for you. My name is Faiz from Malaysia. You can get this job wil...

$16 USD / hour
(0 Reviews)
0.0

prashiddha

I have great experience extracting data from any source such as PDF, TXT or CSV and transforming it to any other format such as XML, CSV or Excel. You can call me a data service engineer, and that is what I am here in Nepal. Thanks...

$12 USD / hour
(0 Reviews)
0.0

AmjadHanif

A proposal has not yet been provided

$10 USD / hour
(0 Reviews)
0.0