In Progress

Data extraction from 3 different sources: PDF, HTML, and Word Files

I need an interface developed that allows a user to upload a file in PDF, DOC or DOCX, or HTML format. Once a file is uploaded, the PHP page will extract information from it and store that information in a comma-separated CSV file. Even though the three file types use different formats, the same fields will be extracted from each, which will produce consistent output in the CSV file.

I have attached a copy of the HTML, Word, and PDF files to the project for viewing. The information that needs to be extracted from each document is the following:

● Class name: e.g. Sr. Puppy (9-12 Months) – Male

● Armband number: 2 to 4 digits

● Dog name

● Registration number: alphanumeric or “Listed”

● Date of Birth

● Class Placement: may be blank

● Breeder name: can be more than one name

● Sire and Dam (parents): format is name of sire x name of dam

● Place of birth: Canada or Elsewhere

● Owner name: can be more than 1 person

● Agent name: optional
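As a rough illustration of the CSV output described above, here is a minimal sketch. It is written in Python only to show the idea; the posting asks for PHP, and this is not the deliverable. The column names are borrowed from the "Information to be extracted" labels in this posting, and the function name `records_to_csv` is hypothetical. Because breeder and owner fields can hold several comma-separated names, the writer must quote those values so they stay in one column.

```python
import csv
import io

# Column order is an assumption based on the field list in the posting.
COLUMNS = [
    "Class Name", "Armband Number", "Dog Name", "Registration No", "DOB",
    "Class Placement", "Breeder Name", "Sire & Dam", "Place of Birth",
    "Owner", "Agent",
]


def records_to_csv(records):
    """Serialize a list of field dictionaries to comma-separated text.

    Missing keys (for example a blank placement or no agent) become
    empty cells, and values containing commas are quoted automatically.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The same rows come out regardless of whether the record was read from a PDF, Word, or HTML file, which is what keeps the output consistent across the three sources.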

Using the Word document as an example, the following shows what is to be extracted:

(Section from Word Document)

Sr. Puppy (9-12 Months) - Male

102 GRASSRIDGE I AM A ROCK, AE499458, 04-Mar-2013

1ST Breeders: Denise Cranna. Ch. Malhaven Skyrockets In Flight x Ch. Grassridge Heavenly Grace. Canada. Owner: Karen IBBITSON, Denise CRANNA. Agent: Ingrid WINKLER

Information to be extracted:

Class Name: Sr. Puppy (9-12 Months) – Male

Armband number: 102

Dog name: Grassridge I Am A Rock

Registration No: AE499458

DOB: 04-March-2013

Class Placement: 1st

Breeder Name: Denise Cranna

Sire & Dam: Ch. Malhaven Skyrockets In Flight x Ch. Grassridge Heavenly Grace

Place of Birth: Canada

Owner: Karen Ibbitson, Denise Cranna

Agent: Ingrid Winkler
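The mapping from the sample record to the extracted fields can be sketched as follows. This is a minimal Python illustration, not the PHP deliverable, and it rests on several assumptions drawn only from the single sample above: each record is a class name line, an entry line, and a detail line; the parts of the detail line are separated by a period and a space; the breeder name itself contains no period followed by a space; and names are normalized with simple title-casing (which would mishandle names such as "McDonald"). The function name `parse_entry` and both patterns are hypothetical.

```python
import re
from datetime import datetime

# Entry line: armband (2 to 4 digits), dog name, registration number
# (alphanumeric or "Listed"), date of birth such as 04-Mar-2013.
ENTRY_RE = re.compile(
    r"^(?P<armband>\d{2,4})\s+(?P<dog>.+?),\s*(?P<reg>[A-Za-z0-9]+),\s*"
    r"(?P<dob>\d{1,2}-[A-Za-z]{3}-\d{4})$"
)

# Detail line: optional placement, breeders, "sire x dam", place of birth,
# owners, optional agent. Assumes ". " separates the parts as in the sample.
DETAIL_RE = re.compile(
    r"^(?:(?P<placement>\S+)\s+)?Breeders?:\s*(?P<breeder>.+?)\.\s+"
    r"(?P<pedigree>.+?\s+x\s+.+?)\.\s+(?P<place>Canada|Elsewhere)\.\s+"
    r"Owners?:\s*(?P<owner>.+?)(?:\.\s+Agent:\s*(?P<agent>.+?))?\.?$"
)


def parse_entry(class_name, entry_line, detail_line):
    """Turn the three text lines of one record into a field dictionary."""
    entry = ENTRY_RE.match(entry_line.strip())
    detail = DETAIL_RE.match(detail_line.strip())
    if entry is None or detail is None:
        raise ValueError("record does not match the assumed layout")
    # Expand the abbreviated month, e.g. 04-Mar-2013 -> 04-March-2013.
    dob = datetime.strptime(entry["dob"], "%d-%b-%Y")
    return {
        "Class Name": class_name.strip(),
        "Armband Number": entry["armband"],
        "Dog Name": entry["dog"].title(),
        "Registration No": entry["reg"],
        "DOB": dob.strftime("%d-%B-%Y"),
        # Placement and agent are optional, so they may come back empty.
        "Class Placement": (detail["placement"] or "").lower(),
        "Breeder Name": detail["breeder"],
        "Sire & Dam": detail["pedigree"],
        "Place of Birth": detail["place"],
        "Owner": detail["owner"].title(),
        "Agent": (detail["agent"] or "").title(),
    }
```

Real files will likely contain layouts that a single sample does not reveal, so a record that fails to match raises an error here rather than being written to the CSV with guessed values.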

Skills: Data Mining, Data Processing, PHP


About the Employer:
( 22 reviews ) Sydney, Canada

Project ID: #5386847

Awarded to:


Hello, We have gone through the scope of work and would be happy to provide complete solution on this application that will extract data from uploaded file and future support also if required. Let's take it to th More

$149 USD in 3 days
(41 Reviews)

9 freelancers are bidding on average $311 for this job


Dear Client, I can help in your project. We have already experience of working on similar projects. Please see below to get idea of our experience: Amazon/Ebay Bots: [url removed, login to view] More

$144 USD in 3 days
(267 Reviews)

Hello, With 99% completion rate, 650+ successfully completed projects, and a 5.00 reputation (maximum possible, 5.0) (Yes, not even 4.99 average rating, can be verified on my profile page !!)... you can never go wro More

$1030 USD in 4 days
(499 Reviews)

Hi there Your project posting really excites me as I am doing similar job. I am working as a Solution Specialist (LDR - Business Development) in a leading US based company where I do lots internet research (target More

$444 USD in 13 days
(13 Reviews)

Hi there, i have major Experience in php and scraper. i will create for your to extract the data from 3 of those format. but there are some limitation to extract. the giving files will be format as you describe. Tha More

$211 USD in 7 days
(53 Reviews)

Hello. I am a web developer with 11+-year experience . I work remotely on DigitalRay company - [url removed, login to view] - LA (USA). Technical knowledges: 1PHP JavaScript Ruby RoR [url removed, login to view] C#, Zend, CakePHP, Symphony frameworks More

$255 USD in 3 days
(23 Reviews)

Hello. What data should be extracted - all clear, excellent description. Some other questions (sorry, maybe it looks stupid, but I want to fully understand project, before we will start): 1. in some places like he More

$200 USD in 5 days
(30 Reviews)

Hello sir,  I have seen your requirement of the project and can do the mentioned requirement with good efforts and 100% completion with good quality and on time completion. Website Implementation Done :  ------- More

$215 USD in 14 days
(6 Reviews)

A proposal has not yet been provided

$155 USD in 3 days
(0 Reviews)