In Progress

Parsing semi-structured Word documents into MySQL

This should be an easy one.

I have a series of MS Word documents that are somewhat structured, and I need a parser written that captures each section of a document and inserts it into a MySQL table. The documents contain information on different cities (schools, entertainment, how to get around, etc.), with each document covering a different city/area.

The structure of the documents is as follows:


Getting Around (MS Word H1 style)

text for "Getting Around"... (MS Word Normal style)


By Car (MS Word H1 style)

text for "By Car"... (MS Word Normal style)



Freeways (MS Word H1 style)

text for "Freeways"... (MS Word Normal style)

Drive Time & Distance (MS Word H1 style)

text for "Drive Time & Distance"... (MS Word Normal style)

Driver's License (MS Word H1 style)

text for "Driver's License"... (MS Word Normal style)

...and so on, so we end up with a structure like:



----Elementary Schools

--------Here are some of the schools in the area...

----High Schools

--------Here's a list of high schools

--Getting Around

-----By Car

--------Driver's License

-----------Driver's License info...

--------Public Transportation

-----------Public Transportation info...


The MySQL schema I have set up is a simple, single table that captures the hierarchy of these parsed sections:

Field         Type           Null  Key  Default  Extra
id            int(11)        NO    PRI  NULL     auto_increment
area          int(11)        NO
type          int(11)        NO
parent        int(11)        NO         0
content_html  varchar(1000)  NO
status        int(11)        NO
create_date   datetime       NO
edit_date     datetime       NO
created_by    int(11)        NO
edited_by     int(11)        NO

So, basically, a document would be parsed on its title (with a lookup to an "area" table to grab the id, and another lookup for "type", for the FK references in the table above), and then each section parsed ([[title]], etc.) would become a new row in this table. Nested sections would have the id of their parent section in the "parent" column, and root-level sections would have a "parent" value of 0.
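To make the parent/child rule concrete, here is a minimal Java sketch of how the rows could be derived once the paragraphs have been read out of the Word file. It rests on several assumptions that are not part of the brief: that nesting depth is carried by the heading level (1 = top level, 2 = nested under it, and so on), that reading the .doc files themselves is handled elsewhere (for example by a library such as Apache POI), and that the names SectionParser, Paragraph, Row and toRows are purely illustrative. The ids are simulated with a counter, whereas a real import would use the auto_increment value returned by MySQL.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of the brief: turns a flat list of
// paragraphs into rows carrying the "parent" value described above.
final class SectionParser {

    /** One paragraph as extracted from Word; headingLevel 0 means Normal text (assumption). */
    static final class Paragraph {
        final int headingLevel;
        final String text;

        Paragraph(int headingLevel, String text) {
            this.headingLevel = headingLevel;
            this.text = text;
        }
    }

    /** One future table row; id and parent mirror the "id" and "parent" columns. */
    static final class Row {
        final int id;
        final int parent;
        final int level;
        final String title;
        final StringBuilder contentHtml = new StringBuilder();

        Row(int id, int parent, int level, String title) {
            this.id = id;
            this.parent = parent;
            this.level = level;
            this.title = title;
        }
    }

    /** Escapes the three characters that would break simple HTML. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static List<Row> toRows(List<Paragraph> paragraphs) {
        List<Row> rows = new ArrayList<>();
        Deque<Row> open = new ArrayDeque<>(); // sections still open, innermost on top
        int nextId = 1;                       // stand-in for MySQL's auto_increment id

        for (Paragraph p : paragraphs) {
            if (p.headingLevel > 0) {
                // A new heading closes every open section at the same or a deeper level.
                while (!open.isEmpty() && open.peek().level >= p.headingLevel) {
                    open.pop();
                }
                int parent = open.isEmpty() ? 0 : open.peek().id; // 0 = root-level section
                Row row = new Row(nextId++, parent, p.headingLevel, p.text);
                rows.add(row);
                open.push(row);
            } else if (!open.isEmpty()) {
                // Normal text belongs to the innermost open section.
                open.peek().contentHtml
                        .append("<p>").append(escapeHtml(p.text)).append("</p>");
            }
            // Normal text before the first heading is ignored in this sketch.
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Paragraph> doc = Arrays.asList(
                new Paragraph(1, "Getting Around"),
                new Paragraph(0, "text for Getting Around"),
                new Paragraph(2, "By Car"),
                new Paragraph(0, "text for By Car"),
                new Paragraph(3, "Freeways"),
                new Paragraph(0, "text for Freeways"));
        for (Row r : toRows(doc)) {
            System.out.println(r.id + " parent=" + r.parent + " " + r.title
                    + " -> " + r.contentHtml);
        }
    }
}
```

The stack of open sections is what keeps this a single pass over the document: each heading only has to look at the top of the stack to find its parent. How the section title maps onto the table (a separate column or a [[title]] marker inside content_html) is left open here, since the full schema is not shown.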

I can provide sample documents and a full schema (including one manually parsed document) upon an accepted bid.

Sounds easy enough, right? You pick the language as long as it's Perl, PHP, Java, or VB (that's what the maintenance programmer is familiar with).

Skills: Data Processing, Java, Perl, PHP, Visual Basic


About the Employer:
( 0 reviews ) Mount Airy, United States

Project ID: #224272

Awarded to:


pls see pmb.

$100 USD in 2 days
(1 Review)

17 freelancers are bidding on average $173 for this job


Quality work

$250 USD in 10 days
(133 Reviews)

we can do it

$250 USD in 7 days
(109 Reviews)

We can do this, we have experience to do this type of work. Thanks.

$199 USD in 3 days
(84 Reviews)

We have extensive experience working in PHP. We will complete the job with the best quality development work. Let me know when to start. Regards, Shivam Infonet Pvt. Ltd.

$250 USD in 5 days
(10 Reviews)

It's easy. Please see my profile.

$130 USD in 3 days
(11 Reviews)

Hi Sir, I have understood your requirements of storing data present in MS Word documents to a MySQL DB. I have sent you a PM with detailed understandings for this task. I can give you timely delivery with excellent...

$100 USD in 5 days
(14 Reviews)

hello, please see pm. thanks

$110 USD in 2 days
(7 Reviews)

Sir, consider it's done :-)

$100 USD in 7 days
(10 Reviews)

Dear Sir, In the last 16 months we have developed 60 dynamic PHP projects and 25-45 .NET projects for clients in US, UK, Italy, Netherlands, Denmark, ... Includes sites like Real Estate, Dating, Content Manage...

$244 USD in 28 days
(1 Review)

kindly refer PMB, Thanks

$150 USD in 4 days
(6 Reviews)

Hello, we have extensive experience in graphic designing/web development XHTML, FLASH, CSS, ASP, PHP, AJAX, JAVASCRIPT, OsCommerce, Script Installation, Joomla, ACCESS, MySQL, MS SQL, complete CMS driv...

$250 USD in 5 days
(3 Reviews)

Hello, sir! I'm an experienced programmer in VB and promise high accuracy!

$100 USD in 5 days
(0 Reviews)

I have worked on same sort of project which includes handling plain text in the similar format you have mentioned. So I can do this.

$200 USD in 7 days
(0 Reviews)

i can do this

$210 USD in 7 days
(0 Reviews)

Hi , Please refer to PMB for details. Regards , Dia

$240 USD in 1 day
(0 Reviews)

Please see pm. Regards, rnz1.

$150 USD in 5 days
(0 Reviews)

Hello, I have done almost the same work on Word files with poorly determined structure, converting them to Access for another person. The task is soluble but of course could have some unexpected difficulties. Please giv...

$150 USD in 2 days
(0 Reviews)