In Progress

Parsing semi-structured Word documents into MySQL

This should be an easy one.

I have a series of MS Word documents that are somewhat structured and I need a parser written to capture each section of the document and insert it into a MySQL table. The documents contain information on different cities (schools, entertainment, how to get around, etc.) so each document details a different city/area.

The structure of the documents is as follows:

[[title]]

Getting Around (MS Word H1 style)

text for "Getting Around"... (MS Word Normal style)

[[subhed]]

By Car (MS Word H1 style)

text for "By Car"... (MS Word Normal style)

[[body]]

[[menu]]

Freeways (MS Word H1 style)

text for "Freeways"... (MS Word Normal style)

Drive Time & Distance (MS Word H1 style)

text for "Drive Time & Distance"... (MS Word Normal style)

Drivers License (MS Word H1 style)

text for "Drivers License"... (MS Word Normal style)

...and so on, so we have a structure like:

Chattanooga

--Education

----Elementary Schools

--------Here are some of the schools in the area...

----High Schools

--------Here's a list of high schools

--Getting Around

-----By Car

--------Driver's License

-----------Driver's License info...

--------Public Transportation

-----------Public Transportation info...

etc.
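The marker / heading / body-text pattern described above could be grouped into sections along these lines. This is only a minimal sketch: it assumes the paragraphs have already been pulled out of the Word file as (style, text) pairs (for example with a library such as Apache POI, which is not shown here), that the H1 style is named "Heading 1", and that when two markers appear in a row the later one applies. The class and field names (SectionGrouper, Para, Section) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: groups a flat run of styled paragraphs into headed sections. */
class SectionGrouper {
    // Assumption: the "H1" style from the posting is Word's built-in "Heading 1".
    static final String HEADING_STYLE = "Heading 1";

    /** One paragraph as extracted from Word: style name plus text. */
    static final class Para {
        final String style;
        final String text;
        Para(String style, String text) { this.style = style; this.text = text; }
    }

    /** One parsed section: the [[marker]] in effect, its heading, its body text. */
    static final class Section {
        final String marker;   // null if no marker was seen before the heading
        final String heading;
        final StringBuilder body = new StringBuilder();
        Section(String marker, String heading) { this.marker = marker; this.heading = heading; }
    }

    static List<Section> group(List<Para> paras) {
        List<Section> out = new ArrayList<>();
        String marker = null;
        Section current = null;
        for (Para p : paras) {
            String t = p.text.trim();
            if (t.isEmpty()) continue;                       // skip blank paragraphs
            if (t.startsWith("[[") && t.endsWith("]]")) {
                marker = t.substring(2, t.length() - 2);     // e.g. "title", "subhed"
            } else if (HEADING_STYLE.equals(p.style)) {
                current = new Section(marker, t);            // a heading opens a new section
                out.add(current);
            } else if (current != null) {
                if (current.body.length() > 0) current.body.append('\n');
                current.body.append(t);                      // body text joins the open section
            }
            // Text that appears before any heading is dropped in this sketch.
        }
        return out;
    }

    public static void main(String[] args) {
        List<Section> sections = group(Arrays.asList(
            new Para("Normal", "[[title]]"),
            new Para(HEADING_STYLE, "Getting Around"),
            new Para("Normal", "text for \"Getting Around\"..."),
            new Para("Normal", "[[subhed]]"),
            new Para(HEADING_STYLE, "By Car"),
            new Para("Normal", "text for \"By Car\"...")));
        for (Section s : sections) {
            System.out.println(s.marker + " | " + s.heading + " | " + s.body);
        }
    }
}
```

How the markers map to nesting depth is not spelled out here, so the sketch only records which marker was in effect for each heading and leaves the depth decision to a later step.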

The MySQL schema I have set up is a simple, single table that captures the hierarchy of these parsed sections:

Field         Type           Null  Key  Default  Extra
id            int(11)        NO    PRI  NULL     auto_increment
area          int(11)        NO
type          int(11)        NO
parent        int(11)        NO         0
content_html  varchar(1000)  NO
status        int(11)        NO
create_date   datetime       NO
edit_date     datetime       NO
created_by    int(11)        NO
edited_by     int(11)        NO
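Writing one row into this table might look roughly like the following, using plain JDBC. This is a sketch under assumptions: the table name "sections" is hypothetical (the posting does not name the table), the same user id is used for created_by and edited_by, both dates are set to NOW(), and a MySQL JDBC driver and an open Connection are supplied by the caller. The length check is only a rough guard for the varchar(1000) column.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch: inserts one parsed section and returns its auto_increment id. */
class SectionWriter {
    // "sections" is a hypothetical table name; the columns come from the schema above.
    static final String INSERT_SQL =
        "INSERT INTO sections (area, type, parent, content_html, status, "
      + "create_date, edit_date, created_by, edited_by) "
      + "VALUES (?, ?, ?, ?, ?, NOW(), NOW(), ?, ?)";

    // content_html is varchar(1000), so longer content is rejected up front.
    static final int MAX_CONTENT_LENGTH = 1000;

    static int insertSection(Connection conn, int areaId, int typeId, int parentId,
                             String contentHtml, int status, int userId) throws SQLException {
        if (contentHtml.length() > MAX_CONTENT_LENGTH) {
            throw new IllegalArgumentException(
                "content_html longer than " + MAX_CONTENT_LENGTH + " characters");
        }
        try (PreparedStatement ps =
                 conn.prepareStatement(INSERT_SQL, Statement.RETURN_GENERATED_KEYS)) {
            ps.setInt(1, areaId);
            ps.setInt(2, typeId);
            ps.setInt(3, parentId);      // 0 for a root-level section
            ps.setString(4, contentHtml);
            ps.setInt(5, status);
            ps.setInt(6, userId);        // created_by
            ps.setInt(7, userId);        // edited_by (assumed same user on first insert)
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("no generated key returned");
                }
                return keys.getInt(1);   // new row id, usable as "parent" for child rows
            }
        }
    }
}
```

Returning the generated id matters because a child section's "parent" value is only known once its parent row has been inserted.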

So, basically, a document would be parsed on its title (with a lookup to an "area" table to grab the id, and another lookup for "type" for FK references in the table above), and then each parsed section ([[title]], etc.) would become a new row in this table. Nested sections would have the id of their parent section in the "parent" column, and root-level sections would have a "parent" value of 0.
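The parent rule can be sketched with a simple stack. The example below takes the dashed outline shown earlier as a stand-in input and treats a line as a child of the nearest preceding line with fewer dashes, so the exact number of dashes per level does not matter. The ids here are assigned in memory starting at 1; in the real table they would come from auto_increment. The OutlineParents and Node names are invented for this illustration, and the city line itself is treated as a root node, which may or may not be how the real data should be stored given that the city lives in the "area" column.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Sketch: assigns parent ids to outline lines based on their dash depth. */
class OutlineParents {
    static final class Node {
        final int id;
        final int parent;   // 0 means root-level, as in the "parent" column
        final int depth;    // number of leading dashes
        final String text;
        Node(int id, int parent, int depth, String text) {
            this.id = id; this.parent = parent; this.depth = depth; this.text = text;
        }
    }

    static List<Node> parse(List<String> lines) {
        List<Node> nodes = new ArrayList<>();
        Deque<Node> open = new ArrayDeque<>();   // chain of ancestors, deepest on top
        int nextId = 1;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            int depth = 0;
            while (depth < line.length() && line.charAt(depth) == '-') depth++;
            String text = line.substring(depth).trim();
            // Close every open section that is at the same depth or deeper.
            while (!open.isEmpty() && open.peek().depth >= depth) open.pop();
            int parent = open.isEmpty() ? 0 : open.peek().id;
            Node n = new Node(nextId++, parent, depth, text);
            nodes.add(n);
            open.push(n);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> nodes = parse(Arrays.asList(
            "Chattanooga",
            "--Education",
            "----Elementary Schools",
            "--------Here are some of the schools in the area...",
            "----High Schools",
            "--Getting Around",
            "-----By Car",
            "--------Driver's License",
            "--------Public Transportation"));
        for (Node n : nodes) {
            System.out.println(n.id + " parent=" + n.parent + " " + n.text);
        }
    }
}
```

With this rule, "By Car" ends up under "Getting Around", and "Driver's License" and "Public Transportation" both end up under "By Car".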

I can provide sample documents and a full schema (including one manually parsed document) upon an accepted bid.

Sounds easy enough, right? You pick the language as long as it's Perl, PHP, Java, or VB (that's what the maintenance programmer is familiar with).

Skills: Data Processing, Java, Perl, PHP, Visual Basic


About the Employer:
( 0 reviews ) Mount Airy, United States

Project ID: #224272

Awarded to:

setsailgo

pls see pmb.

$100 USD in 2 days
(1 Review)
2.0

17 freelancers are bidding on average $173 for this job

creatorul

Quality work

$250 USD in 10 days
(133 Reviews)
7.3
momleetech

we can do it

$250 USD in 7 days
(109 Reviews)
6.7
kernelbd

We can do this, we have experience to do this type of work. Thanks.

$199 USD in 3 days
(84 Reviews)
6.3
shivaminfonet

We have extensive experience working in PHP. We will complete the job with the best quality development work. Let me know when to start. Regards, Shivam Infonet Pvt. Ltd.

$250 USD in 5 days
(10 Reviews)
5.6
kamosion

It's easy. Please see my profile.

$130 USD in 3 days
(11 Reviews)
5.1
sakshi2004

Hi Sir, I have understood your requirements of storing data present in MS Word documents to a MySQL DB. I have sent you a PM with detailed understandings for this task. I can give you timely delivery with excellent...

$100 USD in 5 days
(14 Reviews)
5.0
theisoft

hello, please see pm. thanks

$110 USD in 2 days
(7 Reviews)
4.9
VLADNOVA

Sir, consider it's done :-)

$100 USD in 7 days
(10 Reviews)
4.4
kintudesigns

Dear Sir, In last 16 months we have developed 60 Dynamic PHP projects and 25-45 .NET Projects for Clients in US, UK, Italy, Netherlands, Denmark, Norway. Which Includes sites like Real Estate, Dating, Content Manage...

$244 USD in 28 days
(1 Review)
3.0
vnb400

kindly refer PMB, Thanks

$150 USD in 4 days
(6 Reviews)
3.0
ecubes

Hello, we have extensive experience in graphic designing/web development XHTML, FLASH, CSS, ASP, PHP, ASP.net, AJAX, JAVASCRIPT, VB.net, OsCommerce, Script Installation, Joomla, ACCESS, MySQL, MS SQL, complete CMS driv...

$250 USD in 5 days
(3 Reviews)
2.0
zeus78

Hello, sir! I'm an experienced programmer in VB and promise high accuracy!

$100 USD in 5 days
(0 Reviews)
0.0
lakshmimys

I have worked on same sort of project which includes handling plain text in the similar format you have mentioned. So I can do this.

$200 USD in 7 days
(0 Reviews)
0.0
balaapk

i can do this

$210 USD in 7 days
(0 Reviews)
0.0
diava

Hi , Please refer to PMB for details. Regards , Dia

$240 USD in 1 day
(0 Reviews)
0.0
rnz1

Please see pm. Regards, rnz1.

$150 USD in 5 days
(0 Reviews)
0.0
segeyplus

Hello, I have done almost the same work on Word files with poorly determined structure, converting them to Access for another person. The task is solvable but of course could have some unexpected difficulties. Please giv...

$150 USD in 2 days
(0 Reviews)
0.0