Create a tool to process text data from 100 different Microsoft Word documents and import the data into a MySQL table.
It doesn't matter to me how this is written, but it must run either on a PC or as a PHP page. I'm not sure whether this is best done with an Office macro, in .NET, or even in PHP. You decide.
The MS Word docs describe testing procedures for technicians to follow in a chemical laboratory.
All the docs are standardized, meaning they all have the same section headers.
The data under each section header falls into one of two formats:
1) Free Text
This is just freely typed text in paragraph form.
2) Ordered Numbers
In the Word doc, each line in this type of section is either numbered or bulleted, with indentation.
The tool needs to assign each line a number, with additional decimals to indicate the different levels of indentation. For example:
Word Doc data:
Line 1
Line 2
Line 3
.. Line 4 (indented under 3)
.... Line 5 (indented under 4)
.. Line 6 (indented under 3)
Data after import:
4.1 Line 1
4.2 Line 2
4.3 Line 3
.. 4.3.1 Line 4 (indented under 3)
.... 4.3.1.1 Line 5 (indented under 4)
.. 4.3.2 Line 6 (indented under 3)
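A minimal sketch of the numbering rule in the example above, written in Python purely for illustration (the tool itself can be in any language). It assumes the indentation level of each line has already been read from the Word doc, and that the leading 4 in the example is the section number. The function name number_lines and its parameters are hypothetical.

```python
from typing import List, Tuple


def number_lines(section: int, lines: List[Tuple[int, str]]) -> List[Tuple[str, str]]:
    """Assign outline numbers such as 4.3.1 from (indent_level, text) pairs.

    indent_level 0 is a top-level line; each deeper level adds one decimal.
    Assumption: the section number is used as the first component.
    """
    counters: List[int] = []  # counters[i] is the current count at indent level i
    numbered = []
    for level, text in lines:
        # A jump of more than one level is treated as exactly one level deeper.
        level = min(level, len(counters))
        # Forget counters that belong to levels deeper than the current line.
        del counters[level + 1:]
        if level == len(counters):
            counters.append(1)    # first line at a new, deeper level
        else:
            counters[level] += 1  # next sibling at an existing level
        number = ".".join(str(n) for n in [section] + counters)
        numbered.append((number, text))
    return numbered


example = [
    (0, "Line 1"), (0, "Line 2"), (0, "Line 3"),
    (1, "Line 4"), (2, "Line 5"), (1, "Line 6"),
]
for number, text in number_lines(4, example):
    print(number, text)
```

Run on the six example lines, this yields 4.1, 4.2, 4.3, 4.3.1, 4.3.1.1 and 4.3.2, matching the numbering shown above.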
The new line numbers will be stored as a separate field, not with the text data.
The section number should be a separate field.
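One possible shape for the imported rows, sketched in Python under stated assumptions: the table name procedure_lines, the column names, and the ProcedureLine class are all hypothetical, and the %s placeholders follow the parameter style used by common Python MySQL drivers. The point is only that the section number, the line number, and the text travel as separate fields.

```python
from dataclasses import dataclass, astuple
from typing import List, Tuple


@dataclass
class ProcedureLine:
    document: str        # source Word file name (hypothetical field)
    section_number: int  # kept separate from the line number
    line_number: str     # e.g. "4.3.1"; empty string for free-text sections
    text: str            # the line or paragraph text, without its number


# Hypothetical table and column names; adjust to the real schema.
INSERT_SQL = (
    "INSERT INTO procedure_lines (document, section_number, line_number, text) "
    "VALUES (%s, %s, %s, %s)"
)


def to_params(rows: List[ProcedureLine]) -> List[Tuple]:
    """Convert rows to parameter tuples for a parameterized bulk insert."""
    return [astuple(row) for row in rows]


rows = [
    ProcedureLine("example.docx", 4, "4.3", "Line 3"),
    ProcedureLine("example.docx", 4, "4.3.1", "Line 4"),
]
params = to_params(rows)
# With a MySQL driver, these would be passed as: cursor.executemany(INSERT_SQL, params)
```

Passing the values as parameters rather than building the SQL string by hand also avoids quoting problems with the special characters in the text.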
Some data is formatted as an equation and may have special formatting (carriage returns, subscript, superscript) and special characters (plus/minus, degrees, trademark, copyright). The formatting and characters must remain intact in MySQL. The data will be read from MySQL in PHP and displayed in a jqGrid, in HTML, and in PDF files, so I will need your feedback about how to encode these special characters.
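One encoding option, sketched in Python as an assumption rather than a requirement: keep special characters as plain Unicode in a UTF-8 column (utf8mb4 in MySQL) and store subscript, superscript, and line breaks as minimal HTML tags. The function runs_to_html and the (text, style) run format are hypothetical; the runs would come from whatever reads the Word doc. Whether the PDF output can render these tags depends on the PDF library used.

```python
import html
from typing import Iterable, Tuple


def runs_to_html(runs: Iterable[Tuple[str, str]]) -> str:
    """Encode formatted text runs as a small HTML fragment.

    Each run is (text, style), where style is "", "sub" or "sup".
    Special characters stay as Unicode; only &, < and > are escaped.
    """
    parts = []
    for text, style in runs:
        # Normalize carriage returns, escape markup characters, then mark line breaks.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        escaped = html.escape(text, quote=False).replace("\n", "<br>")
        if style in ("sub", "sup"):
            parts.append(f"<{style}>{escaped}</{style}>")
        else:
            parts.append(escaped)
    return "".join(parts)


fragment = runs_to_html([
    ("H", ""), ("2", "sub"), ("O at 25 °C ± 1\r", ""),
    ("10", ""), ("3", "sup"), (" mL", ""),
])
print(fragment)  # H<sub>2</sub>O at 25 °C ± 1<br>10<sup>3</sup> mL
```

A fragment like this can be written to an HTML page as is, as long as the page and the database connection both use UTF-8.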