On my Windows 7 desktop, I have a directory that contains multiple Word documents, formatted similarly to the one attached. These documents are educational courses, which are subdivided into modules. The goal of this project is to create a Windows 7 application that lets a user search the directory of course documents for a specific term and returns a list of the courses and modules that contain that term. The application should run on both 32- and 64-bit systems.
SEARCH FUNCTIONALITY SPECIFICS
The application will categorize the text into courses and modules, which are determined as follows:
- Each document is one course, which contains multiple modules.
- The course title is the first paragraph of each document, formatted in the style Intro Heading 1.
- The beginning of a module is indicated by a paragraph in the style Heading 1, which is the module title.
- The exception is any Heading 1 paragraph that says only “Introduction”. These are chapter introductions, not modules, and they should not be indexed.
The end of a module is indicated by any of the following:
- The start of another module (i.e. another Heading 1)
- The start of a new chapter, indicated by a paragraph in the style ChapterTitle
- The end of the course, indicated by a paragraph in the style Heading FBMatter that says “Conclusion”
So, in the attached sample document, the course title is “Understanding Asthma,” and the first three modules are “Asthma Profile”, “Respiratory System”, and “Asthma Attacks”.
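The boundary rules above can be sketched in a few lines of Python. This is only an illustration, not a required design: it assumes each document has already been read into a list of (style name, paragraph text) pairs, and it leaves open how those pairs are extracted from the Word files. The function name split_course is hypothetical, and treating an “Introduction” heading as closing any open module is an assumption.

```python
# Style names are taken from the rules above.
COURSE_TITLE_STYLE = "Intro Heading 1"
MODULE_STYLE = "Heading 1"
CHAPTER_STYLE = "ChapterTitle"
END_STYLE = "Heading FBMatter"


def split_course(paragraphs):
    """Split one course into modules.

    paragraphs: list of (style_name, text) pairs in document order.
    Returns (course_title, [(module_title, [body paragraph texts]), ...]).
    """
    course_title = None
    modules = []
    current = None  # body list of the module that is currently open, if any

    for index, (style, text) in enumerate(paragraphs):
        stripped = text.strip()
        if index == 0 and style == COURSE_TITLE_STYLE:
            course_title = stripped
        elif style == MODULE_STYLE:
            if stripped.lower() == "introduction":
                # Chapter introduction: not indexed (assumed to close any open module).
                current = None
            else:
                # A new module starts here and implicitly ends the previous one.
                current = []
                modules.append((stripped, current))
        elif style == CHAPTER_STYLE:
            # A new chapter ends the open module.
            current = None
        elif style == END_STYLE and stripped.lower() == "conclusion":
            # End of the course: nothing after this point is indexed.
            break
        elif current is not None:
            current.append(text)

    return course_title, modules
```

Text that falls outside any module (chapter introductions, anything after the conclusion) is simply skipped, so it never contributes to a module's count.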
The user interface will allow users to search for a term, and return a list of courses and modules that contain that term. The results should appear as a table of course and module titles, along with the number of results for the term (not case-sensitive) in each module.
The results will also need to count alternate forms of the search term (plurals, abbreviations, etc.) as instances of the term. For example, if the search term is "cardiovascular disease," the user may wish for "CV disease" to be counted in the results as well. However, instead of trying to determine these forms programmatically, it will be easier to let users enter alternate forms of the term manually if they like. This can be seen in the attached sample UI picture. Whenever a search is executed, these alternate forms should be saved in the application, so that if the same term is entered into the “Search term” box in the future, the “Alternate forms” list is populated with the alternate forms already entered.
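One possible way to remember alternate forms between searches is a small JSON file keyed by the search term. The sketch below is only an illustration under assumptions of my own: the file location, the function names, and the choice to match stored terms without regard to case are all hypothetical.

```python
import json
import os


def load_forms(path):
    """Return the stored {term: [alternate forms]} mapping, or {} if none exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def save_forms(path, term, forms):
    """Store the alternate forms entered for a term, replacing any earlier entry."""
    store = load_forms(path)
    # Assumption: terms are keyed in lower case so "Airway" and "airway" share one entry.
    cleaned = sorted({form.strip() for form in forms if form.strip()})
    store[term.strip().lower()] = cleaned
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(store, handle, indent=2)


def forms_for(path, term):
    """Return the alternate forms saved for a term, or an empty list."""
    return load_forms(path).get(term.strip().lower(), [])
```

Calling save_forms each time a search runs, and forms_for when a term is typed into the “Search term” box, would give the behavior described above.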
Search results should display in a table of course titles, with module titles subordinate. The second column should display the number of results for the term (and alternate forms) in that module. For example, if the indexed directory contained the attached sample course, and I searched for the term “airway” with alternate form “airways”, I would want to see something like the picture attached as SampleResults1.jpg, listing all modules in the course that contain the term. (The numbers in the example are not accurate; it only illustrates the results layout.)
Or, if the directory also contained a course called “Respiratory Diseases,” the results might look like SampleResults2.jpg, listing totals for all the courses and modules that contain the term.
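To make the counting rule concrete, here is a rough Python sketch. It assumes the indexed text is available as a list of (course title, [(module title, [paragraph texts])]) entries; the function names are hypothetical. Matching whole words only is also an assumption on my part. It is chosen so that an occurrence of “airways” is counted once, not once as “airways” and again as “airway”.

```python
import re


def build_pattern(term, alternate_forms=()):
    """Compile one case-insensitive pattern covering the term and its alternate forms."""
    forms = {form.strip() for form in (term, *alternate_forms) if form.strip()}
    if not forms:
        raise ValueError("at least one non-empty search form is required")
    # Longest forms first, so "airways" is tried before "airway".
    ordered = sorted(forms, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in ordered)
    # Assumption: a match must not be embedded inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def count_hits(pattern, paragraphs):
    """Count non-overlapping matches across all paragraphs of one module."""
    return sum(len(pattern.findall(text)) for text in paragraphs)


def search(courses, term, alternate_forms=()):
    """Return [(course_title, [(module_title, hits), ...]), ...] for modules with hits."""
    pattern = build_pattern(term, alternate_forms)
    results = []
    for course_title, modules in courses:
        module_rows = []
        for module_title, body in modules:
            hits = count_hits(pattern, body)
            if hits:
                module_rows.append((module_title, hits))
        if module_rows:
            results.append((course_title, module_rows))
    return results
```

Courses and modules with no hits are left out, which matches the sample layouts where only modules containing the term are listed.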
It must be possible either to save the results tables as Word and/or Excel files, or to copy and paste them into one of those programs.
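As one simple option for the copy-and-paste route, the results could be written out as tab-separated text, which Excel splits into columns on paste and Word can convert to a table. The sketch below assumes results are held as a list of (course title, [(module title, count)]) entries; the function names, the column headings, and the four-space indent for module rows are all hypothetical choices.

```python
import csv
import io


def results_to_rows(results):
    """Flatten results into table rows: a course row with its total, then its modules."""
    rows = [("Course / Module", "Results")]
    for course_title, module_rows in results:
        total = sum(hits for _, hits in module_rows)
        rows.append((course_title, total))
        for module_title, hits in module_rows:
            # Indent module titles so they read as subordinate to the course.
            rows.append(("    " + module_title, hits))
    return rows


def to_delimited(results, delimiter="\t"):
    """Render results as delimited text (tab by default, or "," for a CSV file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(results_to_rows(results))
    return buffer.getvalue()
```

Passing delimiter="," and saving the text with a .csv extension would give a file that Excel opens directly. Writing native Word or Excel files would need an additional library and is not shown here.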
If you have any questions, please do not hesitate to ask. Thanks for bidding!
Additional Project Description:
12/29/2012 at 15:32 IST
Update: actually attached the sample files. Sorry about that.