This assignment is worth 30% and it has two components: 15% for the implementation of the solution of a given problem and 15% for the final report on the developed code. The programming assignment is structured in a number of incremental steps. The report will have to summarise concisely the study of the problem, the design of the chosen solution, its implementation and, whenever possible, an indication of how the code, used to solve the problem, could be optimised. An essential part of the report is the analysis of the complexity of the implemented algorithms. A thorough analysis should give an indication on how the code can be optimised.
You are asked to implement a search engine in a programming language (JAVA). The search engine, guided by user queries, must analyse a set of given documents. A simple user interface will have to be implemented to allow the user to pose queries and return the obtained results displayed in an intelligible and organised way (for instance, extracts of what the engine found and sorted by importance). An algorithm will have to be implemented to search for query words, in isolation and/or in combination, assign a score to each match and sort the matches by score. The returned results will have to provide pointers to the original document(s) and position within the document(s).
Examples of queries:
Search computer Simple query
Search news cinema Composed query with no proximity constraint
Search blue and yellow Composed query with AND constraint
Search apple or banana Composed query with OR constraint
Search “search engine” Composed query with proximity constraint
A number of points are attached to each step for a total of 100 points, which will be scaled to the 30% for the whole assignment.
Step 1: the design of the solution [10 points]
Study the problem and provide the lecturer with flow diagrams indicating the classes you designed, their purpose, their functionality and their interactions. In essence the method you have employed to store data (data structures, classes and the like) and the so called “behaviour” of your code, in terms of interaction/message passing between objects of implemented classes. The diagrams indicating the flow of execution are the most important.
Step 2: implementation of user interface [10 points] &#61663; CHECK
You will have to implement a user interface; it does not have to use graphics, however, it must be functional, allowing the user to input simple and composed queries. The query will implement Boolean logic and be able to interpret complex Boolean requests. The user interface will also have to present in a structured manner the results to the user. Ideally, as in all search engines, the user might want to refine the search, and the output results should reflect a structure, which can be further analysed (see how Google works).
Step 3: implementation of matching algorithm [40 points]
The matching algorithm will be the core of your implementation. It will have to make use of the interpreted query to search for simple and composed requests. The matching algorithm will have to work with ASCII documents to find the words indicated by the user in their queries. Searches will have to be given a score, depending on whether the Boolean request is satisfied, the proximity of words and the frequency of occurrence. For instance, a query such as search yellow and red, will trigger a search for yellow and red and if found in the same sentence score higher than if they found in different sentences. Proximity constraints may also be enforced by the user, so for instance, if a query were search “red football” that would mean that both red and football words must be found adjacent in a sentence. If both words were found apart, then their distance would be used to score the match.
Step 4: implementation of sorting algorithm [10 points] &#61663; CHECK
Each matching process will return a number of matches. Each one of them will have a score. This step asks you to make use of the score to sort the matches and prepare them for the user in a meaningful way. As mentioned earlier on, the user may want to modify the original query and the sorting algorithm needs to be designed with this option in mind.
Step 5: optimisation of code [30 points]
I recommend you implement a quick and dirty solution at first. Your code will be VERY messy at first. I would then ask you to analyse the speed of your code, employing the techniques learnt during lectures. You will have to estimate the complexity of your algorithm and study if and how it can be optimised. If it can, then you should optimise it.
The report: (15%)
The report must be no longer than 10 A4 pages and no shorter than 5 A4 pages. It must be comprehensive, including the study and analysis of the problem, the design of the algorithm, the description of the implemented code, the analysis of the time complexity and the way code is or can be optimised. The actual code must be included in an appendix. The appendix IS NOT part of the 5 to 10 A4 pages mentioned at the beginning of this paragraph.
The demonstration: (15%)
The demonstration will be assessed for each student individually. You will have to describe your code in detail, answer detailed programming questions and you will have to demonstrate the implemented functionality of your code. ASCII documents will be made available one week before the demonstration date.