This project is about B-trees in Advanced Data Structures. It must be implemented in C++.
The time needed to search a B-tree can vary with its order value (m). The main goal of this project is to experiment with the effect of different order values on search time. Tests will be performed on the Shakespeare's Sonnets dataset, which is provided with the project.
Building the Tree:
Your B-tree builder implementation must take the order value (m) from the command prompt. The B-tree is then built from the Shakespeare's Sonnets dataset according to the user-defined m. This dataset contains one sentence in each row. You must build a B-tree that uses words as keys (use the strcmp() function for lexical order). A node n that represents word w (the key of n is w) should contain a list of integer pairs (x, y), where x is the position of the word in the sentence and y is the position of the sentence in the text. The program then stores the B-tree in a file for later use (the main experiment part).
For example, suppose that the word ‘advanced’ occurs as the 2nd word of the 4th sentence and as the 5th word of the 6th sentence. Then the node for key ‘advanced’ contains the following pairs: (2,4), (5,6).
As a first step, define a heuristic rule for determining sentences (sentences do not have to be determined with 100% accuracy; an approximate rule is fine). Split each sentence into words at spaces, and then perform the following operations on the words:
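The example above can be sketched as a small data structure. This is only one possible layout, not a required design: the names Occurrence, KeyEntry, keyLess and advancedExample are hypothetical and chosen for illustration, and how entries are grouped into B-tree nodes of order m is left open.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical type: one (x, y) pair stored for a word.
struct Occurrence {
    int x;  // position of the word within its sentence (1-based)
    int y;  // position of the sentence within the text (1-based)
};

// Hypothetical type: a key w together with all of its (x, y) pairs.
struct KeyEntry {
    std::string word;
    std::vector<Occurrence> positions;
};

// Lexical ordering of keys using strcmp(), as the assignment asks.
inline bool keyLess(const KeyEntry& a, const KeyEntry& b) {
    return std::strcmp(a.word.c_str(), b.word.c_str()) < 0;
}

// Builds the entry from the example: 'advanced' at (2,4) and (5,6).
inline KeyEntry advancedExample() {
    KeyEntry e{"advanced", {}};
    e.positions.push_back({2, 4});
    e.positions.push_back({5, 6});
    return e;
}
```

Keeping the pairs in a vector on the entry means a repeated word only appends a new pair instead of creating a second key.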
- Make all characters lowercase
- Eliminate all characters except for letters
Then build the B-tree from the resulting words.
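The two cleanup steps above could look roughly like the sketch below. The function names normalizeWord and sentenceToWords are hypothetical. The sketch also makes two assumptions that the assignment does not state: it splits on any whitespace rather than only on the space character, and it skips tokens that become empty after cleanup (for example, a token made only of digits or punctuation).

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Lowercase every letter and drop every character that is not a letter.
inline std::string normalizeWord(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (std::isalpha(c)) {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// Split a sentence on whitespace and normalize each token.
// Assumption: tokens that end up empty are not counted as words.
inline std::vector<std::string> sentenceToWords(const std::string& sentence) {
    std::vector<std::string> words;
    std::istringstream in(sentence);
    std::string token;
    while (in >> token) {
        std::string w = normalizeWord(token);
        if (!w.empty()) {
            words.push_back(w);
        }
    }
    return words;
}
```

With this sketch, the index of a word in the returned vector plus one would serve as its x value for the sentence.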
Searching for the words:
Using the B-tree stored by your B-tree builder implementation, your searcher implementation searches for n words given from the command prompt. If there is a node with the given word in the B-tree, the program outputs the data contained in that node. Otherwise, the program outputs an error message for that search. Finally, the program outputs the total execution time.
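A minimal sketch of the search and timing steps is shown below, under simplifying assumptions: the tree is already loaded into memory (reading it back from the stored file is not shown), and each node holds only its keys. The names Node, search and timedSearchMs are hypothetical. Timing uses std::chrono::steady_clock, which is one reasonable choice for measuring elapsed time but is not prescribed by the assignment.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory node: sorted keys, and either no children (leaf)
// or exactly keys.size() + 1 children.
struct Node {
    std::vector<std::string> keys;
    std::vector<std::unique_ptr<Node>> children;
};

// Returns the node that holds word, or nullptr if the word is absent.
inline const Node* search(const Node* node, const std::string& word) {
    while (node != nullptr) {
        std::size_t i = 0;
        // Advance past every key that is lexically smaller than word.
        while (i < node->keys.size() &&
               std::strcmp(word.c_str(), node->keys[i].c_str()) > 0) {
            ++i;
        }
        if (i < node->keys.size() && node->keys[i] == word) {
            return node;  // found in this node
        }
        if (node->children.empty()) {
            return nullptr;  // reached a leaf without finding the word
        }
        node = node->children[i].get();  // descend into the i-th subtree
    }
    return nullptr;
}

// Searches every word, records hit or miss, and returns elapsed milliseconds.
inline double timedSearchMs(const Node* root,
                            const std::vector<std::string>& words,
                            std::vector<bool>& found) {
    auto start = std::chrono::steady_clock::now();
    for (const std::string& w : words) {
        found.push_back(search(root, w) != nullptr);
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In a full searcher, a hit would print the (x, y) pairs stored for the word and a miss would print the error message. Both are left out here to keep the sketch short.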