The details of the project are given below:
You will be provided with the files [url removed, login to view], [url removed, login to view], [url removed, login to view] and Porter.pm. The file [url removed, login to view] (which can't be uploaded since it is larger than 1 MB) contains a collection of documents that record publications in the CACM (Communications of the Association for Computing Machinery). Inspect the file and you will see that the text of each document is enclosed within XML-style open and close document tags, where the open tag also specifies a numeric identifier for the document. Each document is a short record of a CACM publication, including its title, author(s), and abstract, although any of these (especially the abstract) may be absent for a given document.
You are required to write two separate programs (Perl scripts): (i) one program that computes an inverted index for the document collection, and (ii) a second program that loads this inverted index and uses it to do retrieval.
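To make the two tasks concrete, here is a rough Perl sketch of what an inverted index and a simple lookup against it might look like. It is only an illustration under assumptions: the tag format `<DOC id=N> ... </DOC>`, the sub names `build_index` and `search_all`, and the sample text are all made up here, and the real collection file may use different tags. It also skips stemming (so Porter.pm is not used) and keeps everything in one script rather than two.

```perl
use strict;
use warnings;

# Build an inverted index: term => { doc_id => term frequency }.
# ASSUMPTION: documents look like <DOC id=12> ... </DOC>; the real tag
# name and attribute syntax in the collection file may differ.
sub build_index {
    my ($text) = @_;
    my %index;
    while ($text =~ m{<DOC\s+id=(\d+)\s*>(.*?)</DOC>}gis) {
        my ($doc_id, $body) = ($1, lc $2);
        # Tokenise on runs of letters and digits; no stemming or stop list.
        $index{$_}{$doc_id}++ for $body =~ /[a-z0-9]+/g;
    }
    return \%index;
}

# Return the ids of documents that contain every query term (boolean AND),
# sorted numerically. (Hypothetical helper, not part of the assignment.)
sub search_all {
    my ($index, @terms) = @_;
    my %hits;
    for my $term (map { lc } @terms) {
        $hits{$_}++ for keys %{ $index->{$term} // {} };
    }
    return sort { $a <=> $b } grep { $hits{$_} == @terms } keys %hits;
}

# Made-up sample data, not taken from the CACM collection.
my $sample = <<'END';
<DOC id=1>
Inverted index construction for text retrieval
</DOC>
<DOC id=2>
Text compression methods
</DOC>
END

my $index = build_index($sample);
print join(",", search_all($index, "text")), "\n";              # prints 1,2
print join(",", search_all($index, "text", "retrieval")), "\n"; # prints 1
```

In the two-program version, the first script would write the index hash to disk and the second would read it back before answering queries; the core Storable module (its `store` and `retrieve` functions) is one possible way to do that, though the assignment may prescribe its own file format.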
The assignment is uploaded.