Create a program that scans an e-book file and produces a database file listing the hard words found in that e-book. The program should recognize hard words in any common e-book or document format (such as .pdf, .doc, .docx, .djvu, .epub, .lit, .pdb, .azw). It should use two databases to do this: one containing a list of "hard words" and the other containing "all words" in the English language.
A word in the e-book counts as hard if it is present in your "hard words" database. The program collects these words and writes them to an output database file (likely .sql or .csv).
Second, the program should identify words that are neither hard nor present in the "all words" database; we define these as new to the software. The program should write these new words to a second output file.
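The classification described above can be sketched as follows. This is only a minimal illustration, assuming the e-book text has already been extracted into a plain string and that the two databases are loaded into memory as a dict (word to meaning) and a set. The function name `classify_words` and the tokenizing pattern are assumptions for illustration, not part of the requirements.

```python
import re

def classify_words(text, hard_words, all_words):
    """Split the words of `text` into hard words and new words.

    hard_words: dict mapping word -> meaning (the "hard words" database)
    all_words:  set of known words (the "all words" database)
    Returns (hard, new), both in order of first appearance in the text.
    """
    hard, new = [], []
    seen = set()
    # Assumed tokenizer: runs of letters, optionally with one apostrophe part.
    for token in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text):
        word = token.lower()
        if word in seen:
            continue  # report each word only once
        seen.add(word)
        if word in hard_words:
            hard.append((word, hard_words[word]))
        elif word not in all_words:
            new.append(word)  # neither hard nor known: new to the software
    return hard, new
```

Words that are in the "all words" database but not in the "hard words" database are simply skipped, which matches the two output files requested.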
Important features of the software:
Create your own database of hard words with the headers 'word' and 'meaning'. You may build this database from word lists available on the internet, such as the GRE top 8000 words or similar.
It should be able to create the two output database files sorted in different ways, such as alphabetically or in the order in which the words appear in the e-book file.
The output database will also contain two headers: "name" and "meaning".
The databases used should be large enough to recognize as many words as possible while avoiding redundant entries.
Meanings may be given as synonyms, but they should be reliable and easy to understand.
The software should allow adding new words to a database or replacing a database with a new database file.
The software does not need a polished interface. What matters more is that it functions according to the requirements described above.
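As a rough sketch of the output and update features listed above, the following assumes .csv output and an in-memory dict for the database. The names `write_word_list` and `add_words` and the `order` parameter are hypothetical, chosen only for illustration.

```python
import csv
import io

def write_word_list(rows, out, order="appearance"):
    """Write (name, meaning) rows as CSV to the file-like object `out`.

    order: "appearance" keeps the order given (order in the e-book);
           "alphabetical" sorts by word, ignoring case.
    """
    if order == "alphabetical":
        rows = sorted(rows, key=lambda row: row[0].lower())
    elif order != "appearance":
        raise ValueError(f"unknown order: {order!r}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "meaning"])  # headers required for the output
    writer.writerows(rows)

def add_words(database, new_entries):
    """Merge (word, meaning) pairs into the dict `database`.

    Words already present are skipped to avoid redundancy.
    Returns the number of words actually added.
    """
    added = 0
    for word, meaning in new_entries:
        key = word.strip().lower()
        if key and key not in database:
            database[key] = meaning
            added += 1
    return added

if __name__ == "__main__":
    buffer = io.StringIO()
    write_word_list([("zeal", "enthusiasm"), ("abate", "lessen")],
                    buffer, order="alphabetical")
    print(buffer.getvalue())
```

For the second output file, the new words can be passed as rows with an empty meaning, for example `("zorbling", "")`. Replacing a database with a new file would amount to reloading the dict from that file.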