I will provide working code which is currently used to extract English keywords from English texts.
The working code does the following:
1. Tokenize the text into sentences.
2. Perform sentiment analysis on each sentence and assign the sentence's score to each word in it.
3. Tokenize the sentence into words.
4. Find POS tags and filter out unwanted words (such as proper nouns).
5. Lemmatize words.
6. Use masterlist to map words. (More on the masterlist below).
7. Calculate a score for each word. The current formula is: square root of the word's frequency, times the maximal positive sentiment, times (1 - exp(-rank/200)), where rank is the word's frequency rank on the internet.
8. A dictionary of dictionaries is returned containing all the extracted information.
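The scoring formula in step 7 can be sketched as a small function (the function and parameter names are illustrative, not from the original code):

```python
import math

def keyword_score(frequency, max_pos_sentiment, rank):
    """Score = sqrt(word frequency) * maximal positive sentiment
    * (1 - exp(-rank / 200)), where rank is the word's frequency
    rank on the internet. Names here are assumptions; only the
    formula itself comes from the description above."""
    return math.sqrt(frequency) * max_pos_sentiment * (1 - math.exp(-rank / 200))
```

For example, a word seen 9 times with a maximal positive sentiment of 0.8 and rank 200 scores 3 * 0.8 * (1 - e^-1), roughly 1.52.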
The dependencies are:
a) NLTK, with its corpora and the VADER sentiment lexicon
b) All NLTK dependencies are checked for before running, and downloaded/installed if needed
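The dependency check in (b) usually looks something like the sketch below (the resource paths are the usual ones for punkt and VADER, but the exact list in the original code may differ):

```python
import nltk

def ensure_nltk_data(resources=(("tokenizers/punkt", "punkt"),
                                ("sentiment/vader_lexicon.zip", "vader_lexicon"))):
    """Download each NLTK resource only if it is not already installed.
    Returns the names that had to be downloaded. The default resource
    list is an assumption based on the dependencies named above."""
    downloaded = []
    for path, name in resources:
        try:
            nltk.data.find(path)          # raises LookupError if missing
        except LookupError:
            nltk.download(name, quiet=True)
            downloaded.append(name)
    return downloaded
```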
I need you to tweak the above code so that it works with Chinese, e.g. by using the StanfordSegmenter for tokenization.
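Note that step 1 changes too: NLTK's punkt sentence tokenizer is trained on English punctuation, so Chinese sentence-final marks need their own splitter. A minimal stdlib sketch (the StanfordSegmenter, or a library such as jieba, would then handle the word-level segmentation in step 3):

```python
import re

def split_zh_sentences(text):
    """Split Chinese text on sentence-final punctuation (quanjiao
    period, exclamation, question mark), keeping each delimiter
    attached to its sentence. An illustrative stand-in for the
    English punkt tokenizer, not part of the original code."""
    parts = re.split(r"(?<=[\u3002\uff01\uff1f])", text)
    return [p.strip() for p in parts if p.strip()]
```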
The masterlist (step 6) is used to map keywords to a main keyword via synonyms. For example, if the word "money" is found, it is mapped to "wealth"; if the word "cash" is found, it is also mapped to "wealth". I will provide a masterlist for Chinese; you just need to plug it into the existing code.
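The masterlist lookup itself is language-agnostic, so the Chinese list should drop in unchanged. A sketch of the mapping step (the example entries come from the description above; the helper name is an assumption):

```python
# Masterlist: synonym -> main keyword, as described above.
masterlist = {
    "money": "wealth",
    "cash": "wealth",
}

def map_to_main_keyword(word, masterlist):
    """Return the main keyword for a word, or the word itself
    if it has no masterlist entry."""
    return masterlist.get(word, word)
```

A Chinese masterlist would have the same shape, just with Chinese keys and values.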
I will also provide the word-ranking list (step 7).
So I think the main task is just using the Chinese language libraries rather than the English language libraries.
Please test your work before giving it to me.
Any questions, please ask.
Thanks for reading.