Repost - This needs to be completed in 2 days MAX.
Twitter is investigating a new method of presenting search results to users. A new method was proposed, but unfortunately the employee who proposed it has left the company and refuses to release any more information about it. Your task is to reverse engineer the method by performing the steps for a chosen query and working out what the resulting ordering is based on. The new method for presenting search results, along with questions from Twitter, is as follows:
Choose and submit a search phrase for approval. The search results must provide at least 100 tweets. How does Twitter order the search results (e.g. does the ordering depend on the query term appearances, the date of the tweet, ...)?
Using the Twitter API, create a weighted term index W of the search result tweets and report the top 10 IDF-weighted terms (not including the search terms). If there are any frequently occurring terms that are not words (e.g. parts of a URL), place them in the stop list and recompute the term weights. Are all of the top 10 terms related?
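As a rough illustration of this step, the sketch below builds a TF-IDF weighted index in base R from a tiny made-up corpus. The tweets, the `stop_list` and the `tokenise` helper are all assumptions for illustration; real input would be the 100+ tweets returned by the API, and the lectures may use a different weighting or a text-mining package.

```r
# Toy stand-in for the search results (hypothetical data, not real tweets).
tweets <- c(
  "coffee shop opens downtown https://t.co/abc",
  "new coffee roaster downtown",
  "coffee prices rise again",
  "best espresso and coffee downtown https://t.co/xyz"
)
search_terms <- c("coffee")   # the query terms, excluded from the report
stop_list    <- c("and")      # hypothetical stop list; extend as needed

# Lower-case, strip URLs, then split on anything that is not a letter.
tokenise <- function(x) {
  x <- gsub("http[^ ]*", " ", tolower(x))
  tokens <- unlist(strsplit(x, "[^a-z]+"))
  tokens[nchar(tokens) > 0]
}

docs  <- lapply(tweets, tokenise)
vocab <- setdiff(sort(unique(unlist(docs))), c(stop_list, search_terms))

# Term-frequency matrix: rows are tweets, columns are terms.
tf <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))
rownames(tf) <- paste0("tweet", seq_along(tweets))

# IDF = log(N / document frequency); W holds the TF-IDF weights.
doc_freq <- colSums(tf > 0)
idf      <- log(nrow(tf) / doc_freq)
W        <- sweep(tf, 2, idf, `*`)

# Rank terms by their total weight across all tweets.
top <- sort(colSums(W), decreasing = TRUE)
print(head(names(top), 10))
```

If URL fragments or other non-words still show up near the top, adding them to `stop_list` and rerunning from the `vocab` line recomputes the weights.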
Using the term index and an appropriate metric, compute a dissimilarity matrix D containing the dissimilarity between all pairs of terms, and use multidimensional scaling (MDS) to visualise the dissimilarity between terms. Do there appear to be clusters of terms?
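One possible way to do this, assuming cosine dissimilarity is the "appropriate metric" (the brief does not say which one is expected), is sketched below with a small hand-made W. The matrix values and term names are invented for illustration; `cmdscale` is the classical MDS function in base R.

```r
# Hypothetical weighted term index: 6 tweets (rows) by 4 terms (columns).
W <- matrix(c(
  2, 1, 0, 0,
  1, 2, 0, 0,
  0, 0, 1, 2,
  0, 0, 2, 1,
  1, 1, 0, 0,
  0, 0, 1, 1
), nrow = 6, byrow = TRUE,
dimnames = list(paste0("tweet", 1:6), c("latte", "espresso", "match", "goal")))

# Cosine similarity between term vectors (the columns of W).
norms <- sqrt(colSums(W^2))
S <- crossprod(W) / outer(norms, norms)

# Dissimilarity matrix D: 0 for identical direction, 1 for no shared tweets.
D <- 1 - S
diag(D) <- 0

# Classical MDS into two dimensions, then plot the terms as labels.
mds <- cmdscale(as.dist(D), k = 2)
plot(mds, type = "n", xlab = "MDS 1", ylab = "MDS 2")
text(mds, labels = colnames(W))
```

Terms that co-occur in the same tweets should land close together in the plot, which is what the question about visible clusters is asking you to check.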
Use the same metric as above to compute the number of clusters using the elbow method. Then perform the clustering and report the top 10 terms of each cluster. Examine each of the top 10 words for each cluster and manually determine the theme/topic of each cluster of terms.
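A minimal sketch of the elbow method follows, again on invented data. It assumes cosine dissimilarity from the previous step and approximates it by scaling each term vector to unit length before running k-means, since Euclidean distance between unit vectors increases with cosine dissimilarity. The choice `k <- 2` is hard-coded here only because the toy data was built with two topics; in the real analysis it should be read off the elbow plot.

```r
set.seed(42)

# Hypothetical weighted term index: 12 tweets by 8 terms, two obvious topics.
terms <- c("latte", "espresso", "roast", "barista",
           "goal", "match", "striker", "league")
W <- matrix(0, nrow = 12, ncol = 8, dimnames = list(NULL, terms))
W[1:6, 1:4]  <- runif(24, 0.5, 2)   # tweets about coffee
W[7:12, 5:8] <- runif(24, 0.5, 2)   # tweets about football

# One row per term, scaled to unit length (cosine-style comparison).
X <- t(W)
X <- X / sqrt(rowSums(X^2))

# Elbow method: total within-cluster sum of squares for k = 1..5.
ks  <- 1:5
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 20)$tot.withinss)
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS")

# Cluster with the chosen k (assumed to be 2 for this toy data).
k  <- 2
km <- kmeans(X, centers = k, nstart = 20)

# Top terms per cluster, ranked by total weight across tweets.
weight    <- colSums(W)
top_terms <- lapply(split(terms, km$cluster), function(t) {
  head(t[order(weight[t], decreasing = TRUE)], 10)
})
print(top_terms)
```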
Create a term-topic matrix T and multiply it with W to obtain the tweet-topic matrix Z. Create the graph adjacency matrix A=ZZ', where there is an edge between two tweets if they share a topic, and plot the graph of tweets. If the graph is not ergodic (that is, not connected), then increase the number of terms in each topic to the top 20 words and recompute the graph. If only a few tweets are not connected, then these can be ignored.
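The matrix algebra in this step can be sketched as follows. Everything here is an assumption for illustration: the toy W, the `topic_of_term` assignment (which would come from the clustering step), and the `component_of` helper, which is a plain breadth-first search written out so the sketch needs no packages. The name `T_mat` is used instead of `T` because `T` is shorthand for `TRUE` in R. Plotting is left out; a graph package such as igraph would normally be used for that.

```r
# Hypothetical weighted term index: 5 tweets by 6 terms.
terms <- c("latte", "espresso", "roast", "goal", "match", "league")
W <- matrix(c(
  1, 1, 0, 0, 0, 0,   # tweet 1: coffee terms only
  0, 1, 1, 0, 0, 0,   # tweet 2: coffee terms only
  0, 0, 0, 1, 1, 0,   # tweet 3: football terms only
  0, 0, 1, 0, 0, 1,   # tweet 4: one term from each topic
  0, 0, 0, 0, 0, 0    # tweet 5: none of the topic terms
), nrow = 5, byrow = TRUE, dimnames = list(paste0("tweet", 1:5), terms))

# Term-topic matrix: entry is 1 if the term is among that topic's top terms.
topic_of_term <- c(1, 1, 1, 2, 2, 2)
T_mat <- outer(topic_of_term, 1:2, `==`) * 1
dimnames(T_mat) <- list(terms, c("coffee", "football"))

# Tweet-topic matrix Z = W T, and adjacency A = Z Z'.
Z <- W %*% T_mat
A <- Z %*% t(Z)

# Edge between two distinct tweets whenever they share a topic.
adj <- (A > 0) * 1
diag(adj) <- 0

# Label connected components with a breadth-first search.
component_of <- function(adj) {
  comp <- rep(NA_integer_, nrow(adj))
  current <- 0L
  for (s in seq_len(nrow(adj))) {
    if (!is.na(comp[s])) next
    current <- current + 1L
    comp[s] <- current
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- unname(which(adj[v, ] > 0 & is.na(comp)))
      comp[nb] <- current
      queue <- c(queue, nb)
    }
  }
  comp
}

comp <- component_of(adj)
print(table(comp))   # more than one component means the graph is not connected
```

In this toy example tweet 5 ends up in its own component, which is the "only a few tweets are not connected" case that the brief says can be ignored.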
Finally, compute the closeness centrality of each tweet in the graph and compare this ordering to the original tweet ordering (from the search results). Find a way to measure the difference between the two orderings and report the result. The set of tweets should have a different ordering to the original search order. What are they now ordered by?
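For this last step, here is a small sketch that computes closeness centrality by hand and compares the two orderings with Kendall's tau. The five-node graph and the assumption that tweet i was the i-th search result are made up for illustration, and Kendall's tau is only one candidate for "a way to measure the difference" (Spearman's rank correlation would be another). In practice the `closeness` function from the igraph package would probably be used instead of the hand-written `bfs_dist` helper.

```r
# Hypothetical connected graph of 5 tweets, given as an edge list.
n <- 5
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(3, 5), c(4, 5))
adj <- matrix(0, n, n)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # make the adjacency matrix symmetric

# Shortest path lengths (in edges) from node s, by breadth-first search.
bfs_dist <- function(adj, s) {
  d <- rep(Inf, nrow(adj))
  d[s] <- 0
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    nb <- which(adj[v, ] > 0 & is.infinite(d))
    d[nb] <- d[v] + 1
    queue <- c(queue, nb)
  }
  d
}

# Closeness centrality: (n - 1) divided by the sum of distances to all others.
dist_mat  <- t(sapply(seq_len(n), function(s) bfs_dist(adj, s)))
closeness <- (n - 1) / rowSums(dist_mat)

# Assumed original ordering: tweet i was the i-th search result.
search_rank     <- seq_len(n)
centrality_rank <- rank(-closeness, ties.method = "average")

# Kendall's tau: 1 means identical orderings, -1 means fully reversed.
tau <- cor(search_rank, centrality_rank, method = "kendall")
print(order(-closeness))   # tweets from most to least central
print(tau)
```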
Twitter want the analysis of the new "search results ordering method" to be written up in a professional report. Each part of the method should have its own section of the report and all questions from Twitter should have thoughtful answers. Any code that is used should be included and clearly explained (include comments in the code).
Once the required analysis is performed by the group, the members of the group are to write up the analysis as a report. Remember that the assessor will only see the group's report and will be marking the group's analysis based on that report. Therefore the report should contain a clear and concise description of the procedures carried out, the analysis of results and any conclusions reached from the analysis.
The required analysis in this specification covers the material presented in lectures and labs. Students should use the computer software R to carry out the required analysis and then present the results from the analysis in the report.
All classwork/lectures can be provided.
** The analysis is to be completed using the R programming language, in RStudio **