GloVe that relies on a different algorithmic principle but also yields vector representations of words.
Within the text file inside the archive, each line contains a word followed by a space and then a series of floating point numbers (also space-separated). The floating point numbers for a word (300 in total) constitute the word vector representation in a 300-dimensional word vector space.
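The file format described above can be parsed with a few lines of plain Python. The following is a minimal sketch, not a definitive loader: the function name `load_vectors` and the two-word sample data are made up for illustration, and it assumes that no word itself contains a space.

```python
import io

def load_vectors(lines):
    """Parse lines of the form 'word v1 v2 ... vN' into a dict: word -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Assumption: the first field is the word, every remaining field is a float.
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# Tiny in-memory stand-in for the real file (3 dimensions instead of 300).
sample = io.StringIO("king 0.5 0.1 0.3\nqueen 0.4 0.2 0.3\n")
vectors = load_vectors(sample)
```

For the real data, the same function could be given an open file handle, e.g. `open(path, encoding="utf-8")`, where `path` points to the extracted text file.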
Please write code to achieve the following tasks and report the results. Do
not use any libraries for the nearest neighbor computation, but instead write
your own code for this. You may use any programming language (it is easy
to store and manipulate a 300-dimensional array of floating point numbers
in almost any programming language).
--> Task 1
Determine the 5 nearest neighbours of your first name in terms of the cosine
similarity measure, along with the respective cosine similarity scores. For
each neighbour, list the word/name, not the vector.
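One possible way to implement the nearest-neighbour search without any library support is sketched below. Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The names `cosine` and `nearest_neighbours`, the `exclude` parameter, and the 2-dimensional toy vectors are illustrative assumptions, not part of the task description.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbours(query_vec, vectors, k=5, exclude=()):
    """Return the k (word, score) pairs with the highest cosine similarity to query_vec."""
    scored = [
        (cosine(query_vec, vec), word)
        for word, vec in vectors.items()
        if word not in exclude  # typically the query word itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(word, score) for score, word in scored[:k]]

# Hypothetical 2-dimensional toy vectors, only to show the call.
toy = {
    "nicole": [1.0, 0.0],
    "anna": [0.9, 0.1],
    "julia": [0.8, 0.3],
    "table": [0.0, 1.0],
}
result = nearest_neighbours(toy["nicole"], toy, k=2, exclude={"nicole"})
```

Excluding the query word is a design choice: otherwise the word itself is always returned first with a similarity of 1.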
Note that you may need to lower-case your name to find it (e.g. “nicole”
instead of “Nicole”). If (and only if) your first name is genuinely not covered
by the word vector data, then report this fact and use the first name of a
--> Task 2
Write code to create a vector representation for an entire sentence simply
by taking the average of all word vectors for words in that sentence. This
involves 1) tokenizing a sentence, i.e., splitting it into words, for which you
may use a very naïve and imperfect method. Then 2) look up the word
vectors for those tokens. Make sure to apply lower-casing if necessary. You
may ignore tokens that are not covered by the vocabulary of the word vectors.
Finally, 3) take the average, i.e. compute the component-wise sum of the
word vectors, and then divide each component by the number of words in
the sentence that were covered by the data.
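The three steps above could be sketched as follows. This is only one naïve reading of the task: the function name `sentence_vector`, the whitespace tokenizer with punctuation stripping, and the toy vocabulary are assumptions made for illustration.

```python
def sentence_vector(sentence, vectors):
    """Average the vectors of all covered tokens; return None if no token is covered."""
    # 1) Naive tokenization: lower-case, split on whitespace, strip punctuation.
    tokens = [t.strip(".,;:!?\"'()") for t in sentence.lower().split()]
    # 2) Look up vectors, ignoring tokens that are not in the vocabulary.
    covered = [vectors[t] for t in tokens if t in vectors]
    if not covered:
        return None
    # 3) Component-wise sum, then divide by the number of covered tokens.
    total = [0.0] * len(covered[0])
    for vec in covered:
        for i, x in enumerate(vec):
            total[i] += x
    return [x / len(covered) for x in total]

# Hypothetical 2-dimensional toy vocabulary; "quietly" is deliberately missing.
toy = {"the": [1.0, 0.0], "cat": [0.0, 2.0], "sleeps": [2.0, 1.0]}
vec = sentence_vector("The cat sleeps, quietly.", toy)
```

Here three of the four tokens are covered, so the sum is divided by 3, not 4.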
Next, choose a random sentence S0 and compute the vector representation
of that sentence using the above method. List the nearest neighbour words to
that sentence vector (i.e., determine which words in the data have a similar
vector representation to the vector for the sentence).
--> Task 3
Choose two other sentences S1 and S2 such that S1 is similar in meaning
to S0, and S2 is dissimilar in meaning to S0. Create the sentence vectors
using the method from Task 2, and report the cosine similarities between the
vectors for S0 and S1, and between the vectors for S0 and S2.
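A compact sketch of this comparison is given below. The sentences, the 2-dimensional vectors, and the helper names `mean_vector` and `cosine` are invented for illustration; scores obtained from this toy data say nothing about the scores the real 300-dimensional vectors would give.

```python
import math

def mean_vector(tokens, vectors):
    """Component-wise average of the vectors of all covered tokens (None if none are covered)."""
    covered = [vectors[t] for t in tokens if t in vectors]
    if not covered:
        return None
    return [sum(column) / len(covered) for column in zip(*covered)]

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

# Hypothetical toy vectors: animal-related words point one way, finance words another.
toy = {
    "dogs": [0.9, 0.1], "bark": [0.7, 0.3],
    "puppies": [0.8, 0.2], "yelp": [0.75, 0.25],
    "taxes": [0.1, 0.9], "rise": [0.2, 0.8],
}
s0 = mean_vector("dogs bark".split(), toy)
s1 = mean_vector("puppies yelp".split(), toy)   # similar in meaning to S0
s2 = mean_vector("taxes rise".split(), toy)     # dissimilar in meaning to S0

sim_01 = cosine(s0, s1)
sim_02 = cosine(s0, s2)
print(f"cos(S0, S1) = {sim_01:.3f}")
print(f"cos(S0, S2) = {sim_02:.3f}")
```

With real word vectors the gap between the two scores may be smaller than in this toy example, since averaging discards word order and frequent function words can pull all sentence vectors in a similar direction.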
Explain whether the obtained cosine similarity scores are reasonable and
give a brief explanation of why or why not.