Word Embeddings
Review

Lost in a multidimensional vector space after this lesson? We hope not! We have covered a lot here, so let’s take some time to recap.

• Vectors are containers of information, and they can have anywhere from one dimension to hundreds or thousands of dimensions
• Word embeddings are vector representations of words, where words that appear in similar contexts are represented by vectors that are closer together
• spaCy is a package that enables us to view and use pre-trained word embedding models
• The distance between vectors can be calculated in many ways; for high-dimensional vectors such as word embeddings, cosine distance is usually preferred because it compares the direction of the vectors rather than their magnitude
• Word2Vec is a shallow neural network model that can build word embeddings using either continuous bag-of-words or continuous skip-grams
• Gensim is a package that allows us to create and train word embedding models using any corpus of text
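
To make the cosine distance point concrete, here is a minimal sketch in plain Python. The function name `cosine_distance` and the toy vectors are illustrative assumptions, not part of the lesson; SciPy's `scipy.spatial.distance.cosine` computes the same quantity, one minus the cosine of the angle between the two vectors.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy 2-D vectors, made up for illustration (not real word embeddings).
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])  # parallel: close to 0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])      # perpendicular
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])       # pointing opposite ways

print(same_direction, orthogonal, opposite)
```

Note that the first pair has different lengths but the same direction, so its distance is essentially zero. That insensitivity to magnitude is what the recap bullet on cosine distance refers to.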

### Instructions

1.

Load a word embedding model from spaCy into a variable named nlp.

2.

Use the loaded model to create the following word embeddings:

• a vector representation of the word “sponge” saved in a variable named sponge_vec
• a vector representation of the word “starfish” saved in a variable named starfish_vec
• a vector representation of the word “squid” saved in a variable named squid_vec

3.

Use SciPy to compute the cosine distance between:

• sponge_vec and starfish_vec, storing the result in a variable dist_sponge_star
• sponge_vec and squid_vec, storing the result in a variable dist_sponge_squid
• starfish_vec and squid_vec, storing the result in a variable dist_star_squid

Print dist_sponge_star, dist_sponge_squid and dist_star_squid to the terminal.

Which word embeddings are furthest apart according to cosine distance?
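
One possible way to organize these steps is sketched below. It is an illustration under stated assumptions rather than the lesson's reference solution: the helper names (`cosine_distance`, `pairwise_distances`, `load_spacy_vectors`), the model name `en_core_web_md`, and the toy vectors are all hypothetical. The toy vectors only let the sketch run without a downloaded model and say nothing about how the real embeddings for "sponge", "starfish", and "squid" compare.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); mirrors scipy.spatial.distance.cosine."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def pairwise_distances(vectors):
    """Map each unordered pair of names to the cosine distance of their vectors."""
    return {
        (a, b): cosine_distance(vectors[a], vectors[b])
        for a, b in combinations(vectors, 2)
    }

def load_spacy_vectors(words, model_name="en_core_web_md"):
    """Look up one vector per word with spaCy. The model name is an assumption."""
    import spacy  # imported lazily so the rest of the sketch runs without spaCy
    nlp = spacy.load(model_name)
    return {word: nlp(word).vector for word in words}

# Made-up 3-D stand-ins, NOT real embeddings. For the exercise, replace with:
#   load_spacy_vectors(["sponge", "starfish", "squid"])
toy_vectors = {
    "word_a": [1.0, 0.0, 0.0],
    "word_b": [1.0, 1.0, 0.0],
    "word_c": [0.0, 1.0, 0.0],
}

distances = pairwise_distances(toy_vectors)
furthest_pair = max(distances, key=distances.get)  # pair with the largest distance

print(distances)
print("Furthest apart:", furthest_pair)
```

Keeping the distances in a dictionary keyed by word pair makes the final question a one-line `max` lookup instead of a comparison of three separate variables by eye.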