Word Embeddings
Review

Lost in a multidimensional vector space after this lesson? We hope not! We have covered a lot here, so let’s take some time to recap.

  • Vectors are containers of information, and they can have anywhere from one dimension to hundreds or thousands of dimensions
  • Word embeddings are vector representations of a word, where words with similar contexts are represented with vectors that are closer together
  • spaCy is a package that enables us to view and use pre-trained word embedding models
  • The distance between vectors can be calculated in many ways; for higher-dimensional vectors such as word embeddings, cosine distance is a common and effective choice
  • Word2Vec is a shallow neural network model that can build word embeddings using either continuous bag-of-words or continuous skip-grams
  • Gensim is a package that allows us to create and train word embedding models using any corpus of text (see the sketch after this list)
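
As a quick refresher, here is a minimal sketch of training a Word2Vec model with Gensim. The toy corpus and parameter values are illustrative assumptions, not part of the lesson; the parameter names (vector_size, window, sg) follow the Gensim 4.x API.

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only)
corpus = [
    ["the", "squid", "swam", "past", "the", "sponge"],
    ["the", "starfish", "rested", "on", "the", "sponge"],
]

# sg=0 trains continuous bag-of-words; sg=1 trains continuous skip-grams
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=0)

# Look up the learned embedding for a word in the trained model
print(model.wv["sponge"])
```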

Instructions

1.

Load a word embedding model from spaCy into a variable named nlp.
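A minimal sketch, assuming the medium English model en_core_web_md (which ships with real word vectors) is installed; the exercise environment may provide a different model.

```python
import spacy

# Load a pre-trained English pipeline that includes word vectors
nlp = spacy.load("en_core_web_md")
```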

2.

Use the loaded model to create the following word embeddings (one approach is sketched after the list):

  • a vector representation of the word “sponge” saved in a variable named sponge_vec
  • a vector representation of the word “starfish” in a variable named starfish_vec
  • a vector representation of the word “squid” in a variable named squid_vec
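
One way to get the vectors, building on the nlp object from step 1: passing a single word to nlp yields a one-token Doc whose .vector attribute is that word's embedding.

```python
# Each call returns a dense NumPy array (300 dimensions for en_core_web_md)
sponge_vec = nlp("sponge").vector
starfish_vec = nlp("starfish").vector
squid_vec = nlp("squid").vector
```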
3.

Use SciPy to compute the cosine distance between:

  • sponge_vec and starfish_vec, storing the result in a variable dist_sponge_star
  • sponge_vec and squid_vec, storing the result in a variable dist_sponge_squid
  • starfish_vec and squid_vec, storing the result in a variable dist_star_squid

Print dist_sponge_star, dist_sponge_squid, and dist_star_squid to the terminal.
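
A sketch of step 3 using scipy.spatial.distance.cosine, which returns the cosine distance (1 minus the cosine similarity) between two 1-D arrays.

```python
from scipy.spatial.distance import cosine

# Cosine distance: 0 means identical direction, values near 1 mean dissimilar
dist_sponge_star = cosine(sponge_vec, starfish_vec)
dist_sponge_squid = cosine(sponge_vec, squid_vec)
dist_star_squid = cosine(starfish_vec, squid_vec)

print(dist_sponge_star, dist_sponge_squid, dist_star_squid)
```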

Which word embeddings are furthest apart according to cosine distance?
