Lab Assignment from AI for Beginners Curriculum.
In this lab, we challenge you to train a Word2Vec model using the Skip-Gram technique: train a network with an embedding layer to predict neighboring words within a context window.
You are welcome to use any book. You can find many free texts at Project Gutenberg; for example, here is a direct link to Alice's Adventures in Wonderland by Lewis Carroll. Alternatively, you can use Shakespeare's plays, which you can download with the following code:
```python
import tensorflow as tf

path_to_file = tf.keras.utils.get_file(
    'shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
```
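To make the Skip-Gram objective concrete, here is a minimal sketch in pure NumPy, using a hypothetical toy corpus and full softmax (a real run would use a library such as Keras or Gensim, a large text, and negative sampling for speed). The idea: for each center word, train the embedding to predict every word within `window` positions of it.

```python
import numpy as np

# Hypothetical toy corpus; replace with text loaded from your book
corpus = "the queen spoke to alice and alice spoke to the queen".split()
vocab = sorted(set(corpus))
word2idx = {w: i for i, w in enumerate(vocab)}
V, dim, window = len(vocab), 8, 2

# Build (center, context) training pairs from a sliding window
pairs = []
for i in range(len(corpus)):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((word2idx[corpus[i]], word2idx[corpus[j]]))

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, dim))   # input embedding matrix
W_out = rng.normal(scale=0.1, size=(dim, V))  # output projection

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

lr = 0.05
for epoch in range(50):
    for center, context in pairs:
        h = W_in[center]             # embedding of the center word
        p = softmax(h @ W_out)       # predicted context distribution
        grad = p.copy()
        grad[context] -= 1.0         # gradient of cross-entropy w.r.t. logits
        W_out -= lr * np.outer(h, grad)
        W_in[center] -= lr * (W_out @ grad)

embedding = W_in  # each row is the learned vector for one vocabulary word
```

After training, `embedding[word2idx[w]]` gives the vector for word `w`; words that occur in similar contexts should end up with similar vectors.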
If you have time and want to go deeper into the subject, try exploring a few things:
- How does the embedding size affect the results?
- How do different text styles affect the results?
- Take several very different types of words and their synonyms, obtain their vector representations, apply PCA to reduce dimensions to 2, and plot them in 2D space. Do you see any patterns?
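For the last experiment, the PCA step can be done directly with NumPy via SVD. This is a hypothetical sketch: the random `vectors` below are stand-ins for the embeddings you would extract from your trained model, and the word list is illustrative.

```python
import numpy as np

words = ["king", "queen", "man", "woman", "cat", "dog"]
rng = np.random.default_rng(42)
vectors = rng.normal(size=(len(words), 64))  # stand-in word embeddings

# PCA: center the data, then project onto the top-2 principal directions
centered = vectors - vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T                 # shape (n_words, 2)
```

The resulting `coords` can be plotted with matplotlib, e.g. `plt.scatter(coords[:, 0], coords[:, 1])` with `plt.annotate(word, xy)` to label each point, to look for clusters of related words.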