This is your first assignment for familiarising yourself with PyTorch.
You have to complete the missing parts in the code, with the goal of training
a baseline RNN model for sentiment classification in Twitter messages.
The functions for loading the raw data (utils/load_data.py)
and the pretrained word embeddings (utils/load_embeddings.py)
are given to you.
The key points of the first assignment are:
- Utilize the dataloading abstractions of PyTorch, namely torch.utils.data.Dataset and torch.utils.data.DataLoader. Don't use torchtext.
- Initialize the embedding layer of your model with pretrained word embeddings. I recommend using GloVe's 50-dimensional vectors, as the performance of the model is irrelevant and low-dimensional embeddings will speed things up.
- Implement a baseline RNN model. That means using the RNN's output from the last timestep as the feature representation of the input (no attention!). Remember, you have to account for the zero-padded timesteps: for each sample, the "last" timestep is the last real token, not the last position of the padded sequence.
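
To make the last two points concrete, here is a minimal sketch of one way to do it. It is not the reference solution: the class name `BaselineRNN`, the choice of an LSTM, the constructor arguments, and the convention that index 0 is the padding token are all assumptions made for illustration. The random matrix stands in for the GloVe vectors that utils/load_embeddings.py would provide.

```python
import torch
import torch.nn as nn


class BaselineRNN(nn.Module):
    """Hypothetical baseline: embeddings -> LSTM -> last real timestep -> linear."""

    def __init__(self, embeddings, hidden_size, num_classes):
        super().__init__()
        # embeddings: (vocab_size, emb_dim) matrix of pretrained vectors.
        # Assumption: row 0 belongs to the zero-padding token.
        weights = torch.as_tensor(embeddings, dtype=torch.float)
        self.embedding = nn.Embedding.from_pretrained(
            weights, freeze=True, padding_idx=0
        )
        self.rnn = nn.LSTM(weights.size(1), hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x, lengths):
        # x: (batch, max_len) token ids, lengths: (batch,) true lengths (>= 1)
        outputs, _ = self.rnn(self.embedding(x))  # (batch, max_len, hidden)
        # Pick, per sample, the output at position length - 1,
        # i.e. the last timestep that is not padding.
        idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, outputs.size(2))
        last = outputs.gather(1, idx).squeeze(1)  # (batch, hidden)
        return self.classifier(last)


# Stand-in for the pretrained embeddings: 10 words, 50 dimensions.
torch.manual_seed(0)
fake_embeddings = torch.randn(10, 50)
fake_embeddings[0] = 0.0  # padding vector

model = BaselineRNN(fake_embeddings, hidden_size=16, num_classes=3)

x = torch.tensor([[4, 7, 2, 0, 0], [5, 1, 3, 8, 6]])  # first row is padded
lengths = torch.tensor([3, 5])
logits = model(x, lengths)  # shape: (2, 3)
```

Because the LSTM here is unidirectional, the output at position `length - 1` does not depend on the padding that follows it, so the first sample gets the same representation whether or not it is padded. Taking `outputs[:, -1]` instead would read a state that has already consumed the padding. Packing the batch with `torch.nn.utils.rnn.pack_padded_sequence` is another common way to handle this.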
The training pipeline (root) is in train.py.
The classes for the model definition and dataloading are defined in the following files, but you have to implement the necessary methods:
- modules/dataloaders.py
- modules/models.py
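
As a rough idea of what the dataloading side involves, the sketch below subclasses torch.utils.data.Dataset and wraps it in a torch.utils.data.DataLoader. The class name `SentenceDataset`, its constructor arguments, the whitespace tokenizer, and the `<pad>`/`<unk>` entries in `word2idx` are hypothetical and chosen only for this example; the actual skeleton in modules/dataloaders.py may be organised differently.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SentenceDataset(Dataset):
    """Hypothetical dataset: maps raw sentences to fixed-length id tensors."""

    def __init__(self, texts, labels, word2idx, max_length):
        self.texts = texts
        self.labels = labels
        self.word2idx = word2idx
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, index):
        # Naive whitespace tokenization, truncated to max_length.
        # Assumption: every text contains at least one token.
        tokens = self.texts[index].lower().split()[: self.max_length]
        # Unknown words map to the "<unk>" index.
        ids = [self.word2idx.get(t, self.word2idx["<unk>"]) for t in tokens]
        length = len(ids)
        # Zero-pad on the right up to max_length (index 0 is "<pad>").
        ids = ids + [0] * (self.max_length - length)
        return torch.tensor(ids), self.labels[index], length


word2idx = {"<pad>": 0, "<unk>": 1, "good": 2, "bad": 3, "movie": 4}
dataset = SentenceDataset(
    ["good movie", "bad bad movie today"], [1, 0], word2idx, max_length=5
)
loader = DataLoader(dataset, batch_size=2, shuffle=False)

# The default collate function stacks each field of the samples into a tensor.
batch_ids, batch_labels, batch_lengths = next(iter(loader))
```

Returning the true length alongside the padded ids is what later lets the model ignore the zero-padded timesteps.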