This repo is a minimalist implementation of a Mini-BERT for Text Classification. As pretrained models get bigger and bigger, it is important to let young researchers without many resources try out and experiment with new architectures.
This repo is based on the Minimalist Implementation of a BERT Sentence Classifier, which is built on top of PyTorch Lightning and the Hugging Face Transformers library.
The datasets were originally downloaded from Assignment 1 of the course Neural Networks for NLP.
This project uses Python 3.
Create a virtual env with (outside the project folder):
virtualenv -p python3 minibert-env
source minibert-env/bin/activate
Install the requirements (inside the project folder):
pip install -r requirements.txt
Train a model with the default settings:
python training.py
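Since the options below follow argparse's help format, the full, up-to-date list should also be printable with:
python training.py --help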
Available commands:
Training arguments:
optional arguments:
--seed Training seed.
--distributed_backend Supports three options: dp, ddp, ddp2.
--use_16bit If true, uses 16-bit precision.
--batch_size Batch size to be used.
--accumulate_grad_batches Accumulated gradients run K small batches of size N before doing a backward pass.
--log_gpu_memory Uses the output of nvidia-smi to log GPU usage. Might slow performance.
--val_percent_check If you don't want to use the entire dev set, use this flag to set how much of the dev set to use.
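These flags are typically parsed with argparse and handed straight to a PyTorch Lightning Trainer. The snippet below is only a sketch of that wiring under the pre-1.0 Lightning API that these flag names come from; hparams stands in for the parsed arguments and the mapping is an assumption, not the actual code in training.py.

from argparse import Namespace
import pytorch_lightning as pl

# Placeholder values standing in for the parsed command-line flags.
hparams = Namespace(gpus=1, distributed_backend="dp", use_16bit=False,
                    accumulate_grad_batches=2, val_percent_check=1.0)

# Rough sketch of how the training flags could feed a Lightning Trainer (pre-1.0 API).
trainer = pl.Trainer(
    gpus=hparams.gpus,
    distributed_backend=hparams.distributed_backend,
    precision=16 if hparams.use_16bit else 32,
    accumulate_grad_batches=hparams.accumulate_grad_batches,
    val_percent_check=hparams.val_percent_check,
)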
Early Stopping/Checkpoint arguments:
optional arguments:
--metric_mode Whether to minimize or maximize the monitored quantity ('min' or 'max').
--min_epochs Limits training to a minimum number of epochs.
--max_epochs Limits training to a maximum number of epochs.
--save_top_k The best k models according to the monitored quantity will be saved.
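These flags correspond to Lightning's EarlyStopping and ModelCheckpoint callbacks. A minimal sketch of that setup follows, again assuming the pre-1.0 Lightning API; the monitored metric name and the patience value are placeholders, not documented flags of this repo.

from argparse import Namespace
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Placeholder values standing in for the parsed command-line flags.
hparams = Namespace(metric_mode="min", min_epochs=1, max_epochs=10, save_top_k=1)

# "val_loss" and patience=3 are assumptions for illustration only.
early_stopping = EarlyStopping(monitor="val_loss", mode=hparams.metric_mode, patience=3)
checkpoint = ModelCheckpoint(monitor="val_loss", mode=hparams.metric_mode,
                             save_top_k=hparams.save_top_k)

trainer = pl.Trainer(
    early_stop_callback=early_stopping,
    checkpoint_callback=checkpoint,
    min_epochs=hparams.min_epochs,
    max_epochs=hparams.max_epochs,
)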
Model arguments:
optional arguments:
--encoder_learning_rate Encoder specific learning rate.
--learning_rate Classification head learning rate.
--class_weights Weights for each of the classes we want to tag.
--warmup_steps Scheduler warmup steps.
--dropout Dropout to be applied to the BERT embeddings.
--train_csv Path to the file containing the train data.
--dev_csv Path to the file containing the dev data.
--test_csv Path to the file containing the test data.
--loader_workers How many subprocesses to use for data loading.
--label_set Set of labels we want to use in our classification task (e.g. 'pos,neg').
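The two learning rates exist because the pretrained encoder and the freshly initialized classification head usually need to be updated at different speeds. Below is a hedged illustration of how --encoder_learning_rate, --learning_rate, --dropout and --label_set could fit together; the names and values are placeholders, not the repo's actual module structure.

from argparse import Namespace
import torch
from torch import nn
from transformers import BertModel

# Placeholder values standing in for the parsed command-line flags.
hparams = Namespace(encoder_learning_rate=1e-05, learning_rate=3e-05,
                    dropout=0.1, label_set="pos,neg")

# Pretrained encoder plus a small classification head with dropout on the BERT embeddings.
encoder = BertModel.from_pretrained("bert-base-uncased")
classification_head = nn.Sequential(
    nn.Dropout(hparams.dropout),
    nn.Linear(encoder.config.hidden_size, len(hparams.label_set.split(","))),
)

# Two parameter groups so the encoder and the head get different learning rates.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": hparams.encoder_learning_rate},
    {"params": classification_head.parameters(), "lr": hparams.learning_rate},
])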
Training command example:
python training.py \
--gpus 1 \
--distributed_backend dp \
--batch_size 6 \
--accumulate_grad_batches 2 \
--loader_workers 4 \
--nr_frozen_epochs 1
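To train on a custom dataset, the same script can be pointed at your own CSV files; the data/ paths below are placeholders:
python training.py \
--train_csv data/train.csv \
--dev_csv data/dev.csv \
--test_csv data/test.csv \
--label_set 'pos,neg'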
Testing the model from the shell:
python interact.py --experiment experiments/lightning_logs/version_{date}
Launch tensorboard with:
tensorboard --logdir="experiments/lightning_logs/"
To make sure all the code follows the same style, we use Black.
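Assuming Black is installed in your environment, the whole project can be formatted with:
black .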