Can you identify sarcastic sentences? Can you distinguish between fake news and legitimate news? This repository contains implementations of several RNNs for detecting sarcasm in news headlines, as well as a fine-tuned pre-trained BERT model for the same task. One important note before looking at the comparison table: the RNNs are not pre-trained; they only use pre-trained GloVe word vectors. The dataset was taken from Kaggle.
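As a rough illustration of how GloVe vectors feed the RNN models, the sketch below builds an embedding matrix from a GloVe text file and wraps it in a frozen `nn.Embedding` layer. The file path, toy vocabulary, and embedding dimension are placeholders, not the repository's actual setup.

```python
import numpy as np
import torch
import torch.nn as nn

def load_glove_embeddings(glove_path, vocab, dim=100):
    """Build an embedding matrix aligned with `vocab` (word -> index).

    Words missing from the GloVe file keep a small random vector.
    `glove_path` is a placeholder, e.g. "glove.6B.100d.txt".
    """
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            if word in vocab:
                matrix[vocab[word]] = np.asarray(values, dtype="float32")
    return torch.from_numpy(matrix)

# Plug the pre-trained vectors into an embedding layer and freeze it,
# so only the LSTM and classifier weights are trained.
vocab = {"<pad>": 0, "<unk>": 1, "trump": 2, "scientists": 3}  # toy vocabulary
weights = load_glove_embeddings("glove.6B.100d.txt", vocab, dim=100)
embedding = nn.Embedding.from_pretrained(weights, freeze=True, padding_idx=0)
```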
As expected, the pre-trained model performed best: BERT achieved the highest accuracy. The LSTM with 2D MaxPooling layer, Bidirectional LSTM, and LSTM with Attention showed approximately the same results, while the plain LSTM was slightly worse (an illustrative sketch of the attention variant follows the table).
Model | Loss | Validation Accuracy | Recall | Precision | F1 |
---|---|---|---|---|---|
Plain LSTM | 0.6616 | 0.8660 | 0.8190 | 0.8890 | 0.8526 |
Bidirectional LSTM | 0.6558 | 0.8697 | 0.8944 | 0.8432 | 0.8681 |
LSTM with Attention | 0.6555 | 0.8698 | 0.9143 | 0.8188 | 0.8639 |
LSTM with 2D MaxPooling layer | 0.6515 | 0.8719 | 0.8712 | 0.8649 | 0.8680 |
BERT | 0.3999 | 0.9092 | 0.8739 | 0.9113 | 0.8922 |
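For readers unfamiliar with the compared architectures, here is an illustrative sketch (not the repository's exact implementation) of an LSTM classifier that pools the hidden states with additive attention before the final linear layer. Layer sizes and other hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class AttentionLSTMClassifier(nn.Module):
    """Illustrative LSTM classifier with additive attention pooling."""

    def __init__(self, embedding, hidden_size=128, num_classes=2):
        super().__init__()
        self.embedding = embedding  # e.g. a frozen GloVe embedding layer
        self.lstm = nn.LSTM(embedding.embedding_dim, hidden_size, batch_first=True)
        self.attn = nn.Linear(hidden_size, 1)       # one score per time step
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)               # (batch, seq, emb_dim)
        outputs, _ = self.lstm(x)                   # (batch, seq, hidden)
        scores = self.attn(outputs).squeeze(-1)     # (batch, seq)
        weights = torch.softmax(scores, dim=1)      # attention distribution
        context = (weights.unsqueeze(-1) * outputs).sum(dim=1)  # weighted sum
        return self.classifier(context)             # (batch, num_classes)

# Usage with a small random embedding standing in for the GloVe layer:
emb = nn.Embedding(num_embeddings=1000, embedding_dim=100, padding_idx=0)
model = AttentionLSTMClassifier(emb)
logits = model(torch.randint(0, 1000, (4, 20)))     # 4 headlines, 20 tokens each
```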
You can find the models' weights here.
Hyperparameters such as the learning rate, weight decay, and LSTM hidden size were tuned with the HyperOpt search algorithm from the Tune library. You can find more details in this notebook.
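A minimal sketch of how such a search can be wired up with Ray Tune's `HyperOptSearch` follows; the actual trainable, search space, and metric names live in the notebook, so everything below is illustrative.

```python
from ray import tune
from ray.tune.search.hyperopt import HyperOptSearch  # requires `pip install hyperopt`

def train_lstm(config):
    # Placeholder for the real training loop: build the LSTM with
    # config["hidden_size"], optimize with lr=config["lr"] and
    # weight_decay=config["weight_decay"], then evaluate on validation data.
    val_accuracy = 0.85  # dummy value standing in for the measured accuracy
    return {"val_accuracy": val_accuracy}

search_space = {
    "lr": tune.loguniform(1e-4, 1e-2),
    "weight_decay": tune.loguniform(1e-6, 1e-3),
    "hidden_size": tune.choice([64, 128, 256]),
}

tuner = tune.Tuner(
    train_lstm,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        search_alg=HyperOptSearch(),
        metric="val_accuracy",
        mode="max",
        num_samples=20,
    ),
)
best = tuner.fit().get_best_result(metric="val_accuracy", mode="max")
print(best.config)
```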
It is often useful to look at the examples where a model makes mistakes. This is implemented in the ModelMistakes notebook, where you can find some examples.
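A generic way to gather such examples, assuming a trained model and an aligned list of raw headlines (all names below are hypothetical, not the notebook's code):

```python
import torch

@torch.no_grad()
def collect_mistakes(model, headlines, token_ids, labels):
    """Return (headline, true_label, predicted_label) triples the model gets wrong.

    `headlines` is a list of raw strings aligned with the `token_ids`
    and `labels` tensors.
    """
    model.eval()
    preds = model(token_ids).argmax(dim=1)
    return [
        (headline, int(true), int(pred))
        for headline, true, pred in zip(headlines, labels, preds)
        if int(true) != int(pred)
    ]
```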
An attempt at data cleaning (lowercasing, noise removal, lemmatization, and stop-word removal) actually made the results worse. This suggests that the dataset is clean enough as-is, so we can skip this step and preserve more information. You can find this experiment here.
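For reference, the kind of cleaning that was tried looks roughly like the NLTK-based sketch below; the exact steps in the experiment notebook may differ.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def clean_headline(text):
    """Lowercase, strip non-letters, lemmatize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # noise removal
    tokens = [LEMMATIZER.lemmatize(tok) for tok in text.split()]
    return " ".join(tok for tok in tokens if tok not in STOP_WORDS)

# Prints a lowercased, lemmatized, stop-word-free version of the headline.
print(clean_headline("Scientists 'Totally Shocked' That Nobody Reads Their Papers"))
```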
- Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling paper
- BERT Classifier: Just Another Pytorch Model article
- Algorithms for Hyper-Parameter Optimization paper
- Tune Search Algorithms docs
- Text-Classification-Pytorch repository