mixup-text

This repository contains an implementation of the mixup strategy for text classification. The implementation is primarily based on the paper Augmenting Data with Mixup for Sentence Classification: An Empirical Study, although there are some differences.
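
At its core, mixup trains on convex combinations of pairs of examples and of their labels, with a mixing coefficient drawn from a Beta distribution. The sketch below illustrates this idea in PyTorch; it is a minimal illustration rather than this repository's actual code, and the function names (`mixup_batch`, `mixup_loss`) and the default `alpha=1.0` are assumptions. For text, the inputs must already be continuous (e.g. embeddings), which is exactly where the variants listed below differ.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=1.0):
    """Mix a batch with a randomly permuted copy of itself."""
    # Mixing coefficient from Beta(alpha, alpha); alpha <= 0 disables mixing.
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0))            # random pairing within the batch
    mixed_x = lam * x + (1.0 - lam) * x[index]   # convex combination of inputs
    return mixed_x, y, y[index], lam

def mixup_loss(logits, y_a, y_b, lam):
    """Convex combination of the cross-entropy losses for the two label sets."""
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
```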

Three variants of mixup are considered for text classification (see the sketch after the list):

  1. Embedding mixup: Texts are mixed immediately after the word embedding layer
  2. Hidden/Encoder mixup: Mixup is applied just before the last fully connected layer
  3. Sentence mixup: Mixup is applied to the final-layer output, before the softmax
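
The following sketch shows where each variant mixes a pair of examples inside a simple CNN text classifier. The class and argument names are hypothetical and this is not the repository's actual model code; it also assumes the two token batches are padded to the same length.

```python
import torch
import torch.nn as nn

class MixupTextCNN(nn.Module):
    """Toy CNN classifier showing the three possible mixing points.
    mix_layer selects the variant: 'embed', 'hidden', or 'sent'."""

    def __init__(self, vocab_size, embed_dim=300, num_filters=100,
                 num_classes=6, mix_layer='embed'):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(num_filters, num_classes)
        self.mix_layer = mix_layer

    def forward(self, tokens_a, tokens_b=None, lam=1.0):
        mix = lambda a, b: lam * a + (1.0 - lam) * b     # convex combination

        e_a = self.embedding(tokens_a)                   # (B, T, E)
        e_b = self.embedding(tokens_b) if tokens_b is not None else None
        if self.mix_layer == 'embed' and e_b is not None:    # 1. embedding mixup
            e_a, e_b = mix(e_a, e_b), None

        def encode(e):
            h = torch.relu(self.conv(e.transpose(1, 2)))     # (B, F, T)
            return h.max(dim=2).values                       # max-over-time pooling

        h_a = encode(e_a)
        h_b = encode(e_b) if e_b is not None else None
        if self.mix_layer == 'hidden' and h_b is not None:   # 2. hidden/encoder mixup
            h_a, h_b = mix(h_a, h_b), None

        z_a = self.fc(h_a)                                   # logits
        z_b = self.fc(h_b) if h_b is not None else None
        if self.mix_layer == 'sent' and z_b is not None:     # 3. sentence mixup (before softmax)
            z_a = mix(z_a, z_b)
        return z_a
```

Whichever variant is used, the training loss is the same interpolated cross-entropy shown earlier: lam times the loss against the first label set plus (1 - lam) times the loss against the second.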

Results

Some experimental results on the TREC, SST-1, IMDB, AG's News and DBPedia datasets are shown below. rand refers to models initialized randomly; finetune refers to models initialized with pretrained vectors (GloVe or BERT).

| Model | TREC | SST-1 | IMDB | AG's News | DBPedia |
|---|---|---|---|---|---|
| CNN-rand | 88.58 | 37.00 | 86.74 | 91.07 | 98.03 |
| CNN-rand + embed mixup | 88.38 | 35.93 | 87.34 | 91.67 | 97.85 |
| CNN-rand + hidden mixup | 88.78 | 35.24 | 87.06 | 91.49 | 98.34 |
| CNN-rand + sent mixup | 88.92 | 35.40 | 87.25 | 91.46 | 98.23 |
| CNN-finetune | 90.50 | 46.38 | 88.57 | 92.67 | 98.81 |
| CNN-finetune + embed mixup | 91.62 | 45.81 | 89.13 | 92.78 | 98.55 |
| CNN-finetune + hidden mixup | 91.74 | 45.70 | 89.66 | 93.11 | 98.83 |
| CNN-finetune + sent mixup | 91.70 | 46.10 | 89.60 | 93.12 | 98.83 |
| LSTM-finetune | 89.26 | 44.38 | 86.04 | 92.87 | 98.95 |
| LSTM-finetune + embed mixup | 89.82 | 44.04 | 85.82 | 92.76 | 98.98 |
| LSTM-finetune + hidden mixup | 89.72 | 43.87 | 85.23 | 92.67 | 98.92 |
| LSTM-finetune + sent mixup | 89.70 | 43.86 | 85.02 | 92.65 | 98.87 |
| fastText-finetune | 86.88 | 43.26 | 88.33 | 91.93 | 97.85 |
| fastText-finetune + mixup | 86.20 | 43.81 | 88.05 | 91.99 | 97.99 |
| BERT-finetune | 97.04 | 53.05 | - | - | - |
| BERT-finetune + embed mixup | 97.20 | 53.12 | - | - | - |
| BERT-finetune + hidden mixup | 96.92 | 53.13 | - | - | - |
| BERT-finetune + sent mixup | 96.86 | 53.32 | - | - | - |

Results are the mean accuracy over 10 runs for all datasets, except DBPedia, where the mean is over 3 runs. Note that for the fastText model there is only one mixup variant, since it is a linear model.

TO-DO

  • Manifold mixup implementation
  • Results for BERT on the IMDB, AG's News and DBPedia datasets
