More details at https://textattack.readthedocs.io/en/latest/3recipes/models.html
TextAttack includes pre-trained models for common NLP tasks. This makes it easier for users to get started with TextAttack and enables a fairer comparison of attacks from the literature.
All evaluation results were obtained using `textattack eval` to evaluate models on their default test dataset (the test set if labels are available, otherwise the eval/validation set). You can use this command to verify the accuracies for yourself: for example, `textattack eval --model roberta-base-mr`.
The LSTM and wordCNN models' code is available in `textattack.models.helpers`. All other models are transformers imported from the `transformers` package. To evaluate all TextAttack pretrained models, invoke `textattack eval` without specifying a model: `textattack eval --num-examples 1000`.
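If you want to script these checks, the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of TextAttack: the helper names `build_eval_command` and `run_eval` are hypothetical, and only the `--model` and `--num-examples` flags mentioned in the text are used.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_eval_command(model: Optional[str] = None,
                       num_examples: Optional[int] = None) -> List[str]:
    """Build the argument list for `textattack eval` (hypothetical helper).

    Omitting `model` mirrors the "evaluate all pretrained models" usage.
    """
    cmd = ["textattack", "eval"]
    if model is not None:
        cmd += ["--model", model]
    if num_examples is not None:
        cmd += ["--num-examples", str(num_examples)]
    return cmd


def run_eval(model: Optional[str] = None,
             num_examples: Optional[int] = None) -> Optional[int]:
    """Run the command if the `textattack` executable is on PATH.

    Returns the process exit code, or None if the executable is missing.
    """
    cmd = build_eval_command(model, num_examples)
    if shutil.which("textattack") is None:
        print("textattack not found on PATH; would have run:", shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


# Show the two command lines from the text without executing them.
print(shlex.join(build_eval_command(model="roberta-base-mr")))
print(shlex.join(build_eval_command(num_examples=1000)))
```

Calling `run_eval(model="roberta-base-mr")` would then execute the single-model check, assuming TextAttack is installed in the active environment.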
All evaluations shown use the full validation or test set, capped at 1000 examples.
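As a reading aid for the Correct/Whole and Accuracy figures in the list below, here is a small sketch of the arithmetic. The function names are hypothetical, and the 1000-example cap is taken from the sentence above; this is not TextAttack's own code.

```python
def evaluated_count(dataset_size: int, cap: int = 1000) -> int:
    """Number of examples evaluated: the whole split, or `cap` if the split is larger."""
    return min(dataset_size, cap)


def format_accuracy(correct: int, whole: int) -> str:
    """Format Correct/Whole as a percentage with two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return f"{100.0 * correct / whole:.2f}%"


# An 872-example split is smaller than the cap, so all 872 examples are used.
print(evaluated_count(872))        # 872
# Figures reported for lstm-sst2 and lstm-ag-news in the list.
print(format_accuracy(737, 872))   # 84.52%
print(format_accuracy(914, 1000))  # 91.40%
```

This is why entries such as SST-2, RTE, MRPC, and WNLI show a Whole value below 1000: their evaluation splits contain fewer examples than the cap.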
- AG News (`lstm-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Correct/Whole: 914/1000
    - Accuracy: 91.40%
- IMDB (`lstm-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Correct/Whole: 883/1000
    - Accuracy: 88.30%
- Movie Reviews [Rotten Tomatoes] (`lstm-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Correct/Whole: 807/1000
    - Accuracy: 80.70%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Correct/Whole: 781/1000
    - Accuracy: 78.10%
- SST-2 (`lstm-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Correct/Whole: 737/872
    - Accuracy: 84.52%
- Yelp Polarity (`lstm-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Correct/Whole: 922/1000
    - Accuracy: 92.20%
- AG News (`cnn-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Correct/Whole: 910/1000
    - Accuracy: 91.00%
- IMDB (`cnn-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Correct/Whole: 863/1000
    - Accuracy: 86.30%
- Movie Reviews [Rotten Tomatoes] (`cnn-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Correct/Whole: 794/1000
    - Accuracy: 79.40%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Correct/Whole: 768/1000
    - Accuracy: 76.80%
- SST-2 (`cnn-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Correct/Whole: 721/872
    - Accuracy: 82.68%
- Yelp Polarity (`cnn-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Correct/Whole: 913/1000
    - Accuracy: 91.30%
- AG News (`albert-base-v2-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Correct/Whole: 943/1000
    - Accuracy: 94.30%
- CoLA (`albert-base-v2-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Correct/Whole: 829/1000
    - Accuracy: 82.90%
- IMDB (`albert-base-v2-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Correct/Whole: 913/1000
    - Accuracy: 91.30%
- Movie Reviews [Rotten Tomatoes] (`albert-base-v2-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Correct/Whole: 882/1000
    - Accuracy: 88.20%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Correct/Whole: 851/1000
    - Accuracy: 85.10%
- Quora Question Pairs (`albert-base-v2-qqp`)
  - `datasets` dataset `glue`, subset `qqp`, split `validation`
    - Correct/Whole: 914/1000
    - Accuracy: 91.40%
- Recognizing Textual Entailment (`albert-base-v2-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Correct/Whole: 211/277
    - Accuracy: 76.17%
- SNLI (`albert-base-v2-snli`)
  - `datasets` dataset `snli`, split `test`
    - Correct/Whole: 883/1000
    - Accuracy: 88.30%
- SST-2 (`albert-base-v2-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Correct/Whole: 807/872
    - Accuracy: 92.55%
- STS-b (`albert-base-v2-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.9041359738552746
    - Spearman correlation: 0.8995912861209745
- WNLI (`albert-base-v2-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Correct/Whole: 42/71
    - Accuracy: 59.15%
- Yelp Polarity (`albert-base-v2-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Correct/Whole: 963/1000
    - Accuracy: 96.30%
- AG News (`bert-base-uncased-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Correct/Whole: 942/1000
    - Accuracy: 94.20%
- CoLA (`bert-base-uncased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Correct/Whole: 812/1000
    - Accuracy: 81.20%
- IMDB (`bert-base-uncased-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Correct/Whole: 919/1000
    - Accuracy: 91.90%
- MNLI matched (`bert-base-uncased-mnli`)
  - `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
    - Correct/Whole: 840/1000
    - Accuracy: 84.00%
- Movie Reviews [Rotten Tomatoes] (`bert-base-uncased-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Correct/Whole: 876/1000
    - Accuracy: 87.60%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Correct/Whole: 838/1000
    - Accuracy: 83.80%
- MRPC (`bert-base-uncased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Correct/Whole: 358/408
    - Accuracy: 87.75%
- QNLI (`bert-base-uncased-qnli`)
  - `datasets` dataset `glue`, subset `qnli`, split `validation`
    - Correct/Whole: 904/1000
    - Accuracy: 90.40%
- Quora Question Pairs (`bert-base-uncased-qqp`)
  - `datasets` dataset `glue`, subset `qqp`, split `validation`
    - Correct/Whole: 924/1000
    - Accuracy: 92.40%
- Recognizing Textual Entailment (`bert-base-uncased-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Correct/Whole: 201/277
    - Accuracy: 72.56%
- SNLI (`bert-base-uncased-snli`)
  - `datasets` dataset `snli`, split `test`
    - Correct/Whole: 894/1000
    - Accuracy: 89.40%
- SST-2 (`bert-base-uncased-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Correct/Whole: 806/872
    - Accuracy: 92.43%
- STS-b (`bert-base-uncased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.8775458937815515
    - Spearman correlation: 0.8773251339980935
- WNLI (`bert-base-uncased-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Correct/Whole: 40/71
    - Accuracy: 56.34%
- Yelp Polarity (`bert-base-uncased-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Correct/Whole: 963/1000
    - Accuracy: 96.30%
- CoLA (`distilbert-base-cased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Correct/Whole: 786/1000
    - Accuracy: 78.60%
- MRPC (`distilbert-base-cased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Correct/Whole: 320/408
    - Accuracy: 78.43%
- Quora Question Pairs (`distilbert-base-cased-qqp`)
  - `datasets` dataset `glue`, subset `qqp`, split `validation`
    - Correct/Whole: 908/1000
    - Accuracy: 90.80%
- SNLI (`distilbert-base-cased-snli`)
  - `datasets` dataset `snli`, split `test`
    - Correct/Whole: 861/1000
    - Accuracy: 86.10%
- SST-2 (`distilbert-base-cased-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Correct/Whole: 785/872
    - Accuracy: 90.02%
- STS-b (`distilbert-base-cased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.8421540899520146
    - Spearman correlation: 0.8407155030382939
- AG News (`distilbert-base-uncased-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Correct/Whole: 944/1000
    - Accuracy: 94.40%
- CoLA (`distilbert-base-uncased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Correct/Whole: 786/1000
    - Accuracy: 78.60%
- IMDB (`distilbert-base-uncased-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Correct/Whole: 903/1000
    - Accuracy: 90.30%
- MNLI matched (`distilbert-base-uncased-mnli`)
  - `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
    - Correct/Whole: 817/1000
    - Accuracy: 81.70%
- MRPC (`distilbert-base-uncased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Correct/Whole: 350/408
    - Accuracy: 85.78%
- QNLI (`distilbert-base-uncased-qnli`)
  - `datasets` dataset `glue`, subset `qnli`, split `validation`
    - Correct/Whole: 860/1000
    - Accuracy: 86.00%
- Recognizing Textual Entailment (`distilbert-base-uncased-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Correct/Whole: 180/277
    - Accuracy: 64.98%
- STS-b (`distilbert-base-uncased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.8421540899520146
    - Spearman correlation: 0.8407155030382939
- WNLI (`distilbert-base-uncased-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Correct/Whole: 40/71
    - Accuracy: 56.34%
- AG News (`roberta-base-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Correct/Whole: 947/1000
    - Accuracy: 94.70%
- CoLA (`roberta-base-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Correct/Whole: 857/1000
    - Accuracy: 85.70%
- IMDB (`roberta-base-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Correct/Whole: 941/1000
    - Accuracy: 94.10%
- Movie Reviews [Rotten Tomatoes] (`roberta-base-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Correct/Whole: 899/1000
    - Accuracy: 89.90%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Correct/Whole: 883/1000
    - Accuracy: 88.30%
- MRPC (`roberta-base-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Correct/Whole: 371/408
    - Accuracy: 91.18%
- QNLI (`roberta-base-qnli`)
  - `datasets` dataset `glue`, subset `qnli`, split `validation`
    - Correct/Whole: 917/1000
    - Accuracy: 91.70%
- Recognizing Textual Entailment (`roberta-base-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Correct/Whole: 217/277
    - Accuracy: 78.34%
- SST-2 (`roberta-base-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Correct/Whole: 820/872
    - Accuracy: 94.04%
- STS-b (`roberta-base-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.906067852162708
    - Spearman correlation: 0.9025045272903051
- WNLI (`roberta-base-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Correct/Whole: 40/71
    - Accuracy: 56.34%
- CoLA (`xlnet-base-cased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Correct/Whole: 800/1000
    - Accuracy: 80.00%
- IMDB (`xlnet-base-cased-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Correct/Whole: 957/1000
    - Accuracy: 95.70%
- Movie Reviews [Rotten Tomatoes] (`xlnet-base-cased-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Correct/Whole: 908/1000
    - Accuracy: 90.80%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Correct/Whole: 876/1000
    - Accuracy: 87.60%
- MRPC (`xlnet-base-cased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Correct/Whole: 363/408
    - Accuracy: 88.97%
- Recognizing Textual Entailment (`xlnet-base-cased-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Correct/Whole: 196/277
    - Accuracy: 70.76%
- STS-b (`xlnet-base-cased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.883111673280641
    - Spearman correlation: 0.8773439961182335
- WNLI (`xlnet-base-cased-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Correct/Whole: 41/71
    - Accuracy: 57.75%
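
The STS-b entries above report Pearson and Spearman correlations rather than accuracy, because STS-b is a regression task with real-valued similarity scores. For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how each is defined. It is an illustration only, not TextAttack's implementation, and the sample values are made up for the example.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: predictions that preserve the gold ordering but not linearly.
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
pred = [0.0, 1.0, 4.0, 9.0, 16.0]
print(pearson(gold, pred))   # below 1: the relationship is not linear
print(spearman(gold, pred))  # 1 up to rounding: the ordering is preserved
```

Pearson rewards predictions that are linearly related to the gold scores, while Spearman only checks that the ranking agrees, which is why the two values in each STS-b entry are close but not identical.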