S.No | Application / Downstream Task | Model | Metrics |
---|---|---|---|
1. | Text Classification | BERT | Accuracy = 0.9174, F1 (Micro) = 0.7785, F1 (Macro) = 0.6330 |
2. | Text Classification | DistilBERT | Accuracy = 0.9263, F1 (Micro) = 0.7877, F1 (Macro) = 0.6431 |
3. | Text Classification | ALBERT | Accuracy = 0.9217, F1 (Micro) = 0.7431, F1 (Macro) = 0.4875 |
4. | Text Classification | RoBERTa | Accuracy = 0.9254, F1 (Micro) = 0.7502, F1 (Macro) = 0.5614 |
5. | Text Classification | BART | Accuracy = 0.9229, F1 (Micro) = 0.7787, F1 (Macro) = 0.6379 |
6. | Token Classification | BERT | eval_loss = 0.2619, precision = 0.5926, recall = 0.3438, F1 = 0.4351, accuracy = 0.9416, eval runtime = 4.3885 s, samples/s = 293.263, steps/s = 18.457, epochs = 2.0 |
7. | Token Classification | DistilBERT | eval_loss = 0.2781, precision = 0.5695, recall = 0.2808, F1 = 0.3761, accuracy = 0.9399, eval runtime = 2.4784 s, samples/s = 519.293, steps/s = 32.683, epochs = 2.0 |
8. | Token Classification | ALBERT | eval_loss = 0.2514, precision = 0.5820, recall = 0.3549, F1 = 0.4409, accuracy = 0.9468, eval runtime = 70.2674 s, samples/s = 18.316, steps/s = 1.153, epochs = 2.0 |
9. | Token Classification | RoBERTa | eval_loss = 0.2272, precision = 0.5585, recall = 0.4819, F1 = 0.5174, accuracy = 0.9493, eval runtime = 18.2888 s, samples/s = 70.371, steps/s = 4.429, epochs = 2.0 |
10. | Question Answering | BERT | global_step = 750, training_loss = 1.7336, train runtime = 10433.0395 s, samples/s = 1.15, steps/s = 0.072, total_flos = 2351670810624000.0, epochs = 3.0 |
11. | Question Answering | RoBERTa | global_step = 750, training_loss = 0.6353, train runtime = 1012.5891 s, samples/s = 11.851, steps/s = 0.741, total_flos = 2351670810624000.0, epochs = 3.0 |
12. | Question Answering | T5 | global_step = 750, training_loss = 4.4234, train runtime = 515.4332 s, samples/s = 23.281, steps/s = 1.455, total_flos = 2351670810624000.0, epochs = 3.0 |
13. | Question Answering | BigBird (extra work) | global_step = 750, training_loss = 1.5889, train runtime = 13538.7056 s, samples/s = 0.886, steps/s = 0.055, total_flos = 2482279077888000.0, epochs = 3.0 |
14. | Question Answering | Longformer (extra work) | global_step = 8544, training_loss = 0.5953, train runtime = 3147.1509 s, samples/s = 2.715, steps/s = 2.715, total_flos = 2092926538137600.0, epochs = 3.0 |
15. | Summarization | T5 | global_step = 1980, training_loss = 2.6152, train runtime = 484.1179 s, samples/s = 8.172, steps/s = 4.09, total_flos = 1070812702310400.0, epochs = 4.0 |
16. | Summarization | Pegasus | global_step = 1980, training_loss = 1.8268, train runtime = 5100.2761 s, samples/s = 0.776, steps/s = 0.388, total_flos = 1.142921441206272e+16, epochs = 4.0 |
17. | Summarization | BigBirdPegasus (extra work) | global_step = 3956, training_loss = 3.3216, train runtime = 10138.6025 s, samples/s = 0.39, steps/s = 0.39, total_flos = 1.1379741713596416e+16, epochs = 4.0 |
18. | Translation | T5 | global_step = 12710, training_loss = 1.4878, train runtime = 6596.4761 s, samples/s = 30.825, steps/s = 1.927, total_flos = 2.245214421540864e+16, epochs = 2.0 |
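For the text-classification rows, micro F1 pools true/false positives over all labels, while macro F1 averages the per-label F1 scores without weighting, which is why the two diverge sharply when rare labels are handled poorly (e.g. ALBERT's 0.7431 micro vs 0.4875 macro). A minimal pure-Python sketch of the distinction, with made-up labels and predictions:

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was t
            fn[t] += 1  # the true class t was missed
    # Micro: pool counts over all classes, then compute a single F1.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * TP / (2 * TP + FP + FN) if (TP + FP + FN) else 0.0
    # Macro: compute F1 per class, then take the unweighted mean.
    per_class = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro

# Toy example: errors on the rarer classes drag macro F1 below micro F1.
micro, macro = f1_scores([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 1])
```

The reported numbers were presumably computed with a library such as scikit-learn's `f1_score(average="micro")` / `f1_score(average="macro")`; the sketch above only illustrates the arithmetic behind the two averages.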
Datasets:
- Text Classification: Jigsaw toxic comment dataset: https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data
- Token Classification: WNUT 17 dataset from the `datasets` library (`wnut = load_dataset("wnut_17")`)
- Q&A: SQuAD dataset from the `datasets` library (`squad = load_dataset("squad", split="train[:5000]")`)
- Summarization: BillSum dataset from the `datasets` library (`billsum = load_dataset("billsum", split="ca_test")`)
- Translation: Opus Books dataset from the `datasets` library (`books = load_dataset("opus_books", "en-fr")`)
In text classification, RoBERTa performed best among the BERT variants, and the same trend holds in token classification and Q&A.
Pegasus performed best on the summarization task compared with the other models.
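The per-model comparisons above can be checked programmatically from the evaluation dicts the trainer emits. A small sketch using the token-classification numbers copied from the results table (the dict layout mirrors the reported output; only the two fields being compared are kept):

```python
# Token-classification eval metrics, copied from the results table above.
results = {
    "BERT":       {"eval_f1": 0.4351, "eval_samples_per_second": 293.263},
    "DistilBERT": {"eval_f1": 0.3761, "eval_samples_per_second": 519.293},
    "ALBERT":     {"eval_f1": 0.4409, "eval_samples_per_second": 18.316},
    "RoBERTa":    {"eval_f1": 0.5174, "eval_samples_per_second": 70.371},
}

# Best quality vs best throughput can point at different models.
best_f1 = max(results, key=lambda m: results[m]["eval_f1"])
fastest = max(results, key=lambda m: results[m]["eval_samples_per_second"])
```

On these numbers, `best_f1` is RoBERTa while `fastest` is DistilBERT, which matches the observation that RoBERTa wins on quality even though it is not the cheapest model to run.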