CATS

Commonsense Ability Tests

Dataset and scripts for the paper "Evaluating Commonsense in Pre-trained Language Models".

Use making_sense.py to run the experiments:
For ordinary tests:
python making_sense.py ca bert nr

For robust tests:
python making_sense.py ca bert r
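
The two commands above differ only in the last argument, so both runs can be driven from one small script. The following is a minimal sketch, not part of the repository: `build_command` and `run_both` are hypothetical helper names, and it assumes `making_sense.py` is in the current working directory.

```python
import subprocess
import sys


def build_command(task, model="bert", robust=False):
    """Return the argument list for one making_sense.py run (hypothetical helper)."""
    # "nr" selects the ordinary tests, "r" the robust tests, as in the commands above.
    mode = "r" if robust else "nr"
    return [sys.executable, "making_sense.py", task, model, mode]


def run_both(task, model="bert"):
    """Run the ordinary test, then the robust test, stopping at the first failure."""
    for robust in (False, True):
        subprocess.run(build_command(task, model, robust), check=True)


# Example, to be run from inside a checkout of the repository:
# run_both("ca")
```

Building the command as a list rather than a shell string avoids quoting problems if a task or model name ever contains unusual characters.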

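Here ca is the name of the task and bert is the model. The last argument selects the test type: nr for ordinary tests, r for robust tests. The default model is bert-base-uncased. To use bert-large, change the checkpoint name in the from_pretrained('bert-base-uncased') calls in the code (for example, to 'bert-large-uncased'). For more details, see Huggingface Transformers.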
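
As an illustration of the checkpoint swap, the sketch below loads either BERT size through Huggingface Transformers. It is an assumption-laden example, not the code in making_sense.py: the `BERT_CHECKPOINTS` table and the `load_bert` function are hypothetical, and the use of `BertForMaskedLM` is a guess at the model class the script relies on.

```python
# Hypothetical mapping from a size label to a Huggingface checkpoint name.
BERT_CHECKPOINTS = {
    "base": "bert-base-uncased",    # the default named in this README
    "large": "bert-large-uncased",  # the larger uncased checkpoint
}


def load_bert(size="base"):
    """Load the tokenizer and masked-LM weights for the chosen BERT size."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import BertForMaskedLM, BertTokenizer

    name = BERT_CHECKPOINTS[size]
    tokenizer = BertTokenizer.from_pretrained(name)
    # Assumption: the experiments score sentences with a masked-LM head.
    model = BertForMaskedLM.from_pretrained(name)
    model.eval()  # inference only, so disable dropout
    return tokenizer, model


# Example (downloads the weights on first use):
# tokenizer, model = load_bert("large")
```

Keeping the checkpoint name in one place means the tokenizer and the model cannot be loaded from mismatched checkpoints by accident.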

Because the Huggingface scripts and some of our datasets have been updated, some numbers reported in the paper may not exactly match what you get by rerunning the experiments. However, the conclusions should be the same.
