This repository contains the data and PyTorch implementation of the EMNLP 2021 paper "NegatER: Unsupervised Discovery of Negatives in Commonsense Knowledge Bases" by Tara Safavi, Jing Zhu, and Danai Koutra.
If you use our work, please cite us as follows:
@inproceedings{safavi-etal-2021-negater,
    title = "{N}egat{ER}: {U}nsupervised {D}iscovery of {N}egatives in {C}ommonsense {K}nowledge {B}ases",
    author = "Safavi, Tara  and
      Zhu, Jing  and
      Koutra, Danai",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.456",
    pages = "5633--5646",
}
Run the following to set up your virtual environment and install the Python requirements:
python3.7 -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
Since NegatER makes use of the FAISS library, you also need to install libomp and libopenblas. On Ubuntu, run the following:
apt install libopenblas-base libomp-dev
On OS X, run the following:
brew install libomp openblas
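To check that FAISS and its native dependencies are installed correctly, you can build a small nearest-neighbor index. This is a minimal sanity-check sketch, not part of the repository:

```python
import numpy as np
import faiss  # fails at import time if libomp/OpenBLAS are missing

d = 128                                          # toy embedding dimension
xb = np.random.rand(1000, d).astype("float32")   # database vectors
xq = np.random.rand(5, d).astype("float32")      # query vectors

index = faiss.IndexFlatL2(d)   # exact L2 nearest-neighbor index
index.add(xb)
distances, neighbors = index.search(xq, 10)      # 10 nearest neighbors per query
print(neighbors.shape)                           # -> (5, 10)
```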
The repository includes both datasets used in our experiments:
- data/conceptnet/full: The full ConceptNet benchmark from Li et al. (2016), which comprises 100,000/2,400/2,400 train/validation/test triples across 34 relations and 78,334 unique phrases.
- data/conceptnet/true-neg: The filtered ConceptNet dataset consisting of the six relations that have true negative counterparts in the original benchmark. It comprises 36,210/3,278/3,278 train/validation/test triples across those six relations and 41,528 unique phrases.
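Each split is stored as plain-text triples. As a purely illustrative sketch (the column order and file name below are assumptions; inspect the files under data/conceptnet/ for the actual format), a loader might look like:

```python
from pathlib import Path

def load_triples(path):
    # Assumed format: one tab-separated triple per line; any extra
    # columns (e.g., confidence scores) are ignored here.
    triples = []
    for line in Path(path).read_text().splitlines():
        rel, head, tail = line.split("\t")[:3]
        triples.append((rel, head, tail))
    return triples

train = load_triples("data/conceptnet/true-neg/train.txt")  # file name assumed
print(len(train))  # expect 36,210 triples for the filtered training split
```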
Each job (fine-tuning and/or negative generation) requires a YAML configuration file.
The file config-default.yaml provides default configuration options for all jobs, along with an explanation of each configuration key. You can override these defaults by creating your own config file.
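For instance, a user config only needs the keys that differ from the defaults; unspecified keys fall back to config-default.yaml. A hypothetical override (key names taken from the examples below) might look like:

```yaml
# Illustrative config.yaml: override only what you need;
# everything else is inherited from config-default.yaml.
negater:
  index:
    k: 20   # use 20 nearest neighbors instead of the default
```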
To fine-tune a language model on a commonsense KB, use the src/fine_tune.py
script:
usage: fine_tune.py [-h] [--action {train,test} [{train,test} ...]]
[--config-file CONFIG_FILE]
config_dir
positional arguments:
config_dir Directory of job config file
optional arguments:
-h, --help show this help message and exit
--action {train,test} [{train,test} ...]
Default: ['train', 'test']
--config-file CONFIG_FILE
Configuration filename in the specified config
directory. Default: 'config.yaml'
Here are some examples of commands you can run:
- To fine-tune and test BERT-Base on the full ConceptNet
dataset using our given (best) config file, run the following:
python src/fine_tune.py configs/conceptnet/full/classify/
- To test your fine-tuned BERT-Base on the full ConceptNet dataset, run the following:
python src/fine_tune.py configs/conceptnet/full/classify/ --action test
- To fine-tune RoBERTa-Base instead of BERT on the full ConceptNet dataset,
edit the
configs/conceptnet/full/classify/config.yaml
file as follows:

    fine_tune:
      model:
        pretrained_name: roberta-base
    eval:
      test_checkpoint: roberta_best

then run the following:
python src/fine_tune.py configs/conceptnet/full/classify/
- To fine-tune BERT with negative samples generated by the UNIFORM baseline on the
filtered ConceptNet dataset, run the following:
python src/fine_tune.py configs/conceptnet/true-neg/uniform/
- To fine-tune BERT using the negative samples generated by NegatER-$\nabla$ on the filtered
ConceptNet dataset, run the following:
python src/fine_tune.py configs/conceptnet/true-neg/negater-gradients/
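Conceptually, fine-tuning casts each (relation, head, tail) triple as a text sequence and trains a binary plausibility classifier on positive triples and sampled negatives. The following sketch illustrates the idea with Hugging Face Transformers; it is not the repository's actual training loop, and the example triples are made up:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# One plausible triple and one corrupted negative, rendered as text.
texts = ["bird HasA wing", "bird HasA engine"]
labels = torch.tensor([1, 0])  # 1 = plausible, 0 = implausible

batch = tokenizer(texts, padding=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()  # one step of the binary classification objective
```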
To generate negatives given a language model fine-tuned on a commonsense KB,
use the src/negater.py
script:
usage: negater.py [-h] [--type {thresholds,full-gradients,proxy-gradients}]
[--config-file CONFIG_FILE]
config_dir
positional arguments:
config_dir Directory of job config file
optional arguments:
-h, --help show this help message and exit
--type {thresholds,full-gradients,proxy-gradients}
Type of NegatER job to run: 'thresholds' for
NegatER-$\theta_r$, 'full-gradients' for
NegatER-$\nabla$ without the proxy, and 'proxy-
gradients' for NegatER-$\nabla$ with the proxy.
Default: 'thresholds'
--config-file CONFIG_FILE
Configuration filename in the specified config
directory. Default: 'config.yaml'
Usage examples:
- To generate negatives with NegatER-$\theta_r$ using our precomputed k-nearest-neighbors
index for k=10, run the following:
python src/negater.py configs/conceptnet/full/generate/ --type thresholds
- To generate negatives with NegatER-$\nabla$ + the proxy approach,
building and saving a new k-nearest-neighbors index for k=20,
first modify the
configs/conceptnet/full/generate/config.yaml
file as follows:

    negater:
      index:
        build: True
        k: 20
        save: True

then run the following:
python src/negater.py configs/conceptnet/full/generate/ --type proxy-gradients
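For intuition, NegatER-$\theta_r$ keeps a candidate triple as a negative when the fine-tuned model's plausibility score falls below a relation-specific threshold $\theta_r$. Below is a schematic sketch of that filtering step, with hypothetical data and function names rather than the repository's actual code:

```python
def filter_negatives(candidates_by_rel, thresholds):
    # candidates_by_rel: {relation: [(triple, plausibility_score), ...]}
    # thresholds: {relation: theta_r}, e.g., chosen per relation
    negatives = []
    for rel, candidates in candidates_by_rel.items():
        theta_r = thresholds[rel]
        negatives += [t for t, score in candidates if score < theta_r]
    return negatives

candidates = {"HasA": [(("bird", "HasA", "engine"), 0.02),
                       (("bird", "HasA", "wing"), 0.97)]}
print(filter_negatives(candidates, {"HasA": 0.5}))
# -> [('bird', 'HasA', 'engine')]
```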