OpenCSR/drfact_data/knowledge_corpus at main · yuchenlin/OpenCSR


The Commonsense Knowledge Corpus

Download the preprocessed corpus, or unzip gkb_best.drfact_format.jsonl.zip in this directory.
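If you prefer to unzip from a script, here is a minimal sketch using Python's standard `zipfile` module. The function name `unzip_corpus` and its `out_dir` parameter are hypothetical helpers for illustration, not part of this repository. The sketch does not check the archive's contents; it only extracts whatever the archive holds and returns the member names.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_corpus(zip_path: str, out_dir: str = ".") -> List[str]:
    """Extract every member of zip_path into out_dir and return their names.

    Hypothetical helper: it makes no assumption about what the archive holds.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(out)
    return names
```

For example, `unzip_corpus("gkb_best.drfact_format.jsonl.zip")` extracts into the current directory.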

We use GenericsKB as our knowledge corpus. We preprocess the corpus, extract frequent concepts, and finally link each fact to the concepts it mentions. The download contains two files:

gkb_best.drfact_format.jsonl consists of generic commonsense facts, each of which is a statement of common knowledge (see the example below).
gkb_best.vocab.txt is the vocabulary of concepts (i.e., noun chunks), sorted by frequency.
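The two files can be loaded with a short script. Below is a minimal sketch, assuming the vocabulary file holds one concept per line (if a line carries extra tab-separated columns, such as a count, only the first is kept) and that each line of the JSONL file is an object with the "id" and "mentions" fields shown in the example in this README. The helper names are hypothetical. The last function builds a concept-to-facts index from each mention's "kb_id", which is one convenient way to look up every fact that mentions a given concept.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set


def load_vocab(path: str) -> List[str]:
    """Read concepts in file order (sorted by frequency, per the README).

    Assumption: one concept per line; extra tab-separated columns are dropped.
    """
    concepts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line:  # skip blank lines
                concepts.append(line.split("\t")[0])
    return concepts


def load_facts(path: str) -> List[dict]:
    """Parse the JSON Lines file: one fact object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def concept_to_facts(facts: List[dict]) -> Dict[str, Set[str]]:
    """Map each concept kb_id to the set of fact ids that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for fact in facts:
        for mention in fact.get("mentions", []):
            index[mention["kb_id"]].add(fact["id"])
    return dict(index)
```

For example, `concept_to_facts(load_facts("gkb_best.drfact_format.jsonl"))["tree"]` would include "gkb-best#934338" for the fact shown below.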

Below is an example JSON line for the fact "Trees remove dust and pollution from the air."

{
    "id": "gkb-best#934338", "url": "gkb-best#934338",
    "context": "Trees remove dust and pollution from the air .",
    "mentions": [
        { "kb_id": "tree", "start": 0, "text": "Trees", "sent_id": 0 },
        { "kb_id": "dust", "start": 13, "text": "dust", "sent_id": 0 },
        { "kb_id": "pollution", "start": 22, "text": "pollution", "sent_id": 0 },
        { "kb_id": "air", "start": 41, "text": "air", "sent_id": 0 }
    ],
    "title": "Trees", "kb_id": "tree"
}
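In the example above, "start" behaves as a 0-based character offset into "context" ("dust" begins at character 13, "air" at 41), while "kb_id" is the normalized concept ("tree" for the surface text "Trees"). Below is a minimal sketch of a sanity check built on that reading; the function name `check_mentions` is hypothetical, and the offset interpretation is inferred from this one example rather than documented.

```python
from typing import List

# The example fact from this README, copied verbatim.
EXAMPLE = {
    "id": "gkb-best#934338", "url": "gkb-best#934338",
    "context": "Trees remove dust and pollution from the air .",
    "mentions": [
        {"kb_id": "tree", "start": 0, "text": "Trees", "sent_id": 0},
        {"kb_id": "dust", "start": 13, "text": "dust", "sent_id": 0},
        {"kb_id": "pollution", "start": 22, "text": "pollution", "sent_id": 0},
        {"kb_id": "air", "start": 41, "text": "air", "sent_id": 0},
    ],
    "title": "Trees", "kb_id": "tree",
}


def check_mentions(fact: dict) -> List[str]:
    """Return one message per mention whose span does not match its text.

    Assumes "start" is a 0-based character offset into "context".
    An empty list means every mention lines up.
    """
    problems = []
    context = fact["context"]
    for m in fact["mentions"]:
        start = m["start"]
        span = context[start:start + len(m["text"])]
        if span != m["text"]:
            problems.append(f"{m['kb_id']}: expected {m['text']!r}, found {span!r}")
    return problems
```

Running `check_mentions(EXAMPLE)` returns an empty list, since all four spans match.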

Please also cite the GenericsKB paper if you use the data here.

@article{Bhakthavatsalam2020GenericsKBAK,
  title={GenericsKB: A Knowledge Base of Generic Statements},
  author={Sumithra Bhakthavatsalam and Chloe Anastasiades and P. Clark},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.00660}
}
