Step 1: Download the basic code.

```shell
git clone --depth 1 https://github.com/zjunlp/PromptKG.git
```
Step 2: Create a virtual environment using Anaconda and activate it.

```shell
conda create -n lambdakg python=3.8
conda activate lambdakg
```
Step 3: Enter the task directory and install the required libraries.

```shell
cd PromptKG/lambdaKG
pip install -r requirements.txt
```
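If you want a quick sanity check that the installation succeeded, the snippet below imports two dependencies that a PLM-based toolkit like this one typically relies on (it is an assumption that `torch` and `transformers` are pulled in by `requirements.txt`; adjust to your actual environment):

```python
# Minimal post-install sanity check.
# Assumption: requirements.txt installs PyTorch and Hugging Face
# transformers, as is typical for PLM-based KG toolkits.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```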
Download our preprocessed datasets and put them into the `dataset` folder.
| Dataset (KGC) | Google Drive | Baidu Cloud |
|---|---|---|
| WN18RR | google drive | baidu drive (code: axo7) |
| FB15k-237 | google drive | baidu drive (code: ju9t) |
| MetaQA | google drive | baidu drive (code: hzc9) |
| KG20C | google drive | baidu drive (code: stnh) |
| CSKB | google drive | baidu drive (code: endu) |
We provide four tasks in our toolkit: Knowledge Graph Completion (KGC), Question Answering (QA), Recommendation (REC), and LAnguage Model Analysis (LAMA).
- **KGC** is our basic task, used to train knowledge graph embeddings and evaluate the ability of the models. You can run the scripts under the `kgc` folder to train a model and get the KG embeddings (taking `simkgc` as an example):

  ```shell
  bash ./scripts/kgc/simkgc.sh
  ```
- For the **QA** task, you can run the script files under `metaqa`. We suggest you use a generative model to solve the QA task, as below:

  ```shell
  bash ./scripts/metaqa/run.sh
  ```
- For the **REC** task, you first need to get the KG embeddings and then train the recommendation models. Use the two-stage scripts below:

  ```shell
  bash ./scripts/kgrec/pretrain_item.sh
  bash ./scripts/kgrec/ml20m.sh
  ```
- For the **LAMA** task, you can use the files under `lama`. We provide `BERT` and `RoBERTa` PLMs to evaluate their performance with our KG embeddings (`plet`):

  ```shell
  bash ./scripts/lama/lama_roberta.sh
  ```
| Models | Knowledge Graph Completion | Question Answering | Recommendation | LAnguage Model Analysis |
|---|---|---|---|---|
| KG-BERT | ✔ | ✔ | | |
| GenKGC | ✔ | | | |
| KGT5 | ✔ | ✔ | | |
| kNN-KGE | ✔ | | ✔ | ✔ |
| SimKGC | ✔ | | | |
For each knowledge graph, we provide 5 files:

- `train.tsv`, `dev.tsv`, `test.tsv`: triples listed as (h, r, t), using entity ids and relation ids (starting from 0).
- `entity2text.txt`: lines of (entity_id, entity description).
- `relation2text.txt`: lines of (relation_id, relation description).
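To make the file layout concrete, here is a minimal parsing sketch. The file names and column contents come from the list above; the helper names (`load_triples`, `load_id2text`), the tab separator, and the `WN18RR` paths in the usage example are illustrative assumptions:

```python
import csv

def load_triples(path):
    """Read (h, r, t) id triples from train.tsv / dev.tsv / test.tsv.

    Assumes tab-separated columns of integer ids starting from 0.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [(int(h), int(r), int(t)) for h, r, t in csv.reader(f, delimiter="\t")]

def load_id2text(path):
    """Read (id, description) pairs from entity2text.txt or relation2text.txt."""
    id2text = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, text = line.rstrip("\n").split("\t", 1)
            id2text[key] = text
    return id2text

# Example usage, assuming WN18RR was unpacked into the dataset folder:
train = load_triples("dataset/WN18RR/train.tsv")
entity2text = load_id2text("dataset/WN18RR/entity2text.txt")
print(len(train), "training triples;", len(entity2text), "entities")
```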
- Use `text-davinci-003` for KGC (Link Prediction)
Before running, please check that `MidRes.json` is already available in `dataset`; if not, run `python ./LLM/create_midres.py` to generate it.
Then replace `api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"` in `./LLM/gpt3kgc.py` with your own OpenAI API key.
Once that is done, you can run the code under the `LLM` folder:

```shell
python ./LLM/gpt3kgc.py
```
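For orientation, the core of what such a script does looks roughly like the sketch below, using the pre-1.0 `openai` Python client and its `Completion` endpoint. The prompt shown is a made-up example; the actual prompts are built from the candidates in `MidRes.json`:

```python
import openai

openai.api_key = "sk-..."  # replace with your own key, as described above

# Hypothetical link-prediction prompt; the real prompts are constructed
# from MidRes.json rather than hard-coded like this.
prompt = "Predict the missing tail entity of the triple (plant tissue, hypernym, ?). Answer:"

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,  # deterministic decoding for evaluation
)
print(response["choices"][0]["text"].strip())
```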
- Use `text-davinci-001/002/003` for commonsense reasoning
Before running, please check that you have downloaded the ATOMIC2020 dataset to `dataset/atomic_2020_data` and that the file `dataset/atomic_2020_data/test.jsonl` is available; if not, run `python LLM/atomic2020_process.py` to generate it.
Then replace `api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"` in `./LLM/atomic2020_res.py` with your own OpenAI API key.
Once that is done, you can run:

```shell
python LLM/atomic2020_res.py
```
After running, the result file `test_result.json` will be available under the folder `dataset/atomic_2020_data`.
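To quickly inspect the output, something like the snippet below works. Only the file path comes from the text above; the per-record schema of `test_result.json` is not documented here, so the code just prints a sample for inspection:

```python
import json

# Load the generated results; the path comes from the instructions above,
# while the structure of each record is left to inspection.
with open("dataset/atomic_2020_data/test_result.json", encoding="utf-8") as f:
    results = json.load(f)

sample = results[0] if isinstance(results, list) else results
print(json.dumps(sample, ensure_ascii=False, indent=2)[:500])
```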
For help or issues using the models, please submit a GitHub issue.
If you use the code, please cite the following paper:
```bibtex
@article{DBLP:journals/corr/abs-2210-00305,
  author     = {Xin Xie and
                Zhoubo Li and
                Xiaohan Wang and
                Shumin Deng and
                Feiyu Xiong and
                Huajun Chen and
                Ningyu Zhang},
  title      = {PromptKG: {A} Prompt Learning Framework for Knowledge Graph Representation
                Learning and Application},
  journal    = {CoRR},
  volume     = {abs/2210.00305},
  year       = {2022},
  url        = {https://doi.org/10.48550/arXiv.2210.00305},
  doi        = {10.48550/arXiv.2210.00305},
  eprinttype = {arXiv},
  eprint     = {2210.00305},
  timestamp  = {Fri, 07 Oct 2022 15:24:59 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2210-00305.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```