A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings.
Overview • Installation • How To Use • Paper • Medium • Citation • Others
Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph structure and text-rich entity/relation information. Text-based KG embeddings can represent entities by encoding their descriptions with pre-trained language models (PLMs), but at present no open-source library is specifically designed for KG embeddings with PLMs.
We present LambdaKG, a library for KGE that is equipped with many pre-trained language models (e.g., BERT, BART, T5, GPT-3) and supports various tasks (e.g., knowledge graph completion, question answering, recommendation, and knowledge probing).
LambdaKG is now publicly open-sourced, with a demo video and long-term maintenance.
- ❗Please note that this project is still undergoing optimization, and the code will be updated to support new features and models!
Step 1: Download the basic code.
`git clone --depth 1 https://github.com/zjunlp/PromptKG.git`
Step 2: Create a virtual environment using Anaconda and activate it.
`conda create -n lambdakg python=3.8`
`conda activate lambdakg`
Step 3: Enter the task directory and install the required libraries.
`cd PromptKG/lambdaKG`
`pip install -r requirements.txt`
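As a quick sanity check that the environment installed correctly (a sketch, assuming the requirements include PyTorch and Hugging Face Transformers, which the PLM-based models rely on):

```shell
# Should print both library versions without an ImportError
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"
```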
Download our preprocessed datasets and put them into the `dataset` folder.
| Dataset (KGC) | Google Drive | Baidu Cloud |
|---|---|---|
| WN18RR | google drive | baidu drive (code: axo7) |
| FB15k-237 | google drive | baidu drive (code: ju9t) |
| MetaQA | google drive | baidu drive (code: hzc9) |
| KG20C | google drive | baidu drive (code: stnh) |
| CSKB | google drive | baidu drive (code: endu) |
| ML20M | google drive | baidu drive (code: 2icu) |
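After extracting a downloaded archive into `dataset`, each KG should sit in its own subfolder containing the five files described later in this README. A quick check (a hypothetical layout, assuming the archive unpacks into a folder named after the dataset):

```shell
ls dataset/WN18RR
# expected: train.tsv  dev.tsv  test.tsv  entity2text.txt  relation2text.txt
```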
We provide four tasks in our toolkit: Knowledge Graph Completion (KGC), Question Answering (QA), Recommendation (REC), and LAnguage Model Analysis (LAMA).
- `KGC` is our basic task, used to train knowledge graph embeddings and evaluate the ability of the models. You can run the scripts under the `kgc` folder to train a model and obtain the KG embeddings (taking `SimKGC` as an example): `bash ./scripts/kgc/simkgc.sh`
- For the `QA` task, you can run the script files under `metaqa`. We suggest using a generative model for the QA task, as below: `bash ./scripts/metaqa/run.sh`
- For the `REC` task, you first need to obtain the KG embeddings and then train the recommender system models. Use the two-stage scripts below: `bash ./scripts/kgrec/pretrain_item.sh`, then `bash ./scripts/kgrec/ml20m.sh`
- For the `LAMA` task, you can use the files under `lama`. We provide `BERT` and `RoBERTa` PLMs to evaluate their performance with our KG embeddings (plet): `bash ./scripts/lama/lama_roberta.sh`
| Models | Knowledge Graph Completion | Question Answering | Recommendation | LAnguage Model Analysis |
|---|---|---|---|---|
| KG-BERT | ✔ | ✔ | | |
| GenKGC | ✔ | | | |
| KGT5 | ✔ | ✔ | | |
| kNN-KGE | ✔ | ✔ | ✔ | |
| SimKGC | ✔ | | | |
For each knowledge graph, we have 5 files:

- `train.tsv`, `dev.tsv`, `test.tsv`: each line is a triple (h, r, t) of entity and relation ids (starting from 0).
- `entity2text.txt`: each line is (entity_id, entity description).
- `relation2text.txt`: each line is (relation_id, relation description).
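As a quick illustration of this format, here is a minimal loader sketch (not part of LambdaKG's API; it assumes tab-separated files and that a dataset archive was extracted to `dataset/WN18RR`):

```python
import csv

def load_kg(path="dataset/WN18RR"):
    """Load triples and entity descriptions in the 5-file format above."""
    triples = []
    with open(f"{path}/train.tsv") as f:
        for h, r, t in csv.reader(f, delimiter="\t"):
            triples.append((int(h), int(r), int(t)))  # ids start from 0
    ent2text = {}
    with open(f"{path}/entity2text.txt") as f:
        for line in f:
            eid, text = line.rstrip("\n").split("\t", 1)
            ent2text[int(eid)] = text
    return triples, ent2text

triples, ent2text = load_kg()
h, r, t = triples[0]
print((h, r, t), "->", ent2text[h])
```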
- Use text-davinci-003 for KGC (Link Prediction)
Before running, please check that you already have `MidRes.json` available in `dataset`, or you may run `python ./LLM/create_midres.py` to generate it.
Then modify `api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"` in `./LLM/gpt3kgc.py` to use your own `openai.api_key`.
Once it is available, you can proceed to run the code under the `LLM` folder:
`python ./LLM/gpt3kgc.py`
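For reference, `gpt3kgc.py` queries the OpenAI completions endpoint; below is a minimal sketch of that kind of call using the legacy `openai` Python package (pre-1.0). The prompt template is illustrative, not the script's actual one:

```python
import openai

openai.api_key = "sk-..."  # your own key, as configured in ./LLM/gpt3kgc.py

# Illustrative link-prediction style prompt; the real prompts are built
# from MidRes.json by the script itself.
prompt = "Predict the tail entity for the triple (plant, hypernym, ?):"

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,  # deterministic decoding for ranking-style evaluation
)
print(resp.choices[0].text.strip())
```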
- Use text-davinci-001/002/003 for commonsense reasoning
Before running, please check that you have downloaded the ATOMIC2020 dataset to `dataset/atomic_2020_data` and that the file `dataset/atomic_2020_data/test.jsonl` is available, or you may run `python LLM/atomic2020_process.py` to generate it.
Then modify `api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"` in `./LLM/atomic2020_res.py` to use your own `openai.api_key`.
Once it is available, you can proceed to run the following code:
`python LLM/atomic2020_res.py`
After running, the result file `test_result.json` will be available under the folder `dataset/atomic_2020_data`.
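To inspect the output, something like the following works (a sketch; the per-record schema is whatever `atomic2020_res.py` writes, so this only prints the first record):

```python
import json

with open("dataset/atomic_2020_data/test_result.json") as f:
    results = json.load(f)

print(f"{len(results)} records")
first = results[0] if isinstance(results, list) else next(iter(results.items()))
print(first)
```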
For help or issues using the models, please submit a GitHub issue.
All results are based on the current environment and existing scripts; your experimental outcomes may deviate slightly from ours.
| task | dataset | method | hits@1 | mrr |
|---|---|---|---|---|
| kgc | WN18RR | simkgc | 42.5 | 60.8 |
| kgc | WN18RR | kgt5 | 17.5 | --- |
| kgc | WN18RR | genkgc | 32.5 | --- |
| kgc | WN18RR | knnkge | 52.4 | 57.9 |
| kgc | FB15k | simkgc | 22.8 | 30.0 |
| kgc | FB15k | kgt5 | 11.0 | --- |
| kgc | FB15k | genkgc | 19.1 | --- |
| kgc | FB15k | knnkge | 28.1 | 37.3 |
| qa | metaqa | kgt5 | 67.0 | --- |
| rec | ml20m | bert | 34.4 | 47.9 |
| lama | google_re | bert | 11.2 | 18.2 |
| lama | google_re | roberta | 7.7 | 12.5 |
If you use the code, please cite the following paper:
@article{DBLP:journals/corr/abs-2210-00305,
author = {Xin Xie and
Zhoubo Li and
Xiaohan Wang and
Yuqi Zhu and
Ningyu Zhang and
Jintian Zhang and
Siyuan Cheng and
Bozhong Tian and
Shumin Deng and
Feiyu Xiong and
Huajun Chen},
title = {LambdaKG: {A} Library for Pre-trained Language Model-Based Knowledge
Graph Embeddings},
journal = {CoRR},
volume = {abs/2210.00305},
year = {2022},
url = {https://doi.org/10.48550/arXiv.2210.00305},
doi = {10.48550/arXiv.2210.00305},
eprinttype = {arXiv},
eprint = {2210.00305},
timestamp = {Mon, 17 Apr 2023 12:58:49 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2210-00305.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}