A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings.
Overview • Installation • How To Use • Paper • Medium • Citation • Others
Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph structure and text-rich entity/relation information. Text-based KG embeddings can represent entities by encoding their descriptions with pre-trained language models (PLMs), but at present no open-source library is specifically designed for KG embeddings with PLMs.
We present LambdaKG, a library for KGE that is equipped with many pre-trained language models (e.g., BERT, BART, T5, GPT-3) and supports various tasks (e.g., knowledge graph completion, question answering, recommendation, and knowledge probing).
LambdaKG is now publicly open-sourced, with a demo video and long-term maintenance.
- ❗Please note that this project is still undergoing optimization, and the code will be updated to support new features and models!
Step 1: Download the basic code.
`git clone --depth 1 https://github.com/zjunlp/PromptKG.git`
Step 2: Create a virtual environment using Anaconda and activate it.
`conda create -n lambdakg python=3.8`
`conda activate lambdakg`
Step 3: Enter the task directory and install the required libraries.
`cd PromptKG/lambdaKG`
`pip install -r requirements.txt`
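As a quick sanity check that the environment installed correctly (a sketch, assuming the requirements include PyTorch and Hugging Face Transformers, which the PLM-based models rely on):

```shell
# Should print both library versions without an ImportError
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"
```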
Download our preprocessed datasets and put them into the `dataset` folder.
| Dataset (KGC) | Google Drive | Baidu Cloud |
|---|---|---|
| WN18RR | google drive | baidu drive (code: axo7) |
| FB15k-237 | google drive | baidu drive (code: ju9t) |
| MetaQA | google drive | baidu drive (code: hzc9) |
| KG20C | google drive | baidu drive (code: stnh) |
| CSKB | google drive | baidu drive (code: endu) |
| ML20M | google drive | baidu drive (code: 2icu) |
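After extracting a downloaded archive into `dataset`, each KG should sit in its own subfolder containing the five files described later in this README. A quick check (a hypothetical layout, assuming the archive unpacks into a folder named after the dataset):

```shell
ls dataset/WN18RR
# expected: train.tsv  dev.tsv  test.tsv  entity2text.txt  relation2text.txt
```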
We provide four tasks in our toolkit: Knowledge Graph Completion (KGC), Question Answering (QA), Recommendation (REC), and LAnguage Model Analysis (LAMA).
- `KGC` is our basic task, used to train knowledge graph embeddings and evaluate the ability of the models. You can run the scripts under the `kgc` folder to train a model and obtain the KG embeddings (taking `SimKGC` as an example): `bash ./scripts/kgc/simkgc.sh`
- For the `QA` task, you can run the script files under `metaqa`. We suggest using a generative model for the QA task, as below: `bash ./scripts/metaqa/run.sh`
- For the `REC` task, you first need to obtain the KG embeddings and then train the recommender system models. Use the two-stage scripts below: `bash ./scripts/kgrec/pretrain_item.sh`, then `bash ./scripts/kgrec/ml20m.sh`
- For the `LAMA` task, you can use the files under `lama`. We provide `BERT` and `RoBERTa` PLMs to evaluate their performance with our KG embeddings (plet): `bash ./scripts/lama/lama_roberta.sh`
| Models | Knowledge Graph Completion | Question Answering | Recommendation | LAnguage Model Analysis |
|---|---|---|---|---|
| KG-BERT | ✔ | ✔ | | |
| GenKGC | ✔ | | | |
| KGT5 | ✔ | ✔ | | |
| kNN-KGE | ✔ | ✔ | ✔ | |
| SimKGC | ✔ | | | |
For each knowledge graph, we have 5 files:

- `train.tsv`, `dev.tsv`, `test.tsv`: each line is a triple (h, r, t) of entity and relation ids (starting from 0).
- `entity2text.txt`: each line is (entity_id, entity description).
- `relation2text.txt`: each line is (relation_id, relation description).
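As a quick illustration of this format, here is a minimal loader sketch (not part of LambdaKG's API; it assumes tab-separated files and that a dataset archive was extracted to `dataset/WN18RR`):

```python
import csv

def load_kg(path="dataset/WN18RR"):
    """Load triples and entity descriptions in the 5-file format above."""
    triples = []
    with open(f"{path}/train.tsv") as f:
        for h, r, t in csv.reader(f, delimiter="\t"):
            triples.append((int(h), int(r), int(t)))  # ids start from 0
    ent2text = {}
    with open(f"{path}/entity2text.txt") as f:
        for line in f:
            eid, text = line.rstrip("\n").split("\t", 1)
            ent2text[int(eid)] = text
    return triples, ent2text

triples, ent2text = load_kg()
h, r, t = triples[0]
print((h, r, t), "->", ent2text[h])
```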
- Use text-davinci-003 for KGC (Link Prediction)
Before running, please check that you already have `MidRes.json` available in `dataset`, or you may run `python ./LLM/create_midres.py` to generate it.
Then modify `api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"` in `./LLM/gpt3kgc.py` to use your own `openai.api_key`.
Once it is available, you can proceed to run the code under the `LLM` folder:
`python ./LLM/gpt3kgc.py`
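For reference, `gpt3kgc.py` queries the OpenAI completions endpoint; below is a minimal sketch of that kind of call using the legacy `openai` Python package (pre-1.0). The prompt template is illustrative, not the script's actual one:

```python
import openai

openai.api_key = "sk-..."  # your own key, as configured in ./LLM/gpt3kgc.py

# Illustrative link-prediction style prompt; the real prompts are built
# from MidRes.json by the script itself.
prompt = "Predict the tail entity for the triple (plant, hypernym, ?):"

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,  # deterministic decoding for ranking-style evaluation
)
print(resp.choices[0].text.strip())
```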
- Use text-davinci-001/002/003 for commonsense reasoning
Before running, please check that you have downloaded the ATOMIC2020 dataset to `dataset/atomic_2020_data` and that the file `dataset/atomic_2020_data/test.jsonl` is available, or you may run `python LLM/atomic2020_process.py` to generate it.
Then modify `api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"` in `./LLM/atomic2020_res.py` to use your own `openai.api_key`.
Once it is available, you can proceed to run the following code:
`python LLM/atomic2020_res.py`
After running, the result file `test_result.json` will be available under the folder `dataset/atomic_2020_data`.
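To inspect the output, something like the following works (a sketch; the per-record schema is whatever `atomic2020_res.py` writes, so this only prints the first record):

```python
import json

with open("dataset/atomic_2020_data/test_result.json") as f:
    results = json.load(f)

print(f"{len(results)} records")
first = results[0] if isinstance(results, list) else next(iter(results.items()))
print(first)
```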
For help or issues using the models, please submit a GitHub issue.
All results are based on the current environment and existing scripts; your experimental outcomes may deviate slightly from ours.
| task | dataset | method | hits@1 | mrr |
|---|---|---|---|---|
| kgc | WN18RR | simkgc | 42.5 | 60.8 |
| kgc | WN18RR | kgt5 | 17.5 | --- |
| kgc | WN18RR | genkgc | 32.5 | --- |
| kgc | WN18RR | knnkge | 52.4 | 57.9 |
| kgc | FB15k | simkgc | 22.8 | 30.0 |
| kgc | FB15k | kgt5 | 11.0 | --- |
| kgc | FB15k | genkgc | 19.1 | --- |
| kgc | FB15k | knnkge | 28.1 | 37.3 |
| qa | metaqa | kgt5 | 67.0 | --- |
| rec | ml20m | bert | 34.4 | 47.9 |
| lama | google_re | bert | 11.2 | 18.2 |
| lama | google_re | roberta | 7.7 | 12.5 |
If you use the code, please cite the following paper:
@article{DBLP:journals/corr/abs-2210-00305,
author = {Xin Xie and
Zhoubo Li and
Xiaohan Wang and
Yuqi Zhu and
Ningyu Zhang and
Jintian Zhang and
Siyuan Cheng and
Bozhong Tian and
Shumin Deng and
Feiyu Xiong and
Huajun Chen},
title = {LambdaKG: {A} Library for Pre-trained Language Model-Based Knowledge
Graph Embeddings},
journal = {CoRR},
volume = {abs/2210.00305},
year = {2022},
url = {https://doi.org/10.48550/arXiv.2210.00305},
doi = {10.48550/arXiv.2210.00305},
eprinttype = {arXiv},
eprint = {2210.00305},
timestamp = {Mon, 17 Apr 2023 12:58:49 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2210-00305.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}