In this project, we provide the code for reproducing the experiments in our paper. SPBERT is a BERT-based language model pre-trained on massive SPARQL query logs. SPBERT learns general-purpose representations for both natural language and the SPARQL query language, and makes the most of the sequential order of words, which is crucial for a structured language like SPARQL.
To reproduce our experiments, please install the following dependencies (listed in `requirements.txt`):
- transformers==4.5.1
- pytorch==1.8.1
- python 3.7.10
$ pip install -r requirements.txt
We release three versions of pre-trained weights. Pre-training was based on the original BERT code provided by Google, and training details are described in our paper. You can download all versions from the table below:
Pre-training objective | Model | Steps | Link |
---|---|---|---|
MLM | SPBERT (scratch) | 200k | 🤗 razent/spbert-mlm-zero |
MLM | SPBERT (BERT-initialized) | 200k | 🤗 razent/spbert-mlm-base |
MLM+WSO | SPBERT (BERT-initialized) | 200k | 🤗 razent/spbert-mlm-wso-base |
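
As a quick sanity check, these checkpoints can be loaded directly from the Hugging Face Hub with `transformers`. Below is a minimal sketch, assuming the checkpoints load as standard BERT encoders; the example query is purely illustrative:

```python
from transformers import AutoModel, AutoTokenizer

# Load the MLM+WSO checkpoint released on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("razent/spbert-mlm-wso-base")
model = AutoModel.from_pretrained("razent/spbert-mlm-wso-base")

# Encode a SPARQL query and inspect the contextual representations
query = "select distinct ?uri where { ?uri rdf:type dbo:Person }"
inputs = tokenizer(query, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # e.g. (1, seq_len, 768) for a base-sized model
```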
All evaluation datasets can be downloaded here.
To fine-tune models:
python run.py \
--do_train \
--do_eval \
--model_type bert \
--model_architecture bert2bert \
--encoder_model_name_or_path bert-base-cased \
--decoder_model_name_or_path sparql-mlm-zero \
--source en \
--target sparql \
--train_filename ./LCQUAD/train \
--dev_filename ./LCQUAD/dev \
--output_dir ./ \
--max_source_length 64 \
--weight_decay 0.01 \
--max_target_length 128 \
--beam_size 10 \
--train_batch_size 32 \
--eval_batch_size 32 \
--learning_rate 5e-5 \
--save_inverval 10 \
--num_train_epochs 150
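
Note that `--train_filename` and `--dev_filename` are given without extensions. Assuming the `--source`/`--target` flags select the file suffixes (as in similar seq2seq fine-tuning scripts), the LC-QuAD data would be laid out as parallel files, for example:

```
LCQUAD/
├── train.en      # natural-language questions, one per line
├── train.sparql  # corresponding SPARQL queries, one per line
├── dev.en
└── dev.sparql
```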
To evaluate models:
python run.py \
--do_test \
--model_type bert \
--model_architecture bert2bert \
--encoder_model_name_or_path bert-base-cased \
--decoder_model_name_or_path sparql-mlm-zero \
--source en \
--target sparql \
--load_model_path ./checkpoint-best-bleu/pytorch_model.bin \
--dev_filename ./LCQUAD/dev \
--test_filename ./LCQUAD/test \
--output_dir ./ \
--max_source_length 64 \
--max_target_length 128 \
--beam_size 10 \
--eval_batch_size 32
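
The evaluation command loads the checkpoint saved as `checkpoint-best-bleu` during fine-tuning. If you want to re-score a generated output file against the gold queries offline, here is a minimal sketch using NLTK; the file names are illustrative and assume one whitespace-tokenized query per line:

```python
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical file names: one query per line, whitespace-tokenized
with open("test.output") as f:
    hypotheses = [line.split() for line in f]
with open("test.gold") as f:
    references = [[line.split()] for line in f]  # one reference list per hypothesis

print(f"BLEU: {corpus_bleu(references, hypotheses) * 100:.2f}")
```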
Contact: Hieu Tran ([email protected])
@inproceedings{Tran2021SPBERTAE,
title={SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs},
author={Hieu Tran and Long Phan and James T. Anibal and Binh Thanh Nguyen and Truong-Son Nguyen},
booktitle={ICONIP},
year={2021}
}