Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories

Entity linking with the Wikidata knowledge base

This is an accompanying repository for our *SEM 2018 paper (pre-print). It contains the code to replicate the experiments and train the models described in the paper.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Please use the following citation:

@inproceedings{TUD-CS-2018-01,
    title = {Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories},
    author = {Sorokin, Daniil and Gurevych, Iryna},
    publisher = {Association for Computational Linguistics},
    booktitle = {Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM 2018)},
    pages = {to appear},
    month = jun,
    year = {2018},
    location = {New Orleans, LA, U.S.}
}

Paper abstract:

The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity.

We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.

Please refer to the paper for the model description and training details.

Contacts:

If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.

Daniil Sorokin, <lastname>@ukp.informatik.tu-darmstadt.de
https://www.ukp.tu-darmstadt.de
https://www.tu-darmstadt.de

Project structure:

File	Description
configs/	Configuration files for the experiments
entitylinking/core	Mention extraction and candidate retrieval
entitylinking/datasets	Datasets IO
entitylinking/evaluation	Evaluation measures and scripts
entitylinking/mlearning	Model definition and training scripts
entitylinking/wikidata	Retrieving information from Wikidata
resources/	Necessary resources
trainedmodels/	Trained models

Requirements:

Python 3.6
PyTorch 0.3.0 (see the PyTorch website for installation instructions)
See requirements.txt for the full list of packages
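The requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names `version_matches` and `check_environment` are hypothetical, and treating the listed versions as exact pins is an assumption (requirements.txt remains the authoritative list).

```python
import sys

# Versions listed in the Requirements section. Treating them as exact
# pins is an assumption made for this sketch.
REQUIRED_PYTHON = (3, 6)
REQUIRED_TORCH = "0.3.0"


def version_matches(installed, required):
    """Compare the first three release components, ignoring local
    suffixes such as '+cu90' or '.post4'."""
    release = str(installed).split("+")[0].split(".")[:3]
    return release == required.split(".")


def check_environment(python_version=None, torch_version=None):
    """Return a list of human-readable problems (empty if none found).

    Both arguments are optional overrides, mainly useful for testing;
    by default the running interpreter and installed torch are inspected.
    """
    problems = []
    python_version = tuple(python_version or sys.version_info[:2])
    if python_version != REQUIRED_PYTHON:
        problems.append(
            "Python {}.{} found, README lists {}.{}".format(
                python_version[0], python_version[1], *REQUIRED_PYTHON
            )
        )
    if torch_version is None:
        try:
            import torch  # imported lazily so the check works without it
            torch_version = torch.__version__
        except ImportError:
            problems.append("PyTorch is not installed")
    if torch_version is not None and not version_matches(torch_version, REQUIRED_TORCH):
        problems.append(
            "PyTorch {} found, README lists {}".format(torch_version, REQUIRED_TORCH)
        )
    return problems
```

Calling `check_environment()` with no arguments inspects the current interpreter; an empty list means the two versions listed above were found.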

Running the experiments from the paper:

See run_experiments.sh

Using the pre-trained model:

Follow these steps to use this project as an external entity-linking tool.

Clone or download the project.
Take the pre-trained model FeatureModel_Baseline and extract it into a trainedmodels/ folder in the main directory of the project.
Download the GloVe embeddings (glove.6B.zip) and put them into the folder resources/glove/ in the main directory of the project.
Modify the path to the word embeddings in the configuration file for the model: trainedmodels/FeatureModel_Baseline.param
Make sure that the project folder is in your Python path.
Use the following code to initialize an entity linker and apply it to new data:

from entitylinking import core
    
entitylinker = core.MLLinker(path_to_model="trainedmodels/FeatureModel_Baseline.torchweights")
output = entitylinker.link_entities_in_raw_input("Barack Obama is a president.")
print(output.entities)
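Before initializing the linker, it can help to confirm that the setup steps above were completed. The sketch below is an illustration only: the helpers `missing_prerequisites` and `add_to_python_path` are hypothetical and not part of the `entitylinking` package. The file locations are taken from the steps above; whether the GloVe archive has to be extracted is not stated there, so the check only verifies that resources/glove/ exists and is not empty.

```python
import sys
from pathlib import Path

# Locations named in the setup steps, relative to the project root.
MODEL_FILE = Path("trainedmodels") / "FeatureModel_Baseline.torchweights"
PARAM_FILE = Path("trainedmodels") / "FeatureModel_Baseline.param"
GLOVE_DIR = Path("resources") / "glove"


def missing_prerequisites(project_root):
    """Return the relative paths (POSIX style) that are still missing."""
    root = Path(project_root)
    missing = []
    for relative in (MODEL_FILE, PARAM_FILE):
        if not (root / relative).is_file():
            missing.append(relative.as_posix())
    glove_dir = root / GLOVE_DIR
    # Assumption: any file inside resources/glove/ counts as "present".
    if not glove_dir.is_dir() or not any(glove_dir.iterdir()):
        missing.append(GLOVE_DIR.as_posix())
    return missing


def add_to_python_path(project_root):
    """Put the project root at the front of sys.path if it is not there yet."""
    root = str(Path(project_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

For example, `missing_prerequisites(".")` run from the main directory of the project returns an empty list once the model, its configuration file, and the embeddings are in place; `add_to_python_path(".")` is one way to make `from entitylinking import core` importable without changing environment variables.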

For the VCG model you also need KB embeddings produced by Fast-TransX. We will make a pre-trained version of these embeddings available upon publication.

License:

Apache License Version 2.0
