BESS-KGE

Installation guide | Tutorials | Documentation

BESS-KGE is a PyTorch library for knowledge graph embedding (KGE) models on IPUs. It implements the BESS distribution framework, with embedding tables stored in IPU SRAM.

Features and limitations

Shallow KGE models are typically memory-bound, as little compute needs to be performed to score (h,r,t) triples once the embeddings of entities and relation types used in the batch have been retrieved. BESS (Balanced Entity Sampling and Sharing) is a KGE distribution framework designed to maximize bandwidth for gathering embeddings, by:

storing them in fast-access IPU on-chip memory;
minimizing communication time for sharing embeddings between workers, leveraging balanced collective operators over high-bandwidth IPU-links.

This allows BESS-KGE to achieve high throughput for both training and inference.
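To illustrate why shallow models are memory-bound, here is a minimal sketch of one well-known shallow scoring function, TransE. The choice of TransE, the toy embeddings and the function name `transe_score` are assumptions made for illustration; this is not the library's implementation. Once the three embeddings have been looked up, scoring is only a handful of arithmetic operations per dimension.

```python
import math

def transe_score(h, r, t):
    """TransE-style score: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical toy embedding tables (2 dimensions for readability).
entity_emb = {"paris": [0.0, 1.0], "france": [1.0, 1.0], "tokyo": [5.0, -2.0]}
relation_emb = {"capital_of": [1.0, 0.0]}

# The expensive part in practice is fetching these rows from memory...
h, r = entity_emb["paris"], relation_emb["capital_of"]
# ...while the score itself is cheap to compute.
good = transe_score(h, r, entity_emb["france"])
bad = transe_score(h, r, entity_emb["tokyo"])
print(good > bad)  # True
```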

BESS overview

When distributing the workload over $n$ workers (=IPUs), BESS randomly splits the entity embedding table into $n$ shards of equal size, each of which is stored in a worker's memory. The embedding table for relation types, on the other hand, is replicated across workers, as it is usually much smaller.
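The random sharding described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the library's code: the function name `shard_entities` is hypothetical, and the sketch assumes that, when the number of entities is not divisible by $n$, shard sizes may differ by one.

```python
import random

def shard_entities(n_entities, n_shards, seed=0):
    """Randomly assign entity IDs to n_shards shards of (near-)equal size."""
    rng = random.Random(seed)
    ids = list(range(n_entities))
    rng.shuffle(ids)
    # Deal the shuffled IDs round-robin: shard sizes differ by at most one.
    shards = [ids[s::n_shards] for s in range(n_shards)]
    # entity_to_shard[e] is the worker storing the embedding of entity e.
    entity_to_shard = [0] * n_entities
    for s, members in enumerate(shards):
        for e in members:
            entity_to_shard[e] = s
    return shards, entity_to_shard

shards, entity_to_shard = shard_entities(n_entities=12, n_shards=3)
print([len(s) for s in shards])  # [4, 4, 4]
```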

Figure 1. Entity table sharding across $n=3$ workers.

The entity sharding induces a partitioning of the triples in the dataset, according to the shard-pair of the head entity and the tail entity. At execution time (for both training and inference), batches are constructed by sampling triples uniformly from each of the $n^2$ shard-pairs. Negative entities, used to corrupt the head or tail of a triple to construct negative samples, are also sampled in a balanced way to ensure a variety that is beneficial to the final embedding quality.
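A minimal sketch of the triple partitioning and balanced batch sampling described above, on toy data. The helper names `partition_triples` and `sample_batch` are hypothetical, and sampling with replacement is an assumption of this sketch (it lets small shard-pairs fill a block).

```python
import random
from collections import defaultdict

def partition_triples(triples, entity_to_shard):
    """Group (h, r, t) triples by (shard of head, shard of tail)."""
    buckets = defaultdict(list)
    for h, r, t in triples:
        buckets[(entity_to_shard[h], entity_to_shard[t])].append((h, r, t))
    return buckets

def sample_batch(buckets, n_shards, triples_per_block, rng):
    """Draw the same number of triples from each of the n^2 shard-pairs."""
    batch = {}
    for i in range(n_shards):
        for j in range(n_shards):
            # Raises IndexError if a shard-pair contains no triples.
            batch[(i, j)] = rng.choices(buckets[(i, j)], k=triples_per_block)
    return batch

# Toy data: 6 entities on 3 shards, one relation type, all ordered pairs.
entity_to_shard = [0, 0, 1, 1, 2, 2]
triples = [(h, 0, t) for h in range(6) for t in range(6) if h != t]

buckets = partition_triples(triples, entity_to_shard)
batch = sample_batch(buckets, n_shards=3, triples_per_block=4,
                     rng=random.Random(0))
print(len(batch), {len(block) for block in batch.values()})  # 9 {4}
```

Every block has the same size regardless of how many triples its shard-pair holds in the dataset, which is what keeps the per-worker workload balanced.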

Figure 2. Left: A batch is made of $n^2=9$ blocks, each containing the same number of triples. The head embeddings of triples in block $(i,j)$ are stored on worker $i$, the tail embeddings on worker $j$, for $i,j = 0,1,2$. Right: The negative entities used to corrupt triples in block $(i,j)$ are sampled in equal numbers from all of the $n$ shards. In this example, negative samples are constructed by corrupting tails.
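The balanced negative sampling shown on the right of Figure 2 can be sketched as below. This is a sketch under stated assumptions: the function name `sample_negatives` is hypothetical, negatives are drawn per block (not per triple), and the number of negatives per block is assumed to be divisible by $n$.

```python
import random

def sample_negatives(shards, negatives_per_block, rng):
    """For each of the n^2 blocks, draw the same number of negative
    entities from every shard."""
    n = len(shards)
    if negatives_per_block % n != 0:
        raise ValueError("negatives_per_block must be divisible by n")
    per_shard = negatives_per_block // n
    negatives = {}
    for i in range(n):
        for j in range(n):
            # negatives[(i, j)][s] lists the entities taken from shard s.
            negatives[(i, j)] = [rng.choices(shards[s], k=per_shard)
                                 for s in range(n)]
    return negatives

shards = [[0, 1], [2, 3], [4, 5]]  # toy sharding of 6 entities, n = 3
negatives = sample_negatives(shards, negatives_per_block=6,
                             rng=random.Random(0))
print([len(part) for part in negatives[(0, 1)]])  # [2, 2, 2]
```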

This batching scheme allows us to balance workload and communication across workers. First, each worker needs to gather the same number of embeddings from its on-chip memory, both for positive and negative samples. These include the embeddings needed by the worker itself, and the embeddings needed by its peers.

Figure 3. The required embeddings are gathered from the IPUs' SRAM. Each worker needs to retrieve the head embeddings for $n$ positive triple blocks, and the same for tail embeddings (the $3 + 3$ triangles of same colour in Figure 2 (left)). In addition to that, the worker gathers the portion (= $1/n$) stored in its memory of the negative tails needed by all of the $n^2$ blocks.
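As a worked count, assume each block contains $B$ positive triples and uses $M$ negative entities (the symbols $B$ and $M$ are introduced here for illustration and do not appear in the original text). The number of embeddings each worker gathers from its own memory is then:

$$
\underbrace{nB}_{\text{heads of } n \text{ blocks}} + \underbrace{nB}_{\text{tails of } n \text{ blocks}} + \underbrace{n^2 \cdot \frac{M}{n}}_{\text{negatives for } n^2 \text{ blocks}} = n\,(2B + M)
$$

With $n=3$ as in the figures, every worker gathers $3\,(2B + M)$ embeddings. The count does not depend on the worker index, which is the balance property described above.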

The batch in Figure 2 can then be reconstructed by sharing the embeddings of positive tails and negative entities between workers through a balanced AllToAll collective operator. Head embeddings remain in place, as each triple block is then scored on the worker where the head embedding is stored.

Figure 4. Embeddings of positive and negative tails are exchanged between workers with an AllToAll collective (red arrows), which effectively transposes rows and columns of the $n^2$ blocks in the picture. After this exchange, each worker (vertical column) has the embeddings of the correct $n$ blocks of positive triples and $n$ blocks of negative tails to compute positive and negative scores.
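The transposition performed by the AllToAll exchange can be simulated on a single machine with nested lists. This is only a sketch of the data movement, with strings standing in for embedding tensors; the function `all_to_all` below is a hypothetical stand-in, not the collective used by the library.

```python
def all_to_all(send):
    """Simulate an AllToAll over n workers.

    send[src][dst] is the payload that worker src sends to worker dst.
    Returns recv with recv[dst][src] == send[src][dst].
    """
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]

n = 3
# Worker j stores the tail embeddings of blocks (i, j) for every i.
# gathered_tails[j][i] is what worker j sends to worker i, where the
# heads of block (i, j) live.
gathered_tails = [[f"tails({i},{j})" for i in range(n)] for j in range(n)]

received = all_to_all(gathered_tails)
# Worker 0 now holds the tails of all blocks whose heads it stores.
print(received[0])  # ['tails(0,0)', 'tails(0,1)', 'tails(0,2)']
```

Applying the exchange twice returns every payload to its original worker, which is another way to see that it acts as a transpose of the block grid.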

Additional variations of the distribution scheme are detailed in the BESS-KGE documentation.

Modules

All APIs are documented in the BESS-KGE API documentation.

Known limitations

BESS-KGE supports distribution for up to 16 IPUs.
Storing embeddings in SRAM limits the size of the embedding tables, and therefore the number of entities in the knowledge graph. Approximate estimates of these limits are given in the table below (assuming FP16 for weights and FP32 for gradient accumulation and second-order momentum). Note that the actual cap also depends on the batch size and the number of negative samples used.

Embedding size	Embedding dtype	Optimizer	Gradient accumulation	Max entities (# embedding parameters) on IPU-POD4	Max entities (# embedding parameters) on IPU-POD16
100	float16	SGDM	No	3.2M (3.2e8)	13M (1.3e9)
128	float16	Adam	No	2.4M (3.0e8)	9.9M (1.3e9)
256	float16	SGDM	Yes	900K (2.3e8)	3.5M (9.0e8)
256	float16	Adam	No	1.2M (3.0e8)	4.8M (1.2e9)
512	float16	Adam	Yes	375K (1.9e8)	1.5M (7.7e8)
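The figures in parentheses are the number of entities multiplied by the embedding size. The short sketch below reproduces that arithmetic for the first row and also shows the resulting number of entities per shard, assuming 4 IPUs in an IPU-POD4. The helper names are hypothetical and the sketch does not model memory usage.

```python
import math

def embedding_parameters(n_entities, embedding_size):
    """Total number of parameters in the entity embedding table."""
    return n_entities * embedding_size

def entities_per_ipu(n_entities, n_ipus):
    """Rows of the entity table stored on each IPU (one shard per IPU)."""
    return math.ceil(n_entities / n_ipus)

# First table row: embedding size 100, 3.2M entities on an IPU-POD4.
print(embedding_parameters(3_200_000, 100))  # 320000000, i.e. 3.2e8
print(entities_per_ipu(3_200_000, 4))        # 800000
```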

If compilation fails with an error about the ONNX protobuf exceeding the maximum size, we recommend saving the weights to a file through the poptorch.Options API: options._Popart.set("saveInitializersToFile", "my_file.onnx").

Usage

Tested on Poplar SDK 3.3.0+1403, Ubuntu 20.04, Python 3.8

1. Install the Poplar SDK following the instructions in the Getting Started guide for your IPU system.

2. Enable the Poplar SDK, create and activate a Python virtualenv and install the PopTorch wheel:

source <path to Poplar installation>/enable.sh
source <path to PopART installation>/enable.sh
python3.8 -m venv .venv
source .venv/bin/activate
pip install wheel
pip install $POPLAR_SDK_ENABLED/../poptorch-*.whl

More details are given in the PyTorch quick start guide.

3. Pip install BESS-KGE:

pip install git+https://github.com/graphcore-research/bess-kge.git

4. Import and use:

import besskge

Paperspace notebook tutorials

For a walkthrough of the besskge library functionalities, see the Jupyter notebooks in the notebooks folder of the repository.

Contributing

Contributions to the BESS-KGE project are welcome. See How to contribute to the BESS-KGE project (CONTRIBUTING.md).

References

BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion (arXiv)

License

The included code is released under the MIT license (see details of the license).

See notices for dependencies, credits, derived work and further details.
