
A PyTorch library for Knowledge Graph Embedding on Graphcore IPUs implementing the distribution framework BESS


martytom/bess-kge

 
 


BESS-KGE


Installation guide | Tutorials | Documentation

BESS-KGE is a PyTorch library for knowledge graph embedding (KGE) models on IPUs implementing the distribution framework BESS, with embedding tables stored in the IPU SRAM.

Features and limitations

Shallow KGE models are typically memory-bound, as little compute needs to be performed to score (h,r,t) triples once the embeddings of entities and relation types used in the batch have been retrieved. BESS (Balanced Entity Sampling and Sharing) is a KGE distribution framework designed to maximize bandwidth for gathering embeddings, by:

  • storing them in fast-access IPU on-chip memory;
  • minimizing communication time for sharing embeddings between workers, leveraging balanced collective operators over high-bandwidth IPU-links.

This allows BESS-KGE to achieve high throughput for both training and inference.

BESS overview

When distributing the workload over $n$ workers (=IPUs), BESS randomly splits the entity embedding table into $n$ shards of equal size, each of which is stored in a worker's memory. The embedding table for relation types, on the other hand, is replicated across workers, as it is usually much smaller.

Figure 1. Entity table sharding across $n=3$ workers.
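The random, equal-size split can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the library's implementation: the function name `shard_entities` and its return values are assumptions made for this example, and shards are treated as padded to a common size when the entity count is not divisible by $n$.

```python
import numpy as np


def shard_entities(n_entity: int, n_shard: int, seed: int = 0):
    """Randomly assign entities to n_shard shards of (padded) equal size.

    Hypothetical helper for illustration; not part of the besskge API.
    """
    rng = np.random.default_rng(seed)
    shard_size = -(-n_entity // n_shard)  # ceiling division: rows per shard table
    perm = rng.permutation(n_entity)      # random order of entity IDs
    slots = np.arange(n_entity)
    entity_to_shard = np.empty(n_entity, dtype=np.int64)
    entity_to_local = np.empty(n_entity, dtype=np.int64)
    # Deal the shuffled entities round-robin, so shard sizes differ by at most 1.
    entity_to_shard[perm] = slots % n_shard
    entity_to_local[perm] = slots // n_shard
    return entity_to_shard, entity_to_local, shard_size


entity_to_shard, entity_to_local, shard_size = shard_entities(10, 3)
# entity_to_shard[e] is the worker storing entity e,
# entity_to_local[e] is its row in that worker's embedding table.
```

The relation embedding table needs no such mapping, since every worker holds a full copy.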

The entity sharding induces a partitioning of the triples in the dataset, according to the shard-pair of the head entity and the tail entity. At execution time (for both training and inference), batches are constructed by sampling triples uniformly from each of the $n^2$ shard-pairs. Negative entities, used to corrupt the head or tail of a triple to construct negative samples, are also sampled in a balanced way to ensure a variety that is beneficial to the final embedding quality.

Figure 2. Left: A batch is made of $n^2=9$ blocks, each containing the same number of triples. The head embeddings of triples in block $(i,j)$ are stored on worker $i$, the tail embeddings on worker $j$, for $i,j = 0,1,2$. Right: The negative entities used to corrupt triples in block $(i,j)$ are sampled in equal numbers from all of the $n$ shards. In this example, negative samples are constructed by corrupting tails.
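The partitioning and balanced sampling described above can be sketched as follows. The helper names (`partition_triples`, `sample_batch`, `sample_negatives`) and the toy data are assumptions for illustration and do not reflect the besskge API. The sketch samples with replacement and assumes every shard-pair contains at least one triple.

```python
import numpy as np


def partition_triples(triples, entity_to_shard, n_shard):
    """Group triple indices by their (head shard, tail shard) pair."""
    head_shard = entity_to_shard[triples[:, 0]]
    tail_shard = entity_to_shard[triples[:, 2]]
    pair_id = head_shard * n_shard + tail_shard
    return [np.flatnonzero(pair_id == p) for p in range(n_shard * n_shard)]


def sample_batch(blocks, n_shard, block_size, rng):
    """Draw block_size triples uniformly from each of the n^2 shard-pairs.

    Returns triple indices with shape (n_shard, n_shard, block_size).
    """
    picks = [rng.choice(idx, size=block_size, replace=True) for idx in blocks]
    return np.stack(picks).reshape(n_shard, n_shard, block_size)


def sample_negatives(n_shard, shard_size, n_neg_per_shard, rng):
    """For each block (i, j), draw the same number of negatives from every shard.

    Entry [i, j, k] holds local row indices into shard k's embedding table.
    """
    return rng.integers(0, shard_size, size=(n_shard, n_shard, n_shard, n_neg_per_shard))


# Toy example: 12 entities on 3 workers, one triple per (head, tail) combination.
rng = np.random.default_rng(0)
n_shard, n_entity = 3, 12
entity_to_shard = rng.permutation(n_entity) % n_shard
triples = np.array(
    [(h, rng.integers(5), t) for h in range(n_entity) for t in range(n_entity)]
)
blocks = partition_triples(triples, entity_to_shard, n_shard)
batch = sample_batch(blocks, n_shard, block_size=4, rng=rng)
negatives = sample_negatives(n_shard, shard_size=4, n_neg_per_shard=2, rng=rng)
```

Here `batch[i, j]` indexes triples whose head lives on worker `i` and whose tail lives on worker `j`, matching the block layout in Figure 2.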

This batching scheme allows us to balance workload and communication across workers. First, each worker needs to gather the same number of embeddings from its on-chip memory, both for positive and negative samples. These include the embeddings needed by the worker itself, and the embeddings needed by its peers.

Figure 3. The required embeddings are gathered from the IPUs' SRAM. Each worker needs to retrieve the head embeddings for $n$ positive triple blocks, and likewise the tail embeddings for $n$ blocks (the $3 + 3$ triangles of the same colour in Figure 2 (left)). In addition, the worker gathers the portion (= $1/n$) stored in its memory of the negative tails needed by all of the $n^2$ blocks.
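A quick count makes the balance explicit. Assume, with symbols introduced here purely for illustration, that each block holds $B$ triples and uses $M$ negative entities, drawn as $M/n$ from each shard. Every worker then gathers

$$
\underbrace{nB}_{\text{heads}} + \underbrace{nB}_{\text{tails}} + \underbrace{n^2 \cdot \frac{M}{n}}_{\text{negatives}} = n\,(2B + M)
$$

embeddings per batch, a number that does not depend on the worker index. For $n=3$ this gives $3B + 3B + 3M$, in line with the $3 + 3$ blocks of positive embeddings shown in the figure.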

The batch in Figure 2 can then be reconstructed by sharing the embeddings of positive tails and negative entities between workers through a balanced AllToAll collective operator. Head embeddings remain in place, as each triple block is then scored on the worker where the head embedding is stored.

Figure 4. Embeddings of positive and negative tails are exchanged between workers with an AllToAll collective (red arrows), which effectively transposes rows and columns of the $n^2$ blocks in the picture. After this exchange, each worker (vertical column) has the embeddings of the correct $n$ blocks of positive triples and $n$ blocks of negative tails to compute positive and negative scores.
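The effect of the AllToAll exchange can be emulated on a single host with a NumPy axis swap. This is only a sketch of the data movement under assumed array shapes; it does not use the IPU collectives, and the names `gathered`, `received` and `all_to_all` are illustrative.

```python
import numpy as np

n, block_size, dim = 3, 2, 4
rng = np.random.default_rng(0)

# gathered[k, i] : tail embeddings read from worker k's memory for block (i, k),
# i.e. for triples whose head is stored on worker i and whose tail is on worker k.
gathered = rng.normal(size=(n, n, block_size, dim))


def all_to_all(send):
    """Emulate AllToAll: worker k sends chunk send[k, i] to worker i.

    After the exchange, received[i, k] == send[k, i], which is a transpose
    of the first two axes (the rows and columns of the n x n block grid).
    """
    return np.swapaxes(send, 0, 1).copy()


received = all_to_all(gathered)
# Worker i now holds received[i, k] for every k: the tail embeddings of all
# n blocks (i, 0), ..., (i, n-1), whose head embeddings it already stores.
```

Every worker sends and receives the same number of equally sized chunks, which is what makes the collective balanced. Applying the same exchange twice returns the original layout.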

Additional variations of the distribution scheme are detailed in the BESS-KGE documentation.

Modules

All APIs are documented in the BESS-KGE API documentation.

Known limitations

  • BESS-KGE supports distribution for up to 16 IPUs.
  • Storing embeddings in SRAM limits the size of the embedding tables, and therefore the number of entities in the knowledge graph. Approximate estimates of these limits are given in the table below (assuming FP16 for weights and FP32 for gradient accumulation and second-order momentum). Note that the cap also depends on the batch size and the number of negative samples used.
| Embedding size | Embedding dtype | Optimizer | Gradient accumulation | Max number of entities (# embedding parameters) on IPU-POD4 | Max number of entities (# embedding parameters) on IPU-POD16 |
|---|---|---|---|---|---|
| 100 | float16 | SGDM | No | 3.2M (3.2e8) | 13M (1.3e9) |
| 128 | float16 | Adam | No | 2.4M (3.0e8) | 9.9M (1.3e9) |
| 256 | float16 | SGDM | Yes | 900K (2.3e8) | 3.5M (9.0e8) |
| 256 | float16 | Adam | No | 1.2M (3.0e8) | 4.8M (1.2e9) |
| 512 | float16 | Adam | Yes | 375K (1.9e8) | 1.5M (7.7e8) |
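The parameter counts in parentheses are simply the entity count multiplied by the embedding size. The short sketch below reproduces that arithmetic for the IPU-POD4 column and also reports the approximate number of entities per IPU (an IPU-POD4 has 4 IPUs). The helper name `embedding_parameters` is introduced here for illustration only.

```python
# (embedding size, max entities on IPU-POD4), copied from the table
pod4_rows = [
    (100, 3_200_000),
    (128, 2_400_000),
    (256, 900_000),
    (256, 1_200_000),
    (512, 375_000),
]


def embedding_parameters(n_entity: int, embedding_size: int) -> int:
    """Total number of scalars in the entity embedding table."""
    return n_entity * embedding_size


for size, n_entity in pod4_rows:
    n_params = embedding_parameters(n_entity, size)
    per_ipu = n_entity // 4  # entities stored in each of the 4 shards
    print(f"size={size:4d}  entities={n_entity:>9,d}  "
          f"params={n_params:.1e}  entities per IPU={per_ipu:,d}")
```

The same calculation applies to the IPU-POD16 column with 16 shards.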

If compilation fails with an error about the ONNX protobuf exceeding the maximum size, we recommend saving the weights to a file through the poptorch.Options API: options._Popart.set("saveInitializersToFile", "my_file.onnx").

Usage

Tested on Poplar SDK 3.3.0+1403, Ubuntu 20.04, Python 3.8

1. Install the Poplar SDK following the instructions in the Getting Started guide for your IPU system.

2. Enable the Poplar SDK, create and activate a Python virtualenv and install the PopTorch wheel:

source <path to Poplar installation>/enable.sh
source <path to PopART installation>/enable.sh
python3.8 -m venv .venv
source .venv/bin/activate
pip install wheel
pip install $POPLAR_SDK_ENABLED/../poptorch-*.whl

More details are given in the PyTorch quick start guide.

3. Pip install BESS-KGE:

pip install git+https://github.com/graphcore-research/bess-kge.git

4. Import and use:

import besskge

Paperspace notebook tutorials

For a walkthrough of the besskge library functionalities, see our Jupyter notebooks. We recommend the following sequence:

  1. KGE training and inference on the OGBL-BioKG dataset (run on Gradient)
  2. Link prediction on the YAGO3-10 dataset (run on Gradient)
  3. FP16 weights and compute on the OGBL-WikiKG2 dataset (run on Gradient)

Contributing

Contributions to the BESS-KGE project are welcome. See "How to contribute to the BESS-KGE project".

References

BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion (arXiv)

License

Copyright (c) 2023 Graphcore Ltd. Licensed under the MIT License.

The included code is released under the MIT license (see details of the license).

See notices for dependencies, credits, derived work and further details.
