KGDataset.from_dataframe + custom data notebook (#25)
* add from_dataframe build option in KGDataset

* mypy fix

* add tutorial notebook for custom datasets

* update CONTRIBUTING for forks

* add dataset disclaimer

* refactor dataset.py and README

* add openbiolink dataloader

* fix typo

* resize image

* fix link

* change dataset in notebook 0

* minor update

* fix typo

* Tech Docs review of notebook: Using BESS-KGE with your Own Data

* isort + fix typo

* restore download output

---------

Co-authored-by: Jaynie Padayachee <[email protected]>
Co-authored-by: JaynieP <[email protected]>
3 people authored Sep 28, 2023
1 parent 2d4ed5f commit 443474c
Showing 11 changed files with 1,788 additions and 349 deletions.
43 changes: 28 additions & 15 deletions CONTRIBUTING.md
@@ -1,27 +1,36 @@
# How to contribute to the BESS-KGE project

You can contribute to the development of the BESS-KGE project, even if you don't have access to IPUs (you can use the [IPUModel](https://docs.graphcore.ai/projects/poptorch-user-guide/en/3.2.0/reference.html#poptorch.Options.useIpuModel) to emulate most functionalities of the physical hardware).
![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)

You can contribute to the development of the BESS-KGE project, even if you don't have access to IPUs (you can use the [IPUModel](https://docs.graphcore.ai/projects/poptorch-user-guide/en/3.2.0/reference.html#poptorch.Options.useIpuModel) to emulate most functionalities of the physical hardware).

## VS Code server on Paperspace

Setting up a VS Code server on [Paperspace](https://www.paperspace.com/graphcore) will allow you to tunnel into a machine with IPUs from the VS Code web editor or the desktop app. This requires minimum effort and is an excellent solution for developing and testing code directly on IPU hardware.
Setting up a VS Code server on [Paperspace](https://www.paperspace.com/graphcore) will allow you to tunnel into a machine with IPUs from the VS Code web editor or the desktop app. This requires minimal effort and is an excellent solution for developing and testing code directly on IPU hardware. Here's how to do it.

You can launch a 6-hours session on a Paperspace machine with access to 4 IPUs **for free** by clicking on this button: <a href="https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&amp;machine=Free-IPU-POD4"><img src="https://assets.paperspace.io/img/gradient-badge.svg" alt="Run on Gradient"></a>
1. Fork the [BESS-KGE repository](https://github.com/graphcore-research/bess-kge).

Start the machine (this will also clone the repo for you) and open up a terminal from the left pane.
2. You can launch a 6-hour session on a Paperspace machine with access to 4 IPUs **for free** by using a link of the form:
```
https://console.paperspace.com/github/{USERID}/{REPONAME}?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&machine=Free-IPU-POD4
```
![terminal_pane](docs/source/images/Terminal1.png "height=200")
where `{USERID}/{REPONAME}` is the GitHub address of the forked repository (e.g. `graphcore-research/bess-kge` for the original repo).
In the terminal, run the command
```shell
bash .gradient/launch_vscode_server.sh {tunnel-name}
```
3. Start the machine (this will also clone the repo for you) and open up a terminal from the left pane.
![terminal_pane](docs/source/images/Terminal3.png)
where `tunnel-name` is an optional argument that you can use to define the name of the remote tunnel (if not set, it will default to `ipu-paperspace`).
4. In the terminal, run the command
```shell
bash .gradient/launch_vscode_server.sh {tunnel-name}
```
The script will download and install all dependencies and start the tunnel. You will be asked to authorize the tunnel through GitHub, before being provided with the tunnel link. Please refer to [this notebook](https://ipu.dev/fmo4AZ) for additional details on these steps and to connect the VS Code desktop app to the remote tunnel.
where `tunnel-name` is an optional argument that you can use to define the name of the remote tunnel (if not set, it will default to `ipu-paperspace`). The script will download and install all dependencies and start the tunnel.
Once VS Code is connected to the Paperspace machine, run `./dev build` to build all custom ops. You are now ready to create a new git branch and start developing!
5. When asked, authorize the tunnel through GitHub (using an account with write access to the forked repository). You will then be provided with the tunnel link. Please refer to [this notebook](https://ipu.dev/fmo4AZ) for additional details on these steps and to connect the VS Code desktop app to the remote tunnel.
6. Once VS Code is connected to the Paperspace machine, run `./dev build` to build all custom ops. You are now ready to start developing!
When closing a session and stopping the Paperspace machine, remember to unregister the tunnel in VS Code as explained in the "Common Issues" paragraph of the [notebook](https://ipu.dev/fmo4AZ). To resume your work, just access the clone of the BESS-KGE repo in the "Projects" section of your Paperspace profile, start a new machine and repeat the operations above. All code changes to the local repo, as well as VS Code settings and extensions installed, will persist across sessions.
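As a minimal sketch of step 2 above, the launch link for a fork can be assembled programmatically. The helper name `paperspace_launch_url` is hypothetical (it is not part of BESS-KGE); the container and machine values are copied from the link template in this file, and the percent-encoding is done with the standard library.

```python
from urllib.parse import quote, urlencode

# Values copied from the link template in step 2.
CONTAINER = "graphcore/pytorch-paperspace:3.3.0-ubuntu-20.04-20230703"
MACHINE = "Free-IPU-POD4"


def paperspace_launch_url(user_id: str, repo_name: str) -> str:
    """Hypothetical helper: fill in the {USERID}/{REPONAME} link template."""
    # quote_via=quote with safe="" percent-encodes "/" and ":" in the
    # container name (as %2F and %3A), matching the template.
    query = urlencode(
        {"container": CONTAINER, "machine": MACHINE},
        quote_via=quote,
        safe="",
    )
    return f"https://console.paperspace.com/github/{user_id}/{repo_name}?{query}"


if __name__ == "__main__":
    # For the original repository; replace with your own fork's user and name.
    print(paperspace_launch_url("graphcore-research", "bess-kge"))
```

For a fork, pass your own GitHub user ID and repository name in place of `graphcore-research` and `bess-kge`.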
@@ -43,10 +52,14 @@ pip install $POPLAR_SDK_ENABLED/../poptorch-*.whl
pip install -r requirements-dev.txt
```

Finally, build all custom ops by running `./dev build`
Finally, clone your fork of the BESS-KGE repository and build all custom ops by running `./dev build`.

## Development tips

The `./dev` command can be used to run several utility scripts during development. Check `./dev --help` for a list of dev options.

## Tips
Before submitting a PR to the upstream repo, use `./dev ci` to run all CI checks locally. In particular, be mindful of our formatting requirements: you can check for formatting errors by running `./dev format` and `./dev lint` (both commands are automatically run inside `./dev ci`).

Run `./dev --help` for a list of dev options. In particular, use `./dev ci` to run all CI checks locally. Run individual tests with pattern matching filtering `./dev tests -k FILTER`.
Add unit tests to the `tests` folder. You can run individual unit tests by filtering on a name pattern with `./dev tests -k FILTER`.

Add `.cpp` custom ops to `besskge/custom_ops`. Also, update the [Makefile](Makefile) when adding custom ops.
19 changes: 15 additions & 4 deletions NOTICE.md
@@ -2,9 +2,7 @@ Copyright (c) 2023 Graphcore Ltd. Licensed under the MIT License.

The included code is released under an MIT license, (see [LICENSE](LICENSE)).

The ogbl-biokg and ogbl-wikikg2 datasets are licensed under CC-0.

The [YAGO3 dataset](https://yago-knowledge.org/downloads/yago-3) by the [YAGO team](https://yago-knowledge.org/contributors) of the [Max-Planck Institute for Informatics](https://www.mpi-inf.mpg.de/home/) and [Telcom Paris](https://www.telecom-paris.fr/) is licensed under [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/).
## Dependencies

Our dependencies are (see [requirements.txt](requirements.txt)):

@@ -18,8 +16,21 @@ Our dependencies are (see [requirements.txt](requirements.txt)):

We also use additional Python dependencies for development/testing/documentation (see [requirements-dev.txt](requirements-dev.txt)).

## Dataset disclaimer

This repository provides dataloaders for third-party datasets. You use these datasets at your own risk, and Graphcore offers no warranties of any kind. It is the user's responsibility to comply with all license requirements for datasets downloaded with dataloaders in this repository.

The tutorial notebooks make use of the following datasets:

* [ogbl-biokg](https://ogb.stanford.edu/docs/linkprop/#ogbl-biokg), licensed under CC-0;

* [ogbl-wikikg2](https://ogb.stanford.edu/docs/linkprop/#ogbl-wikikg2), licensed under CC-0;

* [YAGO3 dataset](https://yago-knowledge.org/downloads/yago-3) by the [YAGO team](https://yago-knowledge.org/contributors) of the [Max-Planck Institute for Informatics](https://www.mpi-inf.mpg.de/home/) and [Télécom Paris](https://www.telecom-paris.fr/), licensed under [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/).

## Derived work

**This directory includes derived work from the following:**
This directory includes derived work from the following:

---

17 changes: 15 additions & 2 deletions README.md
@@ -1,5 +1,6 @@
# BESS-KGE
![Continuous integration](https://github.com/graphcore-research/bess-kge/actions/workflows/ci.yaml/badge.svg)
![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)

[**Installation guide**](#usage)
| [**Tutorials**](#paperspace-notebook-tutorials)
@@ -77,6 +78,17 @@ Additional variations of the distribution scheme are detailed in the [BESS-KGE d

All APIs are documented in the [BESS-KGE API documentation](https://graphcore-research.github.io/bess-kge/API_reference.html).

### Datasets

BESS-KGE provides built-in dataloaders for the following datasets. Note that you use these datasets at your own risk, and Graphcore offers no warranties of any kind. It is the user's responsibility to comply with all license requirements for datasets downloaded with dataloaders in this repository.

| Dataset | Builder method | Entities | Entity types | Relation types | Triples | License |
| --- | --- | --- | --- | --- | --- | --- |
| [ogbl-biokg](https://ogb.stanford.edu/docs/linkprop/#ogbl-biokg) | [KGDataset.build_ogbl_biokg](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_ogbl_biokg) | 93,773 | 5 | 51 | 5,088,434 | CC-0 |
| [ogbl-wikikg2](https://ogb.stanford.edu/docs/linkprop/#ogbl-wikikg2) | [KGDataset.build_ogbl_wikikg2](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_ogbl_wikikg2) | 2,500,604 | 1 | 535 | 16,968,094 | CC-0 |
| [YAGO3-10](https://yago-knowledge.org/downloads/yago-3) | [KGDataset.build_yago310](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_yago310) | 123,182 | 1 | 37 | 1,089,040 | CC BY 3.0 |
| [OpenBioLink2020](https://github.com/openbiolink/openbiolink#benchmark-dataset) | [KGDataset.build_openbiolink](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_openbiolink) | 184,635 | 7 | 28 | 4,563,405 | [link](https://github.com/openbiolink/openbiolink#Source-databases-and-their-licenses) |

### Known limitations

* BESS-KGE supports distribution for up to 16 IPUs.
@@ -178,10 +190,11 @@ For a walkthrough of the `besskge` library functionalities, see our Jupyter note
2. [Link prediction on the YAGO3-10 dataset](notebooks/2_yago_topk_prediction.ipynb) [![Run on Gradient](docs/gradient-badge.svg)](https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&machine=Free-IPU-POD4&file=%2Fnotebooks%2F2_yago_topk_prediction.ipynb)
3. [FP16 weights and compute on the OGBL-WikiKG2 dataset](notebooks/3_wikikg2_fp16.ipynb) [![Run on Gradient](docs/gradient-badge.svg)](https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&machine=Free-IPU-POD4&file=%2Fnotebooks%2F3_wikikg2_fp16.ipynb)

For pointers on how to run BESS-KGE on a custom Knowledge Graph dataset, see the notebook [Using BESS-KGE with your own data](notebooks/0_custom_KG_dataset.ipynb) [![Run on Gradient](docs/gradient-badge.svg)](https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&machine=Free-IPU-POD4&file=%2Fnotebooks%2F0_custom_KG_dataset.ipynb)
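
This commit adds a `KGDataset.from_dataframe` build option, but the diff of `dataset.py` is not shown here, so the sketch below does not reproduce it. It only illustrates, in plain Python and with hypothetical names (`encode_triples`, `entity_to_id`, `relation_to_id`), the kind of preprocessing a custom knowledge graph typically needs before training: mapping string labels for entities and relations to contiguous integer IDs. The sample triples are made up for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head label, relation label, tail label)


def encode_triples(
    triples: List[Triple],
) -> Tuple[List[Tuple[int, int, int]], Dict[str, int], Dict[str, int]]:
    """Map entity/relation labels to contiguous integer IDs (illustrative only)."""
    entity_to_id: Dict[str, int] = {}
    relation_to_id: Dict[str, int] = {}
    encoded: List[Tuple[int, int, int]] = []
    for head, relation, tail in triples:
        # setdefault assigns the next free ID the first time a label is seen.
        h = entity_to_id.setdefault(head, len(entity_to_id))
        r = relation_to_id.setdefault(relation, len(relation_to_id))
        t = entity_to_id.setdefault(tail, len(entity_to_id))
        encoded.append((h, r, t))
    return encoded, entity_to_id, relation_to_id


if __name__ == "__main__":
    # Made-up example triples.
    sample = [
        ("aspirin", "treats", "headache"),
        ("aspirin", "interacts_with", "warfarin"),
        ("warfarin", "treats", "thrombosis"),
    ]
    encoded, entities, relations = encode_triples(sample)
    print(encoded)  # [(0, 0, 1), (0, 1, 2), (2, 0, 3)]
```

For the actual builder arguments and behaviour, refer to the notebook linked above and the BESS-KGE API documentation.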

## Contributing

You can contribute to the BESS-KGE project. See [How to contribute to the BESS-KGE project](CONTRIBUTING.md)
You can contribute to the BESS-KGE project: PRs are welcome! For details, see [How to contribute to the BESS-KGE project](CONTRIBUTING.md).

## References
BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion ([arXiv](https://arxiv.org/abs/2211.12281))
@@ -190,6 +203,6 @@ BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Compl

Copyright (c) 2023 Graphcore Ltd. Licensed under the MIT License.

The included code is released under the MIT license, (see [details of the license](LICENSE)).
The included code is released under the MIT license (see [details of the license](LICENSE)).

See [notices](NOTICE.md) for dependencies, credits, derived work and further details.