Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KGDataset.from_dataframe + custom data notebook #25

Merged
merged 17 commits into from
Sep 28, 2023
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 28 additions & 15 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,36 @@
# How to contribute to the BESS-KGE project

You can contribute to the development of the BESS-KGE project, even if you don't have access to IPUs (you can use the [IPUModel](https://docs.graphcore.ai/projects/poptorch-user-guide/en/3.2.0/reference.html#poptorch.Options.useIpuModel) to emulate most functionalities of the physical hardware).
![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)

You can contribute to the development of the BESS-KGE project, even if you don't have access to IPUs (you can use the [IPUModel](https://docs.graphcore.ai/projects/poptorch-user-guide/en/3.2.0/reference.html#poptorch.Options.useIpuModel) to emulate most functionalities of the physical hardware).

## VS Code server on Paperspace

Setting up a VS Code server on [Paperspace](https://www.paperspace.com/graphcore) will allow you to tunnel into a machine with IPUs from the VS Code web editor or the desktop app. This requires minimum effort and is an excellent solution for developing and testing code directly on IPU hardware.
Setting up a VS Code server on [Paperspace](https://www.paperspace.com/graphcore) will allow you to tunnel into a machine with IPUs from the VS Code web editor or the desktop app. This requires minimum effort and is an excellent solution for developing and testing code directly on IPU hardware. Here's how to do it.

You can launch a 6-hours session on a Paperspace machine with access to 4 IPUs **for free** by clicking on this button: <a href="https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&amp;machine=Free-IPU-POD4"><img src="https://assets.paperspace.io/img/gradient-badge.svg" alt="Run on Gradient"></a>
1. Fork the [BESS-KGE repository](https://github.com/graphcore-research/bess-kge).

Start the machine (this will also clone the repo for you) and open up a terminal from the left pane.
2. You can launch a 6-hours session on a Paperspace machine with access to 4 IPUs **for free** by using a link of the form:
```
https://console.paperspace.com/github/{USERID}/{REPONAME}?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&amp;machine=Free-IPU-POD4
```

![terminal_pane](docs/source/images/Terminal1.png "height=200")
where `{USERID}/{REPOPNAME}` is the github address of the forked repository (e.g. `graphcore-research/bess-kge` for the original repo).

In the terminal, run the command
```shell
bash .gradient/launch_vscode_server.sh {tunnel-name}
```
3. Start the machine (this will also clone the repo for you) and open up a terminal from the left pane.

![terminal_pane](docs/source/images/Terminal3.png)

where `tunnel-name` is an optional argument that you can use to define the name of the remote tunnel (if not set, it will default to `ipu-paperspace`).
4. In the terminal, run the command
```shell
bash .gradient/launch_vscode_server.sh {tunnel-name}
```

The script will download and install all dependencies and start the tunnel. You will be asked to authorize the tunnel through GitHub, before being provided with the tunnel link. Please refer to [this notebook](https://ipu.dev/fmo4AZ) for additional details on these steps and to connect the VS Code desktop app to the remote tunnel.
where `tunnel-name` is an optional argument that you can use to define the name of the remote tunnel (if not set, it will default to `ipu-paperspace`). The script will download and install all dependencies and start the tunnel.

Once VS Code is connected to the Paperspace machine, run `./dev build` to build all custom ops. You are now ready to create a new git branch and start developing!
5. When asked, authorize the tunnel through GitHub (with an account having writing privileges to the forked repository). You will be then provided with the tunnel link. Please refer to [this notebook](https://ipu.dev/fmo4AZ) for additional details on these steps and to connect the VS Code desktop app to the remote tunnel.

6. Once VS Code is connected to the Paperspace machine, run `./dev build` to build all custom ops. You are now ready to start developing!

When closing a session and stopping the Paperspace machine, remember to unregister the tunnel in VS Code as explained in the "Common Issues" paragraph of the [notebook](https://ipu.dev/fmo4AZ). To resume your work, just access the clone of the BESS-KGE repo in the "Projects" section of your Paperspace profile, start a new machine and repeat the operations above. All code changes to the local repo, as well as VS Code settings and extensions installed, will persist across sessions.

Expand All @@ -43,10 +52,14 @@ pip install $POPLAR_SDK_ENABLED/../poptorch-*.whl
pip install -r requirements-dev.txt
```

Finally, build all custom ops by running `./dev build`
Finally, clone your fork of the BESS-KGE repository and build all custom ops by running `./dev build`

## Development tips

The `./dev` command can be used to run several utility scripts during development. Check `./dev --help` for a list of dev options.

## Tips
Before submitting a PR to the upstream repo, use `./dev ci` to run all CI checks locally. In particular, be mindful of our formatting requirements: you can check for formatting errors by running `./dev format` and `./dev lint` (both commands are automatically run inside `./dev ci`).

Run `./dev --help` for a list of dev options. In particular, use `./dev ci` to run all CI checks locally. Run individual tests with pattern matching filtering `./dev tests -k FILTER`.
Add unit tests to the `tests` folder. You can run individual unit tests with pattern matching filtering `./dev tests -k FILTER`.

Add `.cpp` custom ops to `besskge/custom_ops`. Also, update the [Makefile](Makefile) when adding custom ops.
19 changes: 15 additions & 4 deletions NOTICE.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,7 @@ Copyright (c) 2023 Graphcore Ltd. Licensed under the MIT License.

The included code is released under an MIT license, (see [LICENSE](LICENSE)).

The ogbl-biokg and ogbl-wikikg2 datasets are licensed under CC-0.

The [YAGO3 dataset](https://yago-knowledge.org/downloads/yago-3) by the [YAGO team](https://yago-knowledge.org/contributors) of the [Max-Planck Institute for Informatics](https://www.mpi-inf.mpg.de/home/) and [Telcom Paris](https://www.telecom-paris.fr/) is licensed under [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/).
## Dependencies

Our dependencies are (see [requirements.txt](requirements.txt)):

Expand All @@ -18,8 +16,21 @@ Our dependencies are (see [requirements.txt](requirements.txt)):

We also use additional Python dependencies for development/testing/documentation (see [requirements-dev.txt](requirements-dev.txt)).

## Dataset disclaimer

This repository provides dataloaders for third party datasets. The use of these datasets is at own risk and Graphcore offers no warranties of any kind. It is the user's responsibility to comply with all license requirements for datasets downloaded with dataloaders in this repository.

The tutorial notebooks make use of the following datasets:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the OpenBioLink nb we should include that here


* [ogbl-biokg](https://ogb.stanford.edu/docs/linkprop/#ogbl-biokg), licensed under CC-0;

* [ogbl-wikikg2](https://ogb.stanford.edu/docs/linkprop/#ogbl-wikikg2), licensed under CC-0;

* [YAGO3 dataset](https://yago-knowledge.org/downloads/yago-3) by the [YAGO team](https://yago-knowledge.org/contributors) of the [Max-Planck Institute for Informatics](https://www.mpi-inf.mpg.de/home/) and [Telcom Paris](https://www.telecom-paris.fr/), licensed under [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/).

## Derived work

**This directory includes derived work from the following:**
This directory includes derived work from the following:

---

Expand Down
17 changes: 15 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# BESS-KGE
![Continuous integration](https://github.com/graphcore-research/bess-kge/actions/workflows/ci.yaml/badge.svg)
![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)

[**Installation guide**](#usage)
| [**Tutorials**](#paperspace-notebook-tutorials)
Expand Down Expand Up @@ -77,6 +78,17 @@ Additional variations of the distribution scheme are detailed in the [BESS-KGE d

All APIs are documented in the [BESS-KGE API documentation](https://graphcore-research.github.io/bess-kge/API_reference.html).

### Datasets

BESS-KGE provides built-in dataloaders for the following datasets. Notice that the use of these datasets is at own risk and Graphcore offers no warranties of any kind. It is the user's responsibility to comply with all license requirements for datasets downloaded with dataloaders in this repository.

| Dataset | Builder method | Entities | Entity types | Relation types | Triples | License |
| --- | --- | --- | --- | --- | --- | --- |
| [ogbl-biokg](https://ogb.stanford.edu/docs/linkprop/#ogbl-biokg) | [KGDataset.build_ogbl_biokg](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_ogbl_biokg) | 93,773 | 5 | 51 | 5,088,434 | CC-0 |
| [ogbl-wikikg2](https://ogb.stanford.edu/docs/linkprop/#ogbl-wikikg2) | [KGDataset.build_ogbl_wikikg2](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_ogbl_wikikg2) | 2,500,604 | 1 | 535 | 16,968,094 | CC-0 |
| [YAGO3-10](https://yago-knowledge.org/downloads/yago-3) | [KGDataset.build_yago310](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_yago310) | 123,182 | 1 | 37 | 1,089,040 | CC BY 3.0 |
| [OpenBioLink2020](https://github.com/openbiolink/openbiolink#benchmark-dataset) | [KGDataset.build_openbiolink](https://graphcore-research.github.io/bess-kge/generated/besskge.dataset.KGDataset.html#besskge.dataset.KGDataset.build_openbiolink) | 184,635 | 7 | 28 | 4,563,405 | [link](https://github.com/openbiolink/openbiolink#Source-databases-and-their-licenses) |

### Known limitations

* BESS-KGE supports distribution for up to 16 IPUs.
Expand Down Expand Up @@ -178,10 +190,11 @@ For a walkthrough of the `besskge` library functionalities, see our Jupyter note
2. [Link prediction on the YAGO3-10 dataset](notebooks/2_yago_topk_prediction.ipynb) [![Run on Gradient](docs/gradient-badge.svg)](https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&machine=Free-IPU-POD4&file=%2Fnotebooks%2F2_yago_topk_prediction.ipynb)
3. [FP16 weights and compute on the OGBL-WikiKG2 dataset](notebooks/3_wikikg2_fp16.ipynb) [![Run on Gradient](docs/gradient-badge.svg)](https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&machine=Free-IPU-POD4&file=%2Fnotebooks%2F3_wikikg2_fp16.ipynb)

For pointers on how to run BESS-KGE on a custom Knowledge Graph dataset, see the notebook [Using BESS-KGE with your own data](notebooks/0_custom_KG_dataset.ipynb) [![Run on Gradient](docs/gradient-badge.svg)](https://console.paperspace.com/github/graphcore-research/bess-kge?container=graphcore%2Fpytorch-paperspace%3A3.3.0-ubuntu-20.04-20230703&machine=Free-IPU-POD4&file=%2Fnotebooks%2F0_custom_KG_dataset.ipynb)

## Contributing

You can contribute to the BESS-KGE project. See [How to contribute to the BESS-KGE project](CONTRIBUTING.md)
You can contribute to the BESS-KGE project: PRs are welcome! For details, see [How to contribute to the BESS-KGE project](CONTRIBUTING.md).

## References
BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion ([arXiv](https://arxiv.org/abs/2211.12281))
Expand All @@ -190,6 +203,6 @@ BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Compl

Copyright (c) 2023 Graphcore Ltd. Licensed under the MIT License.

The included code is released under the MIT license, (see [details of the license](LICENSE)).
The included code is released under the MIT license (see [details of the license](LICENSE)).

See [notices](NOTICE.md) for dependencies, credits, derived work and further details.
Loading