KGDataset.from_dataframe + custom data notebook #25
a87ef12 to e1b2801
I added to this PR with a few updates:
The updates to the notebook are from black formatting (can disregard).
LGTM - see comment regarding OpenBioLink notebook
This repository provides dataloaders for third party datasets. The use of these datasets is at own risk and Graphcore offers no warranties of any kind. It is the user's responsibility to comply with all license requirements for datasets downloaded with dataloaders in this repository.

The tutorial notebooks make use of the following datasets:
If we keep the OpenBioLink nb we should include that here
Thanks! I changed the dataset in the new notebook to biokg, please have a final look when you can :)
Looks great - just spotted one typo
notebooks/0_custom_KG_dataset.ipynb
Outdated
"We download the directed, high-quality version of [OpenBioLink2020](https://github.com/openbiolink/openbiolink#benchmark-dataset) directly from the link provided by the authors. This shouldn't take more than a minute.\n",
"\n",
"Notice that OpenBioLink2020 integrates data from other sources, whose licensing terms are detailed in [this table](https://openbiolink.readthedocs.io/en/latest/sources.html) and should be minded when utilizing or redistributing the dataset files."
"We download the OGBL-BioKG Knoweldge Graph using the `ogb` package (see [here](https://ogb.stanford.edu/docs/linkprop/#data-loader) for details on how to use it). This shouldn't take more than a couple of minutes."
typo in Knowledge Graph
Thanks, fixed!
…otebook Tech Docs Review of "Using BESS-KGE with your Own Data" notebook
Added an ease-of-use method to build a `KGDataset` taking in directly labelled triples (organized in a pandas dataframe), performing the label -> ID conversion behind the curtain and re-ordering, if needed, to cluster entities of the same type. It is meant to be even higher-level than the `from_triples` method.

Using this, I also wrote up a notebook, as we were planning to do, to show how to build a custom dataset, using as example the OpenBioLink dataset (there are the usual concerns about licensing of data sources, but I hope that, if we use it just for this small demo notebook, it shouldn't be a problem?) Any feedback is welcome!
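
For readers skimming the thread, here is a rough sketch of the kind of label -> ID conversion and type-based re-ordering described above. It is not the code in this PR: `encode_triples`, its row layout and the toy rows are hypothetical names chosen for illustration, and it assumes every entity label has exactly one type. Rows are plain tuples, such as those produced by pandas `df.itertuples(index=False)`.

```python
def encode_triples(rows):
    """Hypothetical helper (not the PR's implementation).

    rows: iterable of (head, head_type, relation, tail, tail_type) labels.
    Assumes each entity label belongs to exactly one type.
    """
    rows = list(rows)

    # Record the type of every entity label seen as a head or a tail.
    entity_type = {}
    for head, head_type, _, tail, tail_type in rows:
        entity_type[head] = head_type
        entity_type[tail] = tail_type

    # Sort by (type, label) so entities of one type receive contiguous IDs.
    ordered = sorted(entity_type, key=lambda label: (entity_type[label], label))
    entity_to_id = {label: i for i, label in enumerate(ordered)}

    # First ID of each type's contiguous block.
    type_offsets = {}
    for i, label in enumerate(ordered):
        type_offsets.setdefault(entity_type[label], i)

    # Relation labels get IDs in sorted order.
    relation_to_id = {r: i for i, r in enumerate(sorted({row[2] for row in rows}))}

    # Integer (head, relation, tail) triples, in the original row order.
    triples = [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, _, r, t, _ in rows
    ]
    return triples, entity_to_id, relation_to_id, type_offsets


# Toy rows, made up for illustration only.
rows = [
    ("aspirin", "drug", "targets", "PTGS2", "protein"),
    ("PTGS2", "protein", "associated_with", "inflammation", "disease"),
    ("ibuprofen", "drug", "targets", "PTGS2", "protein"),
]
triples, entity_to_id, relation_to_id, type_offsets = encode_triples(rows)
print(triples)       # [(1, 1, 3), (3, 0, 0), (2, 1, 3)]
print(type_offsets)  # {'disease': 0, 'drug': 1, 'protein': 3}
```

Sorting on the `(type, label)` pair is what makes each type occupy one contiguous ID range, so a type can be described by its offset alone.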