Update README.md
payalchandak authored Jul 28, 2022
1 parent a0b9806 commit b29722a
Showing 1 changed file with 4 additions and 16 deletions.
20 changes: 4 additions & 16 deletions README.md
@@ -90,11 +90,11 @@ pykeen.datasets.has_dataset('primekg')

## Building an updated PrimeKG

#### Downloading primary data resources
### Downloading primary data resources

The Data Records section of our article provides persistent identifiers and web links for downloading all 20 primary data resources used to build PrimeKG. It also lists the exact filenames downloaded from each resource, so they can be easily verified.

#### Curating primary data resources
### Curating primary data resources

We provide the scripts used to process all primary data resources, along with the names of the output files they generate. We are happy to share the intermediate datasets used to create PrimeKG on request.

@@ -119,26 +119,14 @@ UBERON | uberon.py | uberon_terms.csv, uberon_rels.csv, uberon_is_a.csv
UMLS | umls.py, map_umls_mondo.py | umls_mondo.csv
UMLS | umls.ipynb | umls_def_disorder_2021.csv, umls_def_disease_2021.csv
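
Before moving on to graph construction, it can help to confirm that the processing scripts actually produced their expected files. The following is a minimal sketch, not part of the PrimeKG repository: the helper name `missing_outputs` and the directory passed to it are hypothetical, and the list covers only the UBERON and UMLS rows shown above.

```python
from pathlib import Path

# Output file names copied from the UBERON and UMLS rows of the table above.
# Extend this list with the outputs of the other processing scripts.
EXPECTED_OUTPUTS = [
    "uberon_terms.csv",
    "uberon_rels.csv",
    "uberon_is_a.csv",
    "umls_mondo.csv",
    "umls_def_disorder_2021.csv",
    "umls_def_disease_2021.csv",
]


def missing_outputs(data_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are absent from data_dir.

    `data_dir` is a hypothetical location; point it at wherever the
    processing scripts write their results.
    """
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]
```

Calling `missing_outputs("path/to/processed")` returns an empty list when every listed file is present, and otherwise the names that still need to be generated.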

#### Harmonizing datasets into PrimeKG
### Harmonizing datasets into PrimeKG

The code to harmonize datasets and construct PrimeKG is in `build_graph.ipynb`. Run this Jupyter notebook to construct the knowledge graph from the outputs of the processing scripts listed above. The notebook produces all three versions of PrimeKG: `kg_raw.csv`, `kg_giant.csv`, and the complete version, `kg.csv`.
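
After the notebook finishes, a quick sanity check on the three output files might look like the sketch below. It assumes the files are comma-separated with a header row and makes no assumption about column names; the helper names and the output directory are hypothetical and not part of the repository.

```python
import csv
from pathlib import Path

# File names taken from the paragraph above.
KG_FILES = ["kg_raw.csv", "kg_giant.csv", "kg.csv"]


def summarize_csv(path):
    """Return (header, n_rows) for a CSV file.

    Rows are streamed rather than loaded at once, so large files
    do not need to fit in memory.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first row, or [] for an empty file
        n_rows = sum(1 for _ in reader)    # remaining data rows
    return header, n_rows


def summarize_kg(output_dir, names=KG_FILES):
    """Summarize each KG file found in output_dir; absent files are skipped."""
    output_dir = Path(output_dir)
    return {
        name: summarize_csv(output_dir / name)
        for name in names
        if (output_dir / name).is_file()
    }
```

Comparing the row counts across the three versions is a cheap way to notice a notebook run that stopped early or wrote an empty file.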

#### Feature extraction
### Feature extraction

The code required to engineer features can be found at `engineer_features.ipynb` and `mapping_mayo.ipynb`.

<!--
#### Dataset Splits
To retrieve the training/validation/test dataset split, you could simply type
```python
data = X(name = Y)
data.get_split(seed = 42)
# {'train': df_train, 'val': df_val, 'test': df_test}
```
You can specify the splitting method, random seed, and split fractions in the function by e.g. `data.get_split(method = 'scaffold', seed = 1, frac = [0.7, 0.1, 0.2])`. Check out the [data split page](https://zitniklab.hms.harvard.edu/TDC/functions/data_split/) on the website for details.
-->

## Cite Us

If you find PrimeKG useful, cite our work:
