Merge pull request #440 from monarch-initiative/add-externally-managed-content

Add pipeline for externally managed content
matentzn authored Apr 7, 2024
2 parents 3c353fb + 1122c7d commit 2f5981d
Showing 8 changed files with 164,506 additions and 0 deletions.
33 changes: 33 additions & 0 deletions docs/external/nord.md
@@ -0,0 +1,33 @@
# NORD - National Organization for Rare Disorders

**Source name:** NORD - Externally managed content

**Source description:**

> NORD advances practical, meaningful, and enduring change so people with rare diseases can live their fullest and best lives. Every day, we elevate care, advance research, and drive policy in a purposeful and holistic manner to lift up the rare disease community. (https://rarediseases.org/, 22.02.2024)

NORD provides us with three types of data:

* Cross-references to NORD content
* Subset declarations, which essentially mark the diseases NORD considers rare
* Preferred names for certain diseases

The content is provided by an API endpoint at NORD:

```
https://rdbdev.wpengine.com/wp-content/uploads/mondo-export/rare_export.tsv
```

For details, see [mondo-ingest.Makefile](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/mondo-ingest.Makefile), in particular the `$(TMPDIR)/nord.tsv:` rule.
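Step 1 of the pipeline described below amounts to fetching this file. As a minimal sketch using only the Python standard library (the helper name `download_tsv` is hypothetical; the actual pipeline performs this step in the Makefile rule above):

```python
import shutil
import urllib.request
from pathlib import Path

# Export URL as given above.
NORD_URL = "https://rdbdev.wpengine.com/wp-content/uploads/mondo-export/rare_export.tsv"


def download_tsv(url: str, dest: Path) -> Path:
    """Stream the resource at `url` into the file `dest` and return `dest`.

    Hypothetical helper: illustrates the download step only.
    """
    # Create the target directory (e.g. a temporary build directory) if needed.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy the response body to disk in chunks instead of loading it all at once.
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download_tsv(NORD_URL, Path("tmp/nord.tsv"))` would write the export to a local file; that destination path is likewise only illustrative.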


**Homepage:** https://rarediseases.org/

**Comments about this source:**

The pipeline works like this:

1. The TSV is downloaded from the NORD endpoint.
2. A script injects a ROBOT template header into the TSV and then compiles it into OWL.
3. On the Mondo repo side, a script deletes the old NORD content and merges in the updated version.
4. As of 22.02.2024, the NORD TSV file must be updated manually by NORD. This does not affect our pipeline, but it may explain why some content is not in sync with the NORD website. NORD is aware of this and is working on a solution.
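Step 2 relies on the ROBOT template convention that the row directly below the column names holds one template string per column. A minimal sketch of the header injection, assuming the column-to-template mapping has the shape of `src/ontology/config/external-content-robot-headers.json` from this commit; `load_headers` and `inject_robot_header` are hypothetical names, not the script the pipeline actually runs:

```python
import json


def load_headers(path: str) -> dict[str, str]:
    """Read a column-name -> ROBOT-template-string mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def inject_robot_header(tsv_text: str, headers: dict[str, str]) -> str:
    """Return `tsv_text` with a ROBOT template row inserted after its header row.

    Columns missing from `headers` get an empty template cell (an assumption
    of this sketch, relying on ROBOT skipping columns without a template).
    """
    lines = tsv_text.splitlines()
    if not lines:
        raise ValueError("TSV has no header row")
    columns = [name.strip() for name in lines[0].split("\t")]
    template_row = "\t".join(headers.get(name, "") for name in columns)
    # Header row, then template row, then the data rows as they were.
    return "\n".join([lines[0], template_row, *lines[1:]]) + "\n"
```

The template row is inserted as raw text rather than round-tripped through a CSV parser, so data rows reach ROBOT unchanged apart from normalised line endings.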

joeflack4 (Contributor), Apr 8, 2024:

I prefer YYYY-MM-DD!

matentzn (Author, Member), Apr 9, 2024:

For the docs? If it matters just PR

joeflack4 (Contributor), Apr 9, 2024:

Always!

19 changes: 19 additions & 0 deletions docs/externally-managed-content.md
@@ -0,0 +1,19 @@
## Externally managed content

joeflack4 (Contributor), Apr 8, 2024:

Thanks for making these docs


Externally managed content is content that is provided by trusted providers and is merged in _unreviewed_. Currently, we support three types of externally managed content:

1. Preferred names / labels. If a partner organisation of Monarch prefers a particular name for a disease, this can be recorded as part of the metadata.
2. Cross-references and linkouts. Partner organisations can provide cross-references and linkouts to important resources related to a disease.

joeflack4 (Contributor), Apr 8, 2024:

@matentzn What is a linkout? That is, what's the difference between it and an xref?

I looked through my notes and I only have seen this used one other time (by you):

> Since all terms in MedGen seem to have an associated Medgen UID, my thinking was that we should simply provide linkouts to all of these, say, in this case, MONDO:0014753 to MEDGEN:833442

Does it mean, perhaps, an inferred xref?

matentzn (Author, Member), Apr 9, 2024

3. Subsets. Partner organisations can provide subset information to Mondo. This is used in a variety of ways, for example:
   - NORD declares which diseases it considers "rare"
   - Open Targets declares which diseases are used in its drug-prediction framework
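Each of these content types corresponds to particular columns in the external-content template (see `external-content-robot-headers.json` in this commit): `preferred_name` for preferred names, `report_ref` for cross-references, and `subset` for subsets. Assuming a provider row is read into a dictionary keyed by those column names, a hypothetical helper `content_types` can report which content types the row carries:

```python
from collections.abc import Mapping

# Column carrying each content type, following the column names in
# src/ontology/config/external-content-robot-headers.json.
CONTENT_COLUMNS = {
    "preferred name": "preferred_name",
    "cross-reference": "report_ref",
    "subset": "subset",
}


def content_types(row: Mapping[str, str]) -> set[str]:
    """Return the content types for which `row` has a non-empty value."""
    return {
        kind
        for kind, column in CONTENT_COLUMNS.items()
        # Missing or None cells are treated as empty.
        if (row.get(column) or "").strip()
    }
```

Rows produced by `csv.DictReader(f, delimiter="\t")` on the provider TSV (before any ROBOT template row is injected) can be passed to `content_types` directly.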

### Typical workflows

1. The external provider supplies a TSV file (ideally using the same template as NORD; see `src/ontology/external/nord.robot.tsv`).

joeflack4 (Contributor), Apr 8, 2024:

If we get a few more such external management cases, guess we could spec this TSV format.

2. We download the TSV, inject a ROBOT template header to turn it into a ROBOT template, and compile it to OWL.
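Since new providers are expected to follow the NORD template, a light check can flag provider files that the header mapping does not fully cover. A minimal sketch, assuming every file must carry `mondo_id` (the only column mapped to `ID` in `external-content-robot-headers.json`) and that every other column should appear in that mapping; `check_provider_header` is a hypothetical name:

```python
from collections.abc import Mapping


def check_provider_header(
    header_line: str,
    headers: Mapping[str, str],
    id_column: str = "mondo_id",
) -> list[str]:
    """List problems with a provider TSV header; an empty list means none found."""
    columns = [name.strip() for name in header_line.rstrip("\r\n").split("\t")]
    problems = []
    # This sketch treats the ID column as required: it identifies the Mondo term.
    if id_column not in columns:
        problems.append(f"missing required column: {id_column}")
    # Columns absent from the mapping would receive no template string.
    problems.extend(f"unmapped column: {name}" for name in columns if name not in headers)
    return problems
```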

### Related issues and PRs

- [Issue: Represent Externally Managed Content in the Mondo Ingest](https://github.com/monarch-initiative/mondo-ingest/issues/439)
- [PR: Add pipeline for externally managed content](https://github.com/monarch-initiative/mondo-ingest/pull/440)
3 changes: 3 additions & 0 deletions mkdocs.yaml
@@ -31,6 +31,9 @@ nav:
- NCIT: sources/ncit.md
- OMIM: sources/omim.md
- ORDO: sources/ordo.md
- Externally managed content:
- Overview: externally-managed-content.md
- NORD: external/nord.md
- Mondo source metrics:
- Overview: metrics.md
- DO: metrics/doid.md
11 changes: 11 additions & 0 deletions src/ontology/config/external-content-robot-headers.json
@@ -0,0 +1,11 @@
{
  "mondo_id": "ID",
  "report_ref": "A oboInOwl:hasDbXref",
  "report_ref_source": ">A oboInOwl:source",
  "preferred_name": "A oboInOwl:hasExactSynonym",
  "preferred_name_source": ">A oboInOwl:hasDbXref",
  "synonym_type": ">A oboInOwl:hasSynonymType",
  "subset": "AI oboInOwl:inSubset",
  "subset_source": ">A oboInOwl:source",
  "subset_source2": ">A oboInOwl:source"
}
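The values in this mapping are ROBOT template strings. In ROBOT's template syntax, `ID` marks the term identifier column, `A` an annotation with a literal value, `AI` an annotation with an IRI value, and a leading `>` turns a column into an axiom annotation on a statement from a preceding column. A small parser makes these codes explicit; it covers only the codes used in this file, and `parse_template` is a hypothetical helper, not part of ROBOT:

```python
from typing import Optional

# Meaning of the template codes that appear in external-content-robot-headers.json
# (ROBOT supports further codes that this sketch does not handle).
KINDS = {
    "ID": "term identifier",
    "A": "annotation (literal value)",
    "AI": "annotation (IRI value)",
}


def parse_template(template: str) -> tuple[str, Optional[str], bool]:
    """Split a template string into (kind, property, is_axiom_annotation)."""
    # A leading ">" marks an axiom annotation rather than a plain annotation.
    is_axiom_annotation = template.startswith(">")
    code, _, prop = template.lstrip(">").partition(" ")
    if code not in KINDS:
        raise ValueError(f"unsupported template code: {code!r}")
    return KINDS[code], (prop or None), is_axiom_annotation
```

Applied to the mapping above, `report_ref` and `preferred_name` yield literal `oboInOwl:hasDbXref` and `oboInOwl:hasExactSynonym` annotations, `subset` yields an IRI-valued `oboInOwl:inSubset` annotation, and the `*_source` and `synonym_type` columns are axiom annotations.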