Formalize handling and reporting of input errors #128

Merged on Jan 17, 2024 (27 commits)
8c85a08
updates to error messages
lnrekerle Jan 9, 2024
8803390
Merge branch 'develop' into phenopacket_creator_errors
ielis Jan 9, 2024
9e7e15a
Add auditors.
ielis Jan 9, 2024
b696a33
Use `SampleLabels` to represent sample ID instead of a simple `str`.
ielis Jan 9, 2024
36e754b
Spread `Auditor` API across the members of the preprocessing package.
ielis Jan 9, 2024
12c9275
Tweak VEP wrapper.
ielis Jan 9, 2024
fbabb5a
Add timeout to VEP wrapper.
ielis Jan 9, 2024
a972f67
Small tweaks.
ielis Jan 9, 2024
a6d1e82
Fix the user guide for the time being.
ielis Jan 9, 2024
8418e81
Fix the phenopacket loader function.
ielis Jan 9, 2024
4cbf1d7
Clean up the test setup before working on `PhenotypeCreator` tests.
ielis Jan 11, 2024
7788d92
Implement and test the input checks in the `PhenotypeCreator`.
ielis Jan 11, 2024
371f7cb
The `PhenotypeValidationException` is replaced by the `Auditor` API.
ielis Jan 11, 2024
512aea6
Emit a warning for an obsolete term ID.
ielis Jan 11, 2024
e58ea61
Warn about no phenotype features, prepare for collating the issues an…
ielis Jan 11, 2024
516c801
Improve `PatientCreator` docs.
ielis Jan 11, 2024
7393965
Clean imports.
ielis Jan 12, 2024
b257131
Create `CohortCreator`.
ielis Jan 12, 2024
8cea1f0
Add tree notepad.
ielis Jan 16, 2024
0d0d1a6
Editing error handling
lnrekerle Jan 16, 2024
e7358cd
Update variant handling code.
ielis Jan 16, 2024
0315108
Fix signature.
ielis Jan 16, 2024
d91f47c
Merge branch 'phenopacket_creator_errors' into handle-phenotype-creat…
ielis Jan 16, 2024
6e558cd
Finalize the loading.
ielis Jan 16, 2024
b9e29ed
Update pydoc.
ielis Jan 16, 2024
b00414e
Fix the tests.
ielis Jan 16, 2024
fdaab4e
Update the user guide.
ielis Jan 16, 2024
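
Several commits above replace `PhenotypeValidationException` with an `Auditor` API and add warnings, for example for an obsolete term ID. This page does not show that API, so the following is only a minimal sketch of the general idea: collect issues with a severity level instead of raising on the first problem. Every name in it (`Level`, `Issue`, `Notepad`, `audit_term_ids`) and the "obsolete" ID are hypothetical and are not taken from genophenocorr.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Sequence, Set


class Level(Enum):
    WARN = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Issue:
    level: Level
    message: str


@dataclass
class Notepad:
    """Hypothetical collector: records issues instead of raising immediately."""
    label: str
    issues: List[Issue] = field(default_factory=list)

    def add_warning(self, message: str) -> None:
        self.issues.append(Issue(Level.WARN, message))

    def add_error(self, message: str) -> None:
        self.issues.append(Issue(Level.ERROR, message))

    def has_errors(self) -> bool:
        return any(issue.level is Level.ERROR for issue in self.issues)


# HPO term IDs have the form "HP:" followed by seven digits.
HPO_ID = re.compile(r"HP:\d{7}")


def audit_term_ids(term_ids: Sequence[str], obsolete: Set[str], notepad: Notepad) -> List[str]:
    """Keep well-formed IDs; warn on obsolete ones, record an error for malformed ones."""
    accepted = []
    for term_id in term_ids:
        if not HPO_ID.fullmatch(term_id):
            notepad.add_error(f"'{term_id}' is not a well-formed HPO term ID")
            continue
        if term_id in obsolete:
            notepad.add_warning(f"'{term_id}' is obsolete")
        accepted.append(term_id)
    return accepted


# Usage: the obsolete set is made up for illustration only.
notepad = Notepad("patient-1")
kept = audit_term_ids(["HP:0001250", "HP:1234", "HP:9999999"], {"HP:9999999"}, notepad)
for issue in notepad.issues:
    print(f"[{issue.level.value}] {issue.message}")
```

The caller can inspect `notepad.has_errors()` after the whole input has been checked and decide whether to abort, which lets all input problems be reported in one pass.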
51 changes: 15 additions & 36 deletions docs/user-guide/input-data.rst
@@ -22,8 +22,7 @@ Create a cohort from GA4GH phenopackets

 The easiest way to input data into `genophenocorr` is to use the
 `GA4GH Phenopacket Schema <https://phenopacket-schema.readthedocs.io/en/latest>`_ phenopackets.
-`genophenocorr` provides :class:`genophenocorr.preprocessing.PhenopacketPatientCreator`,
-an out-of-the-box class for loading phenopackets.
+`genophenocorr` provides an out-of-the-box solution for loading a cohort from a folder of phenopacket JSON files.


 Let's start with loading Human Phenotype Ontology, a requisite for the input Q/C steps. We'll use the amazing
@@ -36,21 +35,21 @@ the standard `genophenocorr` installation:

   >>> hpo = hpotk.load_minimal_ontology('http://purl.obolibrary.org/obo/hp.json')

-Next, let's create the `PhenopacketPatientCreator`. We use a convenience method
-:func:`genophenocorr.preprocessing.configure_caching_patient_creator`:
+Next, let's get a `CohortCreator` for loading the phenopackets. We use the
+:func:`genophenocorr.preprocessing.configure_caching_cohort_creator` convenience method:

 .. doctest:: input-data

-  >>> from genophenocorr.preprocessing import configure_caching_patient_creator
+  >>> from genophenocorr.preprocessing import configure_caching_cohort_creator

-  >>> patient_creator = configure_caching_patient_creator(hpo)
+  >>> cohort_creator = configure_caching_cohort_creator(hpo)

 .. note::

-  By default, the method creates the patient creator that will call Variant Effect Predictor
-  and Uniprot APIs to perform the functional annotation and protein annotation and cache the responses
+  The default `cohort_creator` will call Variant Effect Predictor
+  and Uniprot APIs to perform the functional annotation and protein annotation, and the responses will be cached
   in the current working directory to save the bandwidth.
-  See the :func:`genophenocorr.preprocessing.configure_caching_patient_creator` for more configuration options.
+  See the :func:`genophenocorr.preprocessing.configure_caching_cohort_creator` for more configuration options.

 We can create a cohort starting from a folder with phenopackets stored as JSON files.
 For the purpose of this example, we will use a folder `simple_cohort` with 5 example phenopackets located in
@@ -61,38 +60,18 @@ For the purpose of this example, we will use a folder `simple_cohort` with 5 exa
   >>> import os
   >>> simple_cohort_path = os.path.join(os.getcwd(), 'data', 'simple_cohort')

-Here we walk the file system, load all phenopacket JSON files, and transform the phenopackets into instances of
-:class:`genophenocorr.model.Patient`:
+We load the phenopackets using `cohort_creator` defined above together with another convenience function
+:class:`genophenocorr.preprocessing.load_phenopacket_folder`:

 .. doctest:: input-data

-  >>> import os
-  >>> from phenopackets import Phenopacket
-  >>> from google.protobuf.json_format import Parse
-
-  >>> patients = []
-  >>> for dirpath, _, filenames in os.walk(simple_cohort_path):
-  ...     for filename in filenames:
-  ...         if filename.endswith('.json'):
-  ...             pp_path = os.path.join(dirpath, filename)
-  ...             with open(pp_path) as fh:
-  ...                 pp = Parse(fh.read(), Phenopacket())
-  ...             patient = patient_creator.create_patient(pp)
-  ...             patients.append(patient)
-
-
-  >>> f'Loaded {len(patients)} phenopackets'
-  'Loaded 5 phenopackets'
-
-Now we can construct a `Cohort`:
-
-.. doctest:: input-data
+  >>> from genophenocorr.preprocessing import load_phenopacket_folder

-  >>> from genophenocorr.model import Cohort
+  >>> cohort = load_phenopacket_folder(simple_cohort_path, cohort_creator)
+  >>> len(cohort)
+  5

-  >>> cohort = Cohort.from_patients(patients)
-  >>> f'Created a cohort with {len(cohort)} members'
-  'Created a cohort with 5 members'
+We loaded phenopackets into a `Cohort` consisting of 5 members.


 Create a cohort from other data
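
For readers comparing the removed file-system loop with the new `load_phenopacket_folder` call, here is a rough, stdlib-only sketch of the same folder walk that also collects problems instead of failing on the first bad file. It is not genophenocorr's implementation: `load_json_folder` is a hypothetical helper, and plain `json` stands in for protobuf's `Parse` so the example runs without extra dependencies.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def load_json_folder(folder: str) -> Tuple[List[Dict], List[str]]:
    """Hypothetical helper: parse every *.json file found under `folder`.

    Returns the parsed documents plus a list of human-readable problems,
    so one malformed file does not abort the whole load.
    """
    documents: List[Dict] = []
    problems: List[str] = []
    for dirpath, _, filenames in os.walk(folder):
        for filename in sorted(filenames):
            if not filename.endswith('.json'):
                continue  # skip anything that is not a JSON file
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding='utf-8') as fh:
                    documents.append(json.load(fh))
            except json.JSONDecodeError as e:
                problems.append(f'{filename}: {e.msg}')
    return documents, problems


# Demo on a throwaway folder: one valid file, one malformed file, one non-JSON file.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'a.json'), 'w', encoding='utf-8') as fh:
        json.dump({'id': 'patient-a'}, fh)
    with open(os.path.join(folder, 'b.json'), 'w', encoding='utf-8') as fh:
        fh.write('{not valid json')
    with open(os.path.join(folder, 'notes.txt'), 'w', encoding='utf-8') as fh:
        fh.write('ignored')
    documents, problems = load_json_folder(folder)

print(f'Loaded {len(documents)} document(s), {len(problems)} problem(s)')
```

Returning the problems alongside the results mirrors the PR's stated goal of reporting input errors in a formal way, though the real library may structure this quite differently.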
439 changes: 143 additions & 296 deletions notebooks/KBG/KBG.ipynb

Large diffs are not rendered by default.

112 changes: 42 additions & 70 deletions notebooks/MAPK8IP3/MAPK8IP3.ipynb

Large diffs are not rendered by default.

152 changes: 76 additions & 76 deletions notebooks/PPP2R1A/PPP2R1A.ipynb

Large diffs are not rendered by default.

1,020 changes: 501 additions & 519 deletions notebooks/RPGRIP1/RPGRIP1.ipynb

Large diffs are not rendered by default.
