[ENH] Update Nipoppy docs and add page about trackers (#164)
* update paths to scripts

* add note to clarify terms "session" and "session ID"

* attempt to fix nested list

* add page about trackers

* remove old tracking section from MRIQC page

* minor changes

* try to fix nested list rendering (again)

* add notes about `run_dicom_org.py` optional parameters

* fix/update MRIQC page sample command

* make link to digest a clickable link

* address Nikhil comments

* add links to manifest and doughnut descriptions

* add glossary

* reorder glossary and update "`session_id` vs `visit_id`"

* add recommendation that visits should be a timeline

* fix French spelling...

* add updates after speaking with Nikhil

* add mention of "subject ID" in `participant_id` entry

---------

Co-authored-by: Nikhil Bhagwat <[email protected]>
michellewang and nikhil153 authored Feb 9, 2024
1 parent 6a46e21 commit dfffd7b
Showing 16 changed files with 171 additions and 74 deletions.
Binary file modified docs/imgs/code_org.png
Binary file added docs/imgs/data_org.jpg
Binary file removed docs/imgs/data_org.png
Binary file added docs/imgs/digest.png
2 changes: 1 addition & 1 deletion docs/nipoppy/code_org.md
@@ -8,7 +8,7 @@ The Nipoppy codebase is divided into data processing `workflows` and data availa

**`workflow`**

- MRI data organization (`dicom_org` and `bids_conv`)
- MRI data organization ([`dicom_org`](./workflow/dicom_org.md) and [`bids_conv`](./workflow/bids_conv.md))
- Custom script to organize raw DICOMs (i.e. scanner output) into a flat participant-level directory.
- Convert DICOMs into BIDS using [Heudiconv](https://heudiconv.readthedocs.io/en/latest/)
- MRI data processing (`proc_pipe`)
10 changes: 8 additions & 2 deletions docs/nipoppy/configs.md
@@ -22,6 +22,10 @@ Nipoppy requires two global files for specifying local data/container paths and
- Information about tabular data (`TABULAR`)
- Version and path to the data dictionary (`data_dictionary`)
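
As a rough illustration of how these fields might nest (all key names other than `TABULAR` and `data_dictionary` are hypothetical, not the actual Nipoppy schema):

```json
{
  "DATASET_ROOT": "/path/to/my_dataset",
  "TABULAR": {
    "data_dictionary": {
      "version": "1.0",
      "path": "tabular/data_dictionary.csv"
    }
  }
}
```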

!!! Note

Nipoppy uses the term "session" to refer to a session ID string with the "ses-" prefix. For example, `ses-01` is a session, and `01` is the session ID associated with this session.

!!! Suggestion

Although not mandatory, for consistency the preferred location would be: `<DATASET_ROOT>/proc/global_configs.json`.
@@ -73,9 +77,11 @@ Nipoppy requires two global files for specifying local data/container paths and

### Participant manifest: `manifest.csv`
- This list serves as the **ground truth** for subject and visit (i.e. session) availability
- Create the `manifest.csv` in `<DATASET_ROOT>/tabular/` comprising following columns
- Create the `manifest.csv` in `<DATASET_ROOT>/tabular/` comprising the following columns:
- `participant_id`: ID assigned during recruitment (at times used interchangeably with `subject_id`)
- `visit`: label to denote participant visit for data acquisition (e.g. `"baseline"`, `"m12"`, `"m24"` or `"V01"`, `"V02"` etc.)
- `visit`: label to denote participant visit for data acquisition
- ***Note***: we recommend that visits describe a timeline if possible, for example `BL`, `M12`, `M24` (for Baseline, Month 12, and Month 24 respectively).
- Alternatively, visit labels should at least be ordinal, ideally with a `V` prefix (e.g., `V01`, `V02`)
- `session`: alternative naming for visit - typically used for imaging data to comply with [BIDS standard](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html)
- `datatype`: a list of acquired imaging datatypes as defined by the [BIDS standard](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html)
- New participants are appended as new rows upon recruitment (see the hypothetical example rows below)
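
For illustration only, hypothetical manifest rows might look like this (the exact formatting of the `datatype` list is an assumption, not a prescribed format):

| participant_id | visit | session | datatype |
| -------------- | ----- | ------- | -------- |
| MNI001 | BL | ses-BL | ['anat','dwi'] |
| MNI001 | M12 | ses-M12 | ['anat'] |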
2 changes: 1 addition & 1 deletion docs/nipoppy/data_org.md
@@ -21,4 +21,4 @@ Directories:
- `backups`: data backup space (tars)
- `releases`: data releases (symlinks)

![data_org](../imgs/data_org.png)
![data_org](../imgs/data_org.jpg)
52 changes: 52 additions & 0 deletions docs/nipoppy/glossary.md
@@ -0,0 +1,52 @@
## Glossary

This page lists some definitions for important/recurring terms used in the Nipoppy framework.

### `participant_id`

**Appears in**: `manifest.csv`, `doughnut.csv`

: Unique identifier for the participant (i.e., subject ID), as provided by the study.

### `datatype`

**Appears in**: `manifest.csv`

: A BIDS-compliant "data type" value (see the [BIDS specification website](https://bids-specification.readthedocs.io/en/stable/common-principles.html#definitions) for a comprehensive list). The most common data types for magnetic resonance imaging (MRI) data are `"anat"`, `"func"`, and `"dwi"`.

### `visit`

**Appears in**: `manifest.csv`

: An identifier for a data collection event, not restricted to imaging data.

See also: [`session` vs `visit`](#session-vs-visit)

### `session`

**Appears in**: `manifest.csv`, `doughnut.csv`

: A BIDS-compliant session identifier. Consists of the `"ses-"` prefix followed by the [`session_id`](#session_id).

#### [`session`](#session) vs [`visit`](#visit)

Nipoppy uses `session` for imaging data, following the convention established by BIDS. The term `visit`, on the other hand, is used to refer to any data collection event (not necessarily imaging-related). In most cases, `session` and `visit` will be identical (or `session`s will be a subset of `visit`s). However, having two descriptors becomes particularly useful when imaging and non-imaging assessments do not use the same naming conventions.
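
For example, in a hypothetical study where imaging is acquired at baseline and month 12 but the month 6 visit is questionnaire-only:

| visit | session |
| ----- | ------- |
| BL | ses-BL |
| M06 | (no imaging session) |
| M12 | ses-M12 |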

### `participant_dicom_dir`

**Appears in**: `doughnut.csv`

: The name of the directory in which the raw DICOM data (before the DICOM organization step) are found. Usually, this is the same as [`participant_id`](#participant_id), but depending on the study it could be different.

### `dicom_id`

**Appears in**: `doughnut.csv`

: The [`participant_id`](#participant_id), stripped of any non-alphanumeric characters. For studies that do not use non-alphanumeric characters in their participant IDs, this is identical to [`participant_id`](#participant_id).

### `bids_id`

**Appears in**: `doughnut.csv`

: A BIDS-compliant participant identifier. Obtained by adding the `"sub-"` prefix to the [`dicom_id`](#dicom_id), which itself is derived from the [`participant_id`](#participant_id). A participant's raw BIDS data and derived imaging data are stored in directories named after their `bids_id`.
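
The `participant_id` → `dicom_id` → `bids_id` derivation can be sketched in a few lines of Python (a minimal illustration of the rules above, not the actual Nipoppy implementation):

```python
import re

def participant_id_to_bids_id(participant_id: str) -> str:
    """Illustrate the participant_id -> dicom_id -> bids_id derivation."""
    # dicom_id: strip all non-alphanumeric characters
    dicom_id = re.sub(r"[^a-zA-Z0-9]", "", participant_id)
    # bids_id: add the BIDS "sub-" prefix
    return f"sub-{dicom_id}"

print(participant_id_to_bids_id("MNI_001"))  # prints "sub-MNI001"
```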

17 changes: 9 additions & 8 deletions docs/nipoppy/installation.md
@@ -9,20 +9,21 @@ The Nipoppy workflow comprises a Nipoppy codebase that operates on a Nipoppy dat
### Nipoppy code+env installation
1. Change directory to where you want to clone this repo, e.g.: `cd /home/<user>/projects/<my_project>/code/`
2. Create a new [venv](https://realpython.com/python-virtual-environments-a-primer/): `python3 -m venv nipoppy_env`
* Alternatively (if using [Anaconda/Miniconda](https://www.anaconda.com/)), create a `conda` environment: `conda create --name nipoppy_env python=3.9`
3. Activate your env: `source nipoppy_env/bin/activate`
* If using Anaconda/Miniconda: `conda activate nipoppy_env`
4. Clone this repo: `git clone https://github.com/neurodatascience/nipoppy.git`
5. Change directory to `nipoppy`
6. Install python dependencies: `pip install -e .`

### Nipoppy dataset directory setup

Run `tree.py` to create the Nipoppy dataset directory tree:
Run [`nipoppy/tree.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/tree.py) to create the Nipoppy dataset directory tree:
```bash
python tree.py --nipoppy_root <DATASET_ROOT>
python nipoppy/tree.py --nipoppy_root <DATASET_ROOT>
```
Where
Where:

- `DATASET_ROOT`: root (starting point) of the Nipoppy structured dataset
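
For orientation, the resulting layout looks roughly like this (a sketch assembled from directories mentioned across these docs; exact contents may differ):

```
<DATASET_ROOT>/
├── bids/          # BIDS raw data
├── derivatives/   # processing pipeline outputs
├── dicom/         # organized, participant-level DICOM dirs
├── raw_dicom/     # raw scanner DICOM dumps
├── tabular/       # manifest.csv and other tabular data
├── proc/          # global configs and logs
├── backups/       # data backup space (tars)
└── releases/      # data releases (symlinks)
```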

!!! Suggestion
4 changes: 1 addition & 3 deletions docs/nipoppy/overview.md
@@ -1,6 +1,4 @@
## What is Nipoppy (formerly mr_proc)?

[*Process long and prosper*](https://en.wikipedia.org/wiki/Vulcan_salute)
## What is Nipoppy?

[Nipoppy](https://github.com/neurodatascience/nipoppy) is a lightweight framework for analyzing (neuro)imaging and clinical data. It is designed to help users do the following:

66 changes: 66 additions & 0 deletions docs/nipoppy/trackers.md
@@ -0,0 +1,66 @@
## Track data availability status

---

Trackers check the availability of files created during the dataset processing workflow (specifically the BIDS raw data and imaging pipeline derivatives) and assign an availability status (`SUCCESS`, `FAIL`, `INCOMPLETE` or `UNAVAILABLE`).

---

### Key directories and files

- `<DATASET_ROOT>/bids`
- `<DATASET_ROOT>/derivatives`
- `<DATASET_ROOT>/derivatives/bagel.csv`

### Running the tracker script

The tracker uses the [`manifest.csv`](./configs.md#participant-manifest-manifestcsv) and [`doughnut.csv`](./workflow/dicom_org.md#procedure) files to determine the participant-session pairs to check. Each available tracker has an associated configuration file (typically called `<pipeline>_tracker.py`), where lists of expected paths for files produced by the pipeline are defined.

For each participant-session pair being tracked, the tracker outputs a `"pipeline_complete"` status. Depending on the configuration for that particular pipeline, the tracker might also output phase and/or stage statuses (e.g., `"PHASE__func"`), which typically refer to sub-pipelines within the full pipeline that may or may not have been run during processing, depending on the input data and/or processing parameters.

The tracker script updates the tabular `<DATASET_ROOT>/derivatives/bagel.csv` file (see [Understanding the `bagel.csv` output](#understanding-the-bagelcsv-output) for more information).

> Sample command:
```bash
python nipoppy/trackers/run_tracker.py \
--global_config <global_config_file> \
--dash_schema nipoppy/trackers/bagel_schema.json \
--pipelines fmriprep mriqc tractoflow heudiconv
```

Notes:
- Currently available image processing pipelines are: `fmriprep`, `mriqc`, and `tractoflow`. See [Adding a tracker](#adding-a-tracker) for the steps to add a new tracker.
- Use `--pipelines heudiconv` to track BIDS data availability.
- An optional `--session_id` parameter can be specified to only track a specific session. By default, the trackers are run for all sessions.
- Other optional arguments include `--run_id` and `--acq_label`, to help generate expected file paths for BIDS Apps.

### Understanding the `bagel.csv` output

A JSON schema for the `bagel.csv` file produced by the tracker script is available [here](https://github.com/neurobagel/digest/blob/main/schemas/bagel_schema.json).

Here is an example of a `bagel.csv` file:

| bids_id | participant_id | session | has_mri_data | pipeline_name | pipeline_version | pipeline_starttime | pipeline_complete |
| ------- | -------------- | ------- | ------------ | ------------- | ---------------- | ------------------ | ----------------- |
| sub-MNI001 | MNI001 | 1 | TRUE | freesurfer | 6.0.1 | 2022-05-24 13:43 | SUCCESS |
| sub-MNI001 | MNI001 | 2 | TRUE | freesurfer | 6.0.1 | 2022-05-24 13:46 | SUCCESS |
| sub-MNI001 | MNI001 | 3 | TRUE | freesurfer | 6.0.1 | UNAVAILABLE | INCOMPLETE |

The imaging derivatives bagel has one row for each participant-session-pipeline combination. The pipeline status columns are `"pipeline_complete"` and any column whose name begins with `"PHASE__"` or `"STAGE__"`. The possible values for these columns are:
- `"SUCCESS"`: All expected pipeline output files (as configured by the pipeline tracker) are present.
- `"FAIL"`: At least one expected pipeline output is missing.
- `"INCOMPLETE"`: Pipeline has not been run for the subject session (output directory missing).
- `"UNAVAILABLE"`: Relevant MRI modality for pipeline not available for subject session (determined by the `datatype` column in the dataset's manifest file).

### Adding a tracker

1. Create a new file in `nipoppy/trackers` called `<new_pipeline>_tracker.py`.
2. Define a config dictionary `tracker_configs`, with a mandatory key `"pipeline_complete"` whose value is a function that takes as input the path to the subject result directory, as well as the session and run IDs, and outputs one of `"SUCCESS"`, `"FAIL"`, `"INCOMPLETE"`, or `"UNAVAILABLE"`. See the built-in [fMRIPrep tracker](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/trackers/fmriprep_tracker.py) for an example.
3. Optionally add additional stages and phases to track. Again, refer to the [fMRIPrep tracker](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/trackers/fmriprep_tracker.py) or to any other pre-defined tracker configuration for an example; a minimal sketch is also given after this list.
4. Modify `nipoppy/trackers/run_tracker.py` to add the new tracker as an option.
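
A minimal sketch of such a tracker file (the expected output paths and helper names are hypothetical; see the fMRIPrep tracker for the real pattern):

```python
from pathlib import Path

# Status values expected by the tracker runner.
SUCCESS, FAIL, INCOMPLETE = "SUCCESS", "FAIL", "INCOMPLETE"

def check_pipeline_complete(subject_dir, session_id, run_id):
    """Check expected outputs for one participant-session pair."""
    subject_dir = Path(subject_dir)
    if not subject_dir.exists():
        # Pipeline was never run for this participant-session.
        return INCOMPLETE
    # Hypothetical expected outputs for the new pipeline.
    expected = [
        subject_dir / f"ses-{session_id}" / "anat" / "output.nii.gz",
        subject_dir / f"ses-{session_id}" / "report.html",
    ]
    return SUCCESS if all(p.exists() for p in expected) else FAIL

tracker_configs = {
    "pipeline_complete": check_pipeline_complete,
    # Optional phase/stage checks would be added here, e.g.:
    # "PHASE__func": check_func_outputs,
}
```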

### Visualizing availability status with the Neurobagel [`digest`](https://digest.neurobagel.org/)

The `bagel.csv` file written by the tracker can be uploaded to [https://digest.neurobagel.org/](https://digest.neurobagel.org/) (as an "imaging CSV file") for interactive visualizations of processing status.

![digest](../imgs/digest.png)
10 changes: 5 additions & 5 deletions docs/nipoppy/workflow/bids_conv.md
@@ -17,17 +17,17 @@ Convert DICOMs to BIDS using [Heudiconv](https://heudiconv.readthedocs.io/en/lat
### Procedure

1. Ensure you have the appropriate HeuDiConv container listed in your `global_configs.json`
2. Use [run_bids_conv.py](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/bids_conv/run_bids_conv.py) to run HeuDiConv `stage_1` and `stage_2`.
2. Use [nipoppy/workflow/bids_conv/run_bids_conv.py](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/bids_conv/run_bids_conv.py) to run HeuDiConv `stage_1` and `stage_2`.
- Run `stage_1` to generate a list of available protocols from the DICOM header. These protocols are listed in `<DATASET_ROOT>/bids/.heudiconv/<participant_id>/info/dicominfo_ses-<session_id>.tsv`

> Sample cmd:
```bash
python run_bids_conv.py \
python nipoppy/workflow/bids_conv/run_bids_conv.py \
--global_config <global_config_file> \
--session_id <session_id> \
--stage 1
```

!!! note

If participants have multiple sessions (or visits), these need to be converted separately and combined post-hoc to avoid Heudiconv errors.
@@ -43,7 +43,7 @@

> Sample cmd:
```bash
python run_bids_conv.py \
python nipoppy/workflow/bids_conv/run_bids_conv.py \
--global_config <global_config_file> \
--session_id <session_id> \
--stage 2
```

@@ -52,4 +52,4 @@

!!! note

Once `heuristic.py` is finalized, only `stage_2` needs to be run peridodically unless new scan protocol is added.
Once `heuristic.py` is finalized, only `stage_2` needs to be run periodically, unless a new scan protocol is added.
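
For reference, HeuDiConv heuristics generally follow this shape (a minimal sketch; the protocol-name match and output template are hypothetical):

```python
# Minimal illustrative HeuDiConv heuristic (heuristic.py).
def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes

def infotodict(seqinfo):
    """Map DICOM series to BIDS output paths."""
    t1w = create_key("sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w")
    info = {t1w: []}
    for s in seqinfo:
        # Match on the protocol name from the DICOM header.
        if "T1" in s.protocol_name:
            info[t1w].append(s.series_id)
    return info
```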
14 changes: 7 additions & 7 deletions docs/nipoppy/workflow/dicom_org.md
@@ -2,7 +2,7 @@

---

This is a dataset specific process and needs to be customized based on local scanner DICOM dumps and file naming. This organization should produce, for a given session, participant specific dicom dirs. Each of these participant-dir contains a flat list of dicoms for the participant for all available imaging modalities and scan protocols. The manifest is used to determine which new subject-session pairs need to be processed, and a `doughnut.csv` file is used to track the status for the DICOM reorganization and BIDS conversion steps.
This is a dataset-specific process and needs to be customized based on local scanner DICOM dumps and file naming. This organization should produce, for a given session, participant-specific DICOM directories. Each of these participant directories contains a flat list of DICOMs for the participant, across all available imaging modalities and scan protocols. The manifest is used to determine which new subject-session pairs need to be processed, and a `doughnut.csv` file is used to track the status of the DICOM reorganization and BIDS conversion steps.

---
### Key directories and files
@@ -15,7 +15,7 @@ This is a dataset specific process and needs to be customized based on local sca

### Procedure

1. Run [`workflow/make_doughnut.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/make_doughnut.py) to update `doughnut.csv` based on the manifest. It will add new rows for any subject-session pair not already in the file.
1. Run [`nipoppy/workflow/make_doughnut.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/make_doughnut.py) to update `doughnut.csv` based on the manifest. It will add new rows for any subject-session pair not already in the file.
- To create the `doughnut.csv` for the first time, use the `--empty` argument. If processing has been done without updating `doughnut.csv`, use `--regenerate` to update it based on new files in the dataset.

!!! note
@@ -42,18 +42,18 @@

!!! note

It is **okay** for the participant directory to have messy internal subdir tree with DICOMs from multiple modalities. (See [data org schematic](data_org.md) for details). The run script will search and validate all available DICOM files automatically.
It is **okay** for the participant directory to have a messy internal subdirectory tree with DICOMs from multiple modalities (see the [data org schematic](../../imgs/data_org.jpg) for details). The run script will search and validate all available DICOM files automatically.


4. Run [`run_dicom_org.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/dicom_org/run_dicom_org.py) to:
4. Run [`nipoppy/workflow/dicom_org/run_dicom_org.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/dicom_org/run_dicom_org.py) to:
- Search: Find all the DICOMs inside the participant directory.
- Validate: Excludes certain individual dicom files that are invalid or contain scanner-derived data not compatible with BIDS conversion.
- Symlink (default) or copy: Creates symlinks from `raw_dicom/` to the `<DATASET_ROOT>/dicom`, where all participant specific dicoms are in a flat list. The symlinks are relative so that they are preserved in containers.
- Validate: Exclude certain individual DICOM files that are invalid or contain scanner-derived data not compatible with BIDS conversion. Enabled by default; disable by passing `--skip_dcm_check`.
- Symlink (default) or copy: Create symlinks from `raw_dicom/` to `<DATASET_ROOT>/dicom`, where all participant-specific DICOMs are in a flat list. The symlinks are relative so that they are preserved in containers (see the sketch after the sample command). Disable by passing `--no_symlink`.
- Update status: if successful, set the `organized` column to `True` in `doughnut.csv`.

> Sample cmd:
```bash
python run_dicom_org.py \
python nipoppy/workflow/dicom_org/run_dicom_org.py \
--global_config <global_config_file> \
--session_id <session_id>
```
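
The relative-symlink behaviour mentioned above can be illustrated with a short sketch (hypothetical paths; not the actual script internals):

```python
import os

# Hypothetical source file and flat destination for one participant.
src = "raw_dicom/MNI001/scan_A/file001.dcm"
dst = "dicom/ses-01/MNI001/file001.dcm"

os.makedirs(os.path.dirname(dst), exist_ok=True)
# A relative link target keeps the symlink valid when the dataset root
# is bind-mounted at a different absolute path inside a container.
target = os.path.relpath(src, start=os.path.dirname(dst))
os.symlink(target, dst)
```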

