diff --git a/docs/data_prep.md b/docs/data_prep.md
index 5532e9f2..c2b622df 100644
--- a/docs/data_prep.md
+++ b/docs/data_prep.md
@@ -1,5 +1,10 @@
 # Preparing the phenotypic data
 
+!!! tip "Looking for more general guidelines on how to organize your dataset?"
+
+    We recommend also checking out [Nipoppy](https://nipoppy.readthedocs.io/), a protocol for standardized organization and processing of clinical-neuroimaging datasets that extends [BIDS](https://bids-specification.readthedocs.io/en/stable/).
+    Neurobagel tools are designed to be compatible with data organized according to the Nipoppy specification, although you do not need to use Nipoppy in order to use Neurobagel.
+
 To use the Neurobagel annotation tool, please prepare the tabular data for your dataset as a single, tab-separated file (`.tsv`).
diff --git a/docs/imgs/code_org.png b/docs/imgs/code_org.png
deleted file mode 100644
index 267c208d..00000000
Binary files a/docs/imgs/code_org.png and /dev/null differ
diff --git a/docs/imgs/data_org.jpg b/docs/imgs/data_org.jpg
deleted file mode 100644
index b8430f07..00000000
Binary files a/docs/imgs/data_org.jpg and /dev/null differ
diff --git a/docs/imgs/digest.png b/docs/imgs/digest.png
deleted file mode 100644
index c44b5fe4..00000000
Binary files a/docs/imgs/digest.png and /dev/null differ
diff --git a/docs/imgs/nipoppy_org.jpg b/docs/imgs/nipoppy_org.jpg
deleted file mode 100644
index 3aa66ac0..00000000
Binary files a/docs/imgs/nipoppy_org.jpg and /dev/null differ
diff --git a/docs/imgs/steps.jpg b/docs/imgs/steps.jpg
deleted file mode 100644
index 5c8c37c5..00000000
Binary files a/docs/imgs/steps.jpg and /dev/null differ
diff --git a/docs/nipoppy/cli_note.md b/docs/nipoppy/cli_note.md
deleted file mode 100644
index 30a7b939..00000000
--- a/docs/nipoppy/cli_note.md
+++ /dev/null
@@ -1,10 +0,0 @@
-!!! note "Nipoppy docs are moving"
-
-    Nipoppy is undergoing a major refactor to move from scripts to a
-    command-line interface (CLI) and Python API. The new documentation website
-    (work in progress) can be found at
-    [https://nipoppy.readthedocs.io/](https://nipoppy.readthedocs.io/).
-
-    If you are using the (soon-to-be legacy) scripts from Nipoppy 0.1.0, this is
-    still the correct place to be. But we encourage you to check out the new
-    website!
diff --git a/docs/nipoppy/code_org.md b/docs/nipoppy/code_org.md
deleted file mode 100644
index 7963589d..00000000
--- a/docs/nipoppy/code_org.md
+++ /dev/null
@@ -1,35 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Code organization
-
----
-
-The Nipoppy codebase is divided into data processing `workflows` and data availability `trackers`.
-
----
-
-**`workflow`**
-
-- MRI data organization ([`dicom_org`](./workflow/dicom_org.md) and [`bids_conv`](./workflow/bids_conv.md))
-    - Custom script to organize raw DICOMs (i.e. scanner output) into a flat participant-level directory
-    - Convert DICOMs into BIDS using [Heudiconv](https://heudiconv.readthedocs.io/en/latest/)
-- MRI data processing (`proc_pipe`)
-    - Runs a set of containerized MRI image processing pipelines
-- Tabular data (`tabular`)
-    - Custom scripts to organize raw tabular data (e.g. clinical assessments)
-    - Custom scripts to normalize and standardize data and metadata for downstream harmonization (see [Neurobagel](../index.md))
-
-**`trackers`**
-
-- Track available raw, standardized, and processed data
-- Generate `bagels` for the Neurobagel graph and dashboard
-
----
-
-*Legend*
-
-- Red: dataset-specific code and configuration files
-- Yellow: Neurobagel interface
-
-![code_org](../imgs/code_org.png)
diff --git a/docs/nipoppy/configs.md b/docs/nipoppy/configs.md
deleted file mode 100644
index 504a6987..00000000
--- a/docs/nipoppy/configs.md
+++ /dev/null
@@ -1,101 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Global files
-
----
-
-Nipoppy requires two global files: one specifying local data/container paths, and a recruitment manifest.
-
----
-
-### Global configs: `global_configs.json`
-
-- This is a dataset-specific file and needs to be modified based on local configs and paths
-- This file is used as an input to all workflow runscripts to read, process and track available data
-- Copy, rename, and populate [sample_global_configs.json](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/sample_global_configs.json)
-- This file contains:
-    - Name of the Nipoppy dataset (`DATASET_NAME`, e.g., `PPMI`)
-    - Path to the Nipoppy dataset (`DATASET_ROOT`)
-    - Path to a local `CONTAINER_STORE` comprising containers used by several workflow scripts
-    - Path to a Singularity executable (`SINGULARITY_PATH`)
-    - Path to a TemplateFlow directory, if using fMRIPrep (`TEMPLATEFLOW_DIR`)
-    - List of session IDs (`SESSIONS`, for MRI data) and visit IDs (`VISITS`, for tabular data)
-    - Containers + versions for BIDS conversion: HeuDiConv, BIDS validator (`BIDS`)
-    - List of processing pipelines + versions (`PROC_PIPELINES`)
-    - Information about tabular data (`TABULAR`)
-        - Version and path to the data dictionary (`data_dictionary`)
-
-!!! Note
-
-    Nipoppy uses the term "session" to refer to a session ID string with the "ses-" prefix. For example, `ses-01` is a session, and `01` is the session ID associated with this session.
-
-!!! Suggestion
-
-    Although not mandatory, for consistency the preferred location would be: `<DATASET_ROOT>/proc/global_configs.json`.
-
-#### Sample `global_configs.json`
-```json
-{
-    "DATASET_NAME": "MyDataset",
-    "DATASET_ROOT": "/path/to/MyDataset",
-    "CONTAINER_STORE": "/path/to/container_store",
-    "SINGULARITY_PATH": "singularity",
-    "TEMPLATEFLOW_DIR": "/path/to/templateflow",
-
-    "SESSIONS": ["ses-1","ses-5","ses-7","ses-9","ses-11"],
-
-    "BIDS": {
-        "heudiconv": {
-            "VERSION": "0.11.6",
-            "CONTAINER": "heudiconv_{}.sif",
-            "URL": ""
-        },
-        "validator": {
-            "CONTAINER": "bids_validator.sif",
-            "URL": ""
-        }
-    },
-
-    "PROC_PIPELINES": {
-        "mriqc": {
-            "VERSION": "",
-            "CONTAINER": "mriqc_{}.sif",
-            "URL": ""
-        },
-        "fmriprep": {
-            "VERSION": "20.2.7",
-            "CONTAINER": "fmriprep_{}.sif",
-            "URL": ""
-        },
-        "freesurfer": {
-            "VERSION": "6.0.1",
-            "CONTAINER": "fmriprep_{}.sif",
-            "URL": ""
-        }
-    }
-}
-```
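For illustration, here is a minimal sketch (not part of the Nipoppy codebase) of how a workflow script might consume this file; the config path is hypothetical, and the keys are those from the sample above:

```python
import json
from pathlib import Path

# Load the dataset-specific global configs (path is illustrative).
with open("/path/to/MyDataset/proc/global_configs.json") as f:
    configs = json.load(f)

dataset_root = Path(configs["DATASET_ROOT"])
bids_dir = dataset_root / "bids"

# CONTAINER fields are templates like "fmriprep_{}.sif" that take the version.
fmriprep = configs["PROC_PIPELINES"]["fmriprep"]
container = Path(configs["CONTAINER_STORE"]) / fmriprep["CONTAINER"].format(
    fmriprep["VERSION"]
)

for session in configs["SESSIONS"]:
    print(f"Would process {bids_dir} / {session} with {container}")
```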
-### Participant manifest: `manifest.csv`
-
-- This list serves as the **ground truth** for subject and visit (i.e. session) availability
-- Create the `manifest.csv` in `<DATASET_ROOT>/tabular/` comprising the following columns:
-    - `participant_id`: ID assigned during recruitment (at times used interchangeably with `subject_id`)
-    - `visit`: label to denote participant visit for data acquisition
-        - ***Note***: we recommend that visits describe a timeline if possible, for example `BL`, `M12`, `M24` (for Baseline, Month 12, and Month 24 respectively). Alternatively, visits should be ordinal and ideally named with the `V` prefix (e.g., `V01`, `V02`)
-    - `session`: alternative naming for visit - typically used for imaging data to comply with the [BIDS standard](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html)
-    - `datatype`: a list of acquired imaging datatypes as defined by the [BIDS standard](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html)
-- New participants are appended as new rows upon recruitment
-- Participants with multiple visits (i.e. sessions) should be added as separate rows
-
-#### Sample `manifest.csv`
-
-| participant_id | visit | session | datatype                     |
-|----------------|-------|---------|------------------------------|
-| 001            | V01   | ses-01  | ["anat","dwi","fmap","func"] |
-| 001            | V02   | ses-02  | ["anat"]                     |
-| 002            | V01   | ses-01  | ["anat","dwi"]               |
-| 002            | V03   | ses-03  | ["anat","dwi"]               |
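A minimal sketch of how the manifest might be maintained programmatically, assuming pandas and the column layout above; the newly recruited participant row is hypothetical:

```python
import pandas as pd

# Read IDs as strings so "001" is not coerced to the integer 1.
manifest = pd.read_csv("tabular/manifest.csv", dtype=str)  # under DATASET_ROOT

# Each participant-visit pair should appear exactly once.
assert not manifest.duplicated(subset=["participant_id", "visit"]).any()

# Newly recruited participants are appended as new rows.
new_row = {
    "participant_id": "003",
    "visit": "V01",
    "session": "ses-01",
    "datatype": '["anat","dwi"]',
}
manifest = pd.concat([manifest, pd.DataFrame([new_row])], ignore_index=True)
manifest.to_csv("tabular/manifest.csv", index=False)
```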
diff --git a/docs/nipoppy/data_org.md b/docs/nipoppy/data_org.md
deleted file mode 100644
index b3fec387..00000000
--- a/docs/nipoppy/data_org.md
+++ /dev/null
@@ -1,28 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Data organization
-
----
-
-A Nipoppy dataset consists of a specific directory structure to organize MRI and tabular data.
-
----
-
-Directories:
-
-- `tabular`
-    - contains `manifest.csv`
-    - `demographics`: contains demographic data (e.g. age, sex)
-    - `assessments`: contains clinical assessments (e.g. MoCA)
-- `downloads`: data dumps from remote data stores (e.g. LONI)
-- `scratch`: space for unorganized data and wrangling
-- `dicom`: participant-level DICOM directories
-- `bids`: BIDS-formatted dataset
-- `derivatives`: output of processing pipelines (e.g. fmriprep, mriqc)
-- `proc`: space for config and log files of the processing pipelines
-- `backups`: data backup space (tars)
-- `releases`: data releases (symlinks)
-
-![data_org](../imgs/data_org.jpg)
\ No newline at end of file
diff --git a/docs/nipoppy/glossary.md b/docs/nipoppy/glossary.md
deleted file mode 100644
index 6a72274c..00000000
--- a/docs/nipoppy/glossary.md
+++ /dev/null
@@ -1,56 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Glossary
-
-This page lists definitions for important/recurring terms used in the Nipoppy framework.
-
-### `participant_id`
-
-**Appears in**: `manifest.csv`, `doughnut.csv`
-
-: Unique identifier for the participant (i.e., subject ID), as provided by the study.
-
-### `datatype`
-
-**Appears in**: `manifest.csv`
-
-: A BIDS-compliant "data type" value (see the [BIDS specification website](https://bids-specification.readthedocs.io/en/stable/common-principles.html#definitions) for a comprehensive list). The most common data types for magnetic resonance imaging (MRI) data are `"anat"`, `"func"`, and `"dwi"`.
-
-### `visit`
-
-**Appears in**: `manifest.csv`
-
-: An identifier for a data collection event, not restricted to imaging data.
-
-See also: [`session` vs `visit`](#session-vs-visit)
-
-### `session`
-
-**Appears in**: `manifest.csv`, `doughnut.csv`
-
-: A BIDS-compliant session identifier. Consists of the `"ses-"` prefix followed by the [`session_id`](#session_id).
-
-#### [`session`](#session) vs [`visit`](#visit)
-
-Nipoppy uses `session` for imaging data, following the convention established by BIDS. The term `visit`, on the other hand, refers to any data collection event (not necessarily imaging-related). In most cases, `session` and `visit` will be identical (or `session`s will be a subset of `visit`s). However, having two descriptors becomes particularly useful when imaging and non-imaging assessments do not use the same naming conventions.
-
-### `participant_dicom_dir`
-
-**Appears in**: `doughnut.csv`
-
-: The name of the directory in which the raw DICOM data (before the DICOM organization step) are found. Usually, this is the same as [`participant_id`](#participant_id), but depending on the study it could be different.
-
-### `dicom_id`
-
-**Appears in**: `doughnut.csv`
-
-: The [`participant_id`](#participant_id), stripped of any non-alphanumerical characters. For studies that do not use non-alphanumerical characters in their participant IDs, this is exactly the same as [`participant_id`](#participant_id).
-
-### `bids_id`
-
-**Appears in**: `doughnut.csv`
-
-: A BIDS-compliant participant identifier, obtained by adding the `"sub-"` prefix to the [`dicom_id`](#dicom_id), which itself is derived from the [`participant_id`](#participant_id). A participant's raw BIDS data and derived imaging data are stored in directories named after their `bids_id`.
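The `dicom_id`/`bids_id` derivation described above is simple enough to capture in a few lines; a sketch (the helper name is hypothetical, not part of the Nipoppy codebase):

```python
import re

def to_bids_id(participant_id: str) -> str:
    # dicom_id: the participant_id stripped of non-alphanumerical characters
    dicom_id = re.sub(r"[^a-zA-Z0-9]", "", participant_id)
    # bids_id: the dicom_id with the "sub-" prefix added
    return f"sub-{dicom_id}"

assert to_bids_id("QPN_001") == "sub-QPN001"
assert to_bids_id("001") == "sub-001"
```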
diff --git a/docs/nipoppy/installation.md b/docs/nipoppy/installation.md
deleted file mode 100644
index c343829a..00000000
--- a/docs/nipoppy/installation.md
+++ /dev/null
@@ -1,35 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Code installation and dataset setup
-
----
-
-The Nipoppy workflow comprises a Nipoppy codebase that operates on a Nipoppy dataset with a specific directory structure (initialized with a `tree.py` script).
-
----
-
-### Nipoppy code+env installation
-
-1. Change directory to where you want to clone this repo, e.g.: `cd /home/<user>/projects/<project>/code/`
-2. Create a new [venv](https://realpython.com/python-virtual-environments-a-primer/): `python3 -m venv nipoppy_env`
-    * Alternatively (if using [Anaconda/Miniconda](https://www.anaconda.com/)), create a `conda` environment: `conda create --name nipoppy_env python=3.9`
-3. Activate your env: `source nipoppy_env/bin/activate`
-    * If using Anaconda/Miniconda: `conda activate nipoppy_env`
-4. Clone this repo: `git clone https://github.com/neurodatascience/nipoppy.git`
-5. Change directory to `nipoppy`
-6. Install Python dependencies: `pip install -e .`
-
-### Nipoppy dataset directory setup
-
-Run [`nipoppy/tree.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/tree.py) to create the Nipoppy dataset directory tree:
-```bash
-python nipoppy/tree.py --nipoppy_root <DATASET_ROOT>
-```
-Where:
-
-- `DATASET_ROOT`: root (starting point) of the Nipoppy structured dataset
-
-!!! Suggestion
-
-    We suggest naming the DATASET_ROOT directory after a study or a cohort.
diff --git a/docs/nipoppy/overview.md b/docs/nipoppy/overview.md
deleted file mode 100644
index 38ce1ff6..00000000
--- a/docs/nipoppy/overview.md
+++ /dev/null
@@ -1,98 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## What is Nipoppy?
-
-[Nipoppy](https://github.com/neurodatascience/nipoppy) is a lightweight framework for analyzing (neuro)imaging and clinical data. It is designed to help users do the following:
-
-1. Curate and organize data into a standard directory structure
-2. Run data processing pipelines in a semi-automated and reproducible way
-3. Track availability (including processing status, if applicable) of raw, tabular and derived data
-4. Extract imaging features from MRI derivatives data for downstream analysis
-
-### Example workflow
-
-Given a dataset to process, a typical workflow with Nipoppy can look like this:
-
-1. Fork the [Nipoppy template code repository](https://github.com/neurodatascience/nipoppy), then clone it
-    * This repository contains scripts to run processing pipelines (and track their results) on BIDS data
-2. Standardize raw imaging data: convert raw DICOM files into the NIfTI format and organize the dataset according to the [BIDS standard](https://bids-specification.readthedocs.io/en/stable/)
-    * This requires some custom scripts, including the [`heuristic.py` file for HeuDiConv](https://heudiconv.readthedocs.io/en/latest/heuristics.html)
-3. Run commonly used image processing pipelines
-    * Nipoppy currently supports [FreeSurfer](https://surfer.nmr.mgh.harvard.edu/), [fMRIPrep](https://fmriprep.org/en/stable/), [TractoFlow](https://tractoflow-documentation.readthedocs.io/en/latest/), and [MRIQC](https://mriqc.readthedocs.io/en/stable/) out of the box, but new pipelines can be added by the user if needed
-4. Organize demographic and clinical assessment data
-    * This will most likely involve some custom data wrangling, for example to combine clinical assessment scores into a single file
-5. Run tracker scripts to determine the availability of imaging (raw and/or processed) and/or tabular data
-    * We call these availability/status metadata files "bagels" because they can be ingested by [Neurobagel](https://www.neurobagel.org/) for [dashboarding](https://digest.neurobagel.org/) and [querying](https://query.neurobagel.org/) participants across multiple studies
-
-## Who is Nipoppy for?
-
-Anyone who wants to process datasets with imaging data and/or use datasets processed with Nipoppy.
-
-**Data managers**
-
-* People who process datasets, either for specific analyses or to share with others
-* Example use cases:
-    * Data curation
-        * Download/organize raw DICOM files
-        * Convert raw data to a BIDS directory structure
-        * Organize clinical data
-    * Data processing
-        * Run pre-existing processing pipelines
-        * Add scripts to run custom pipelines
-        * Do cross-dataset processing: running the same pipelines/versions on different datasets so that the outputs can be used together
-    * Data tracking
-        * Check processing failures and relaunch processing
-        * Generate bagel files for Neurobagel
-
-**Data users**
-
-* People who use tabular, derivative or other files produced by Nipoppy (e.g., from a shared Nipoppy-compliant dataset)
-* Example use cases:
-    * Data querying
-        * Check availability of raw and/or derived data
-        * Check which pipelines and versions have been run
-    * Data analysis
-        * Extract imaging features for downstream analyses from specific pipelines/versions
-
-## How does Nipoppy work?
-
-### Modules
-
-1. `Code`: Codebase ([repo](code_org.md)) for running and tracking workflows
-    * The codebase should start from the [Nipoppy template repository](https://github.com/neurodatascience/nipoppy). Additional custom scripts need to be added to process specific datasets.
-2. `Data`: [Dataset](data_org.md) organized in a specific directory structure
-    * This contains the data only and should not contain any code
-3. `Containers`: [Singularity/Apptainer](https://apptainer.org/) containers encapsulating processing pipelines
-
-### Organization
-
-Organization of the `Code`, `Data`, and `Container` modules:
-
-![nipoppy_org](../imgs/nipoppy_org.jpg)
-
-### Steps
-
-The Nipoppy workflow steps and linked identifiers (i.e. `participant_id`, `dicom_id`, `bids_id`) are shown below:
-
-![steps](../imgs/steps.jpg)
-
-## FAQ
-
-1. Do I need to install anything to use Nipoppy?
-    * Nipoppy requires Python 3 and [Apptainer/Singularity](https://apptainer.org/) to work.
-2. Can Nipoppy process my entire dataset out of the box?
-    * No: every dataset is different, and it is virtually impossible to have a workflow flexible enough to work with any dataset format or structure. However, once the imaging data is converted to a standard BIDS structure, running the image processing pipelines should be very straightforward.
-3. Do I need to follow all the steps listed in the [example workflow](#example-workflow)?
-    * No: the purpose of the example workflow is to illustrate what can be done with Nipoppy, but you can choose to only use it for specific features (e.g., tracking).
-4. Can I run Nipoppy scripts on Windows/macOS?
-    * Nipoppy is designed to run on the Linux operating system and will likely not work on other operating systems. This is mainly because it relies on Singularity, which [cannot run natively on Windows or macOS](https://apptainer.org/docs/admin/main/installation.html#installation-on-windows-or-mac). It is probably possible to use Nipoppy with Windows/macOS (e.g., using virtual machines), but we do not recommend it.
-5. Can I use Nipoppy on a cluster?
-    * Yes, as long as the cluster has [Apptainer/Singularity](https://apptainer.org/) installed
-6. Do I need to know how to use [Apptainer/Singularity](https://apptainer.org/)?
-    * The Nipoppy code repo contains scripts that call Singularity to run image processing pipelines. Users are not required to use Singularity directly, though we encourage users to learn about containers and/or Singularity if they are not familiar with these terms.
-7. I want to use Nipoppy with my own pipeline. Does it need to be containerized?
-    * Although we recommend the use of containers to facilitate reproducibility, it is not a strict requirement. You can run your own pipelines any way you want (on the BIDS data or even raw data), though the outputs should be organized in the same way as the other pipelines (fMRIPrep, TractoFlow, etc.) if you want to use the tracking features.
-8. What is [Neurobagel](https://www.neurobagel.org/) and do I need to use it?
-    * Neurobagel is a data harmonization project that includes tools to perform cross-dataset searches for imaging data availability. You do not need Neurobagel for Nipoppy, though some Nipoppy outputs (specifically the `bagel` tracking files) can be used as input to some Neurobagel tools.
-9. How do I use the dashboard?
-    * Simply visit [https://digest.neurobagel.org](https://digest.neurobagel.org) (no installation required). More information about the dashboard can be found [here](https://github.com/neurobagel/digest).
diff --git a/docs/nipoppy/trackers.md b/docs/nipoppy/trackers.md
deleted file mode 100644
index 059fa7fc..00000000
--- a/docs/nipoppy/trackers.md
+++ /dev/null
@@ -1,70 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Track data availability status
-
----
-
-Trackers check the availability of files created during the dataset processing workflow (specifically the BIDS raw data and imaging pipeline derivatives) and assign an availability status (`SUCCESS`, `FAIL`, `INCOMPLETE` or `UNAVAILABLE`).
-
----
-
-### Key directories and files
-
-- `<DATASET_ROOT>/bids`
-- `<DATASET_ROOT>/derivatives`
-- `<DATASET_ROOT>/derivatives/bagel.csv`
-
-### Running the tracker script
-
-The tracker uses the [`manifest.csv`](./configs.md#participant-manifest-manifestcsv) and [`doughnut.csv`](./workflow/dicom_org.md#procedure) files to determine the participant-session pairs to check. Each available tracker has an associated configuration file (typically called `<pipeline>_tracker.py`), where lists of expected paths for files produced by the pipeline are defined.
-
-For each participant-session pair being tracked, the tracker outputs a `"pipeline_complete"` status. Depending on the configuration for that particular pipeline, the tracker might also output phase and/or stage statuses (e.g., `"PHASE__func"`), which typically refer to sub-pipelines within the full pipeline that may or may not have been run during processing, depending on the input data and/or processing parameters.
-
-The tracker script updates the tabular `<DATASET_ROOT>/derivatives/bagel.csv` file (see [Understanding the `bagel.csv` output](#understanding-the-bagelcsv-output) for more information).
-
-> Sample command:
-```bash
-python nipoppy/trackers/run_tracker.py \
-    --global_config <global_configs_file> \
-    --dash_schema nipoppy/trackers/bagel_schema.json \
-    --pipelines fmriprep mriqc tractoflow heudiconv
-```
-
-Notes:
-
-- Currently available image processing pipelines are: `fmriprep`, `mriqc`, and `tractoflow`. See [Adding a tracker](#adding-a-tracker) for the steps to add a new tracker.
-- Use `--pipelines heudiconv` for tracking BIDS data availability
-- An optional `--session_id` parameter can be specified to only track a specific session. By default, the trackers are run for all sessions.
-- Other optional arguments include `--run_id` and `--acq_label`, to help generate expected file paths for BIDS Apps.
-
-### Understanding the `bagel.csv` output
-
-A JSON schema for the `bagel.csv` file produced by the tracker script is available [here](https://github.com/neurobagel/digest/blob/main/schemas/bagel_schema.json).
-
-Here is an example of a `bagel.csv` file:
-
-| bids_id | participant_id | session | has_mri_data | pipeline_name | pipeline_version | pipeline_starttime | pipeline_complete |
-| ------- | -------------- | ------- | ------------ | ------------- | ---------------- | ------------------ | ----------------- |
-| sub-MNI001 | MNI001 | 1 | TRUE | freesurfer | 6.0.1 | 2022-05-24 13:43 | SUCCESS |
-| sub-MNI001 | MNI001 | 2 | TRUE | freesurfer | 6.0.1 | 2022-05-24 13:46 | SUCCESS |
-| sub-MNI001 | MNI001 | 3 | TRUE | freesurfer | 6.0.1 | UNAVAILABLE | INCOMPLETE |
-
-The imaging derivatives bagel has one row for each participant-session-pipeline combination. The pipeline status columns are `"pipeline_complete"` and any column whose name begins with `"PHASE__"` or `"STAGE__"`. The possible values for these columns are:
-
-- `"SUCCESS"`: All expected pipeline output files (as configured by the pipeline tracker) are present.
-- `"FAIL"`: At least one expected pipeline output is missing.
-- `"INCOMPLETE"`: Pipeline has not been run for the subject session (output directory missing).
-- `"UNAVAILABLE"`: Relevant MRI modality for the pipeline not available for the subject session (determined by the `datatype` column in the dataset's manifest file).
-
-### Adding a tracker
-
-1. Create a new file in `nipoppy/trackers` called `<pipeline>_tracker.py`.
-2. Define a config dictionary `tracker_configs`, with a mandatory key `"pipeline_complete"` whose value is a function that takes as input the path to the subject result directory, as well as the session and run IDs, and outputs one of `"SUCCESS"`, `"FAIL"`, `"INCOMPLETE"`, or `"UNAVAILABLE"`. See the built-in [fMRIPrep tracker](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/trackers/fmriprep_tracker.py) for an example, or the sketch at the end of this page.
-3. Optionally add additional stages and phases to track. Again, refer to the [fMRIPrep tracker](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/trackers/fmriprep_tracker.py) or to any other pre-defined tracker configuration for an example.
-4. Modify `nipoppy/trackers/run_tracker.py` to add the new tracker as an option.
-
-### Visualizing availability status with the Neurobagel [`digest`](https://digest.neurobagel.org/)
-
-The `bagel.csv` file written by the tracker can be uploaded to [https://digest.neurobagel.org/](https://digest.neurobagel.org/) (as an "imaging CSV file") for interactive visualizations of processing status.
-
-![digest](../imgs/digest.png)
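To make the `tracker_configs` contract concrete, here is a minimal hedged sketch of a hypothetical tracker file (`mypipeline_tracker.py`); the expected output path is invented for illustration, and real trackers list many such paths:

```python
from pathlib import Path

def check_pipeline_complete(subject_dir, session_id, run_id):
    """Return a "pipeline_complete" status for one participant-session."""
    subject_dir = Path(subject_dir)
    if not subject_dir.exists():
        # Output directory missing: pipeline has not been run.
        return "INCOMPLETE"
    # Hypothetical expected output file for this pipeline.
    expected = subject_dir / f"ses-{session_id}" / "output.nii.gz"
    return "SUCCESS" if expected.exists() else "FAIL"

tracker_configs = {
    "pipeline_complete": check_pipeline_complete,
    # Optional phase/stage checks use the same callable signature, e.g.:
    # "PHASE__func": check_func_phase,
}
```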
diff --git a/docs/nipoppy/workflow/bids_conv.md b/docs/nipoppy/workflow/bids_conv.md
deleted file mode 100644
index 8c4aa187..00000000
--- a/docs/nipoppy/workflow/bids_conv.md
+++ /dev/null
@@ -1,59 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Objective
-
----
-
-Convert DICOMs to BIDS using [Heudiconv](https://heudiconv.readthedocs.io/en/latest/) ([tutorial](https://neuroimaging-core-docs.readthedocs.io/en/latest/pages/heudiconv.html)).
-
----
-
-### Key directories and files
-
-- `<DATASET_ROOT>/dicom`
-- `<DATASET_ROOT>/bids`
-- `<DATASET_ROOT>/scratch/raw_dicom/doughnut.csv`
-- `heuristic.py`
-
-### Procedure
-
-1. Ensure you have the appropriate HeuDiConv container listed in your `global_configs.json`
-2. Use [nipoppy/workflow/bids_conv/run_bids_conv.py](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/bids_conv/run_bids_conv.py) to run HeuDiConv `stage_1` and `stage_2`.
-    - Run `stage_1` to generate a list of available protocols from the DICOM headers. These protocols are listed in `<DATASET_ROOT>/bids/.heudiconv/<subject_id>/info/dicominfo_ses-<session_id>.tsv`
-
-> Sample command:
-```bash
-python nipoppy/workflow/bids_conv/run_bids_conv.py \
-    --global_config <global_configs_file> \
-    --session_id <session_id> \
-    --stage 1
-```
-
-!!! note
-
-    If participants have multiple sessions (or visits), these need to be converted separately and combined post hoc to avoid Heudiconv errors.
-
-3. Copy and rename [sample_heuristic.py](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/bids_conv/sample_heuristic.py) to `heuristic.py` in the code repo itself. Then edit `./heuristic.py` to create a name mapping (i.e. dictionary) for BIDS organization based on the list of available protocols. (A sketch of such a heuristic is included at the end of this page.)
-
-!!! note
-
-    This file automatically gets copied into `<DATASET_ROOT>/proc/heuristic.py` to be seen by the Singularity container.
-
-4. Run `stage_2` to convert the DICOMs into BIDS format based on the mapping from `heuristic.py`. This updates `<DATASET_ROOT>/scratch/raw_dicom/doughnut.csv` (sets the `converted` column to `True`).
-
-> Sample command:
-```bash
-python nipoppy/workflow/bids_conv/run_bids_conv.py \
-    --global_config <global_configs_file> \
-    --session_id <session_id> \
-    --stage 2
-```
-
-!!! note
-
-    Once `heuristic.py` is finalized, only `stage_2` needs to be run periodically, unless a new scan protocol is added.
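For illustration, a minimal `heuristic.py` in the standard HeuDiConv style; the protocol-name matching rules below are hypothetical and must be adapted to the protocols listed in the `dicominfo` TSV for your dataset:

```python
def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map scanner series to BIDS naming, based on protocol names."""
    t1w = create_key("sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w")
    rest = create_key("sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_bold")

    info = {t1w: [], rest: []}
    for s in seqinfo:
        # Protocol names are study-specific (see dicominfo_ses-*.tsv).
        if "MPRAGE" in s.protocol_name:
            info[t1w].append(s.series_id)
        elif "rsfMRI" in s.protocol_name:
            info[rest].append(s.series_id)
    return info
```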
diff --git a/docs/nipoppy/workflow/dicom_org.md b/docs/nipoppy/workflow/dicom_org.md
deleted file mode 100644
index ba72e2dd..00000000
--- a/docs/nipoppy/workflow/dicom_org.md
+++ /dev/null
@@ -1,63 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Objective
-
----
-
-This is a dataset-specific process and needs to be customized based on local scanner DICOM dumps and file naming. This organization should produce, for a given session, participant-specific DICOM directories. Each participant directory contains a flat list of the participant's DICOMs for all available imaging modalities and scan protocols. The manifest is used to determine which new subject-session pairs need to be processed, and a `doughnut.csv` file is used to track the status of the DICOM reorganization and BIDS conversion steps.
-
----
-
-### Key directories and files
-
-- `<DATASET_ROOT>/tabular/manifest.csv`
-- `<DATASET_ROOT>/downloads`
-- `<DATASET_ROOT>/scratch/raw_dicom`
-- `<DATASET_ROOT>/scratch/raw_dicom/doughnut.csv`
-- `<DATASET_ROOT>/dicom`
-
-### Procedure
-
-1. Run [`nipoppy/workflow/make_doughnut.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/make_doughnut.py) to update `doughnut.csv` based on the manifest. It will add new rows for any subject-session pair not already in the file.
-    - To create the `doughnut.csv` for the first time, use the `--empty` argument. If processing has been done without updating `doughnut.csv`, use `--regenerate` to update it based on new files in the dataset.
-
-!!! note
-
-    The `doughnut.csv` file is used to track the multi-step conversion of raw DICOMs to BIDS: whether raw DICOMs have been downloaded to disk, re-organized into a directory structure accepted by [HeuDiConv](https://github.com/nipy/heudiconv), and converted to BIDS. This file is updated automatically by scripts in `workflow/dicom_org` and `workflow/bids_conv`. Backups are created in case it is needed to revert to a previous version: they can be found in `<DATASET_ROOT>/scratch/raw_dicom/.doughnuts`.
-
-    Here is a sample `doughnut.csv` file:
-
-    | participant_id | session | participant_dicom_dir | dicom_id | bids_id | downloaded | organized | converted |
-    |----------------|---------|-----------------------|----------|---------|------------|-----------|-----------|
-    | 001            | ses-01  | MyStudy_001_2021      | 001      | sub-001 | True       | True      | True      |
-    | 001            | ses-02  | MyStudy_001_2022      | 001      | sub-001 | True       | False     | False     |
-    | 002            | ses-01  | MyStudy_002_2021      | 002      | sub-002 | True       | True      | False     |
-    | 002            | ses-03  | MyStudy_002_2024      | 002      | sub-002 | False      | False     | False     |
-
-2. Download DICOM dumps (e.g. ZIPs / tarballs) into the `<DATASET_ROOT>/downloads` directory. Different visits (i.e. sessions) must be downloaded into separate sub-directories, ideally named as listed in the `global_configs.json`. The DICOM download and extraction process is highly dataset-dependent, and we recommend using custom scripts to automate it as much as possible.
-3. Extract (and rename if needed) all participants into `<DATASET_ROOT>/scratch/raw_dicom`, separately for each visit (i.e. session).
-    - At this point, the `doughnut.csv` should have been updated to reflect the new downloads (`downloaded` column set to `True` where appropriate). We recommend doing this in the download script (see the sketch at the end of this page), but `workflow/make_doughnut.py` can also be run with the `--regenerate` flag to search for the expected files (this can be very slow!).
-
-!!! note
-
-    **IMPORTANT**: the participant-level directory names should match the `participant_id`s in the `manifest.csv`. We recommend a `participant_id` naming format that excludes non-alphanumeric characters (e.g. "-" or "_"). If your `participant_id` does contain these characters, it is still recommended to remove them from the participant-level DICOM directory names (e.g., QPN_001 --> QPN001).
-
-!!! note
-
-    It is **okay** for a participant directory to have a messy internal subdirectory tree with DICOMs from multiple modalities (see the [data org schematic](../../imgs/data_org.jpg) for details). The run script will search and validate all available DICOM files automatically.
-
-4. Run [`nipoppy/workflow/dicom_org/run_dicom_org.py`](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/dicom_org/run_dicom_org.py) to:
-    - Search: find all the DICOMs inside the participant directory.
-    - Validate: exclude individual DICOM files that are invalid or contain scanner-derived data not compatible with BIDS conversion. Enabled by default; disable by passing `--skip_dcm_check`.
-    - Symlink (default) or copy: create symlinks from `raw_dicom/` to `<DATASET_ROOT>/dicom`, where all participant-specific DICOMs are in a flat list. The symlinks are relative so that they are preserved in containers. Disable by passing `--no_symlink`.
-    - Update status: if successful, set the `organized` column to `True` in `doughnut.csv`.
-
-> Sample command:
-```bash
-python nipoppy/workflow/dicom_org/run_dicom_org.py \
-    --global_config <global_configs_file> \
-    --session_id <session_id>
-```
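As a sketch of the recommendation in step 3 — updating the `downloaded` column from within a custom download script — assuming pandas and hypothetical participant-session pairs:

```python
import pandas as pd

doughnut_path = "scratch/raw_dicom/doughnut.csv"  # under DATASET_ROOT
doughnut = pd.read_csv(doughnut_path, dtype=str)

# Participant-session pairs that the download script just extracted
# (hypothetical values).
extracted = {("001", "ses-02"), ("002", "ses-03")}

mask = doughnut.apply(
    lambda row: (row["participant_id"], row["session"]) in extracted, axis=1
)
doughnut.loc[mask, "downloaded"] = "True"
doughnut.to_csv(doughnut_path, index=False)
```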
diff --git a/docs/nipoppy/workflow/proc_pipe/fmriprep.md b/docs/nipoppy/workflow/proc_pipe/fmriprep.md
deleted file mode 100644
index fbe3b97c..00000000
--- a/docs/nipoppy/workflow/proc_pipe/fmriprep.md
+++ /dev/null
@@ -1,84 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## Objective
-
----
-
-Run the [fMRIPrep](https://fmriprep.org/en/stable/) pipeline on a BIDS-formatted dataset. Note that a standard fMRIPrep run also includes FreeSurfer processing.
-
----
-
-### Key directories and files
-
-- `<DATASET_ROOT>/bids`
-- `<DATASET_ROOT>/derivatives/fmriprep/`
-- `<DATASET_ROOT>/derivatives/freesurfer/`
-- `bids_filter.json`
-
-### Procedure
-
-- Ensure you have the appropriate fMRIPrep container listed in your `global_configs.json`
-- Use the [nipoppy/workflow/proc_pipe/fmriprep/run_fmriprep.py](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/proc_pipe/fmriprep/run_fmriprep.py) script to run the fMRIPrep pipeline
-- You can run an "anatomical only" workflow by adding the `--anat_only` flag
-- (Optional) Copy and rename [sample_bids_filter.json](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/proc_pipe/fmriprep/sample_bids_filter.json) to `bids_filter.json` in the code repo itself. Then edit `bids_filter.json` to filter certain modalities / acquisitions. This is common when you have multiple T1w acquisitions (e.g. Neuromelanin, SPIR etc.) for a given modality. (A sketch is included at the end of this page.)
-
-!!! note
-
-    When the `--use_bids_filter` flag is set, this `bids_filter.json` is automatically copied into `<DATASET_ROOT>/bids/bids_filter.json` to be seen by the Singularity container.
-
-- For FreeSurfer tasks, you need to have a [license.txt](https://surfer.nmr.mgh.harvard.edu/fswiki/License) file inside `<DATASET_ROOT>/derivatives/freesurfer/`
-- fMRIPrep manages brain-template spaces using [TemplateFlow](https://fmriprep.org/en/stable/spaces.html). These templates can be shared across studies and datasets. Use `global_configs.json` to specify the `TEMPLATEFLOW_DIR` path where these templates reside.
-
-!!! note
-
-    For machines with Internet connections, all required templates are automatically downloaded during the fMRIPrep run.
-
-> Sample command:
-```bash
-python nipoppy/workflow/proc_pipe/fmriprep/run_fmriprep.py \
-    --global_config <global_configs_file> \
-    --participant_id MNI01 \
-    --session_id 01 \
-    --use_bids_filter
-```
-
-!!! note
-
-    Unlike the DICOM and BIDS run scripts, `run_fmriprep.py` can only process one participant at a time, due to the heavy compute requirements of fMRIPrep. For parallel processing on a cluster, sample HPC job scripts (Slurm and SGE) are provided in the [hpc](https://github.com/neurodatascience/nipoppy/tree/main/workflow/proc_pipe/fmriprep/scripts) subdirectory.
-
-!!! note
-
-    You can change the default run parameters in [run_fmriprep.py](https://github.com/neurodatascience/nipoppy/blob/main/workflow/proc_pipe/fmriprep/run_fmriprep.py); see the [fMRIPrep documentation](https://fmriprep.org/en/stable/usage.html) for the available options.
-
-!!! note
-
-    Clean up the working directory (`fmriprep_wf`): an fMRIPrep run generates a huge number of intermediate files. You should remove them after a successful run to free up space.
-
-### fMRIPrep tasks
-
-Main MR processing tasks run by fMRIPrep (see the [fMRIPrep documentation](https://fmriprep.org/en/stable/) for details):
-
-- Preprocessing
-    - Bias correction / intensity normalization (N4)
-    - Brain extraction (ANTs)
-    - Spatial normalization to standard space(s)
-- Anatomical
-    - Tissue segmentation (FAST)
-    - FreeSurfer recon-all
-- Functional
-    - BOLD reference image estimation
-    - Head-motion estimation
-    - Slice time correction
-    - Susceptibility Distortion Correction (SDC)
-    - Pre-processed BOLD in native space
-    - EPI to T1w registration
-    - Resampling BOLD runs onto standard spaces
-    - EPI sampled to FreeSurfer surfaces
-    - Confounds estimation
-    - ICA-AROMA (not run by default)
-- Quality Control
-    - [Visual reports](https://fmriprep.org/en/stable/outputs.html#visual-reports)
\ No newline at end of file
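As a sketch of the `bids_filter.json` idea: the queries below follow fMRIPrep's BIDS-filter file format, but the chosen entities are hypothetical (here, keeping only a plain T1w scan and task-rest BOLD runs) and should be adapted to your acquisitions:

```python
import json

# "acquisition": None is written as JSON null, which (in fMRIPrep's filter
# semantics) selects files without an "acq-" entity.
bids_filter = {
    "t1w": {"datatype": "anat", "suffix": "T1w", "acquisition": None},
    "bold": {"datatype": "func", "suffix": "bold", "task": "rest"},
}

with open("bids_filter.json", "w") as f:
    json.dump(bids_filter, f, indent=4)
```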
diff --git a/docs/nipoppy/workflow/proc_pipe/mriqc.md b/docs/nipoppy/workflow/proc_pipe/mriqc.md
deleted file mode 100644
index 1259f2fb..00000000
--- a/docs/nipoppy/workflow/proc_pipe/mriqc.md
+++ /dev/null
@@ -1,29 +0,0 @@
-{%
-   include-markdown "nipoppy/cli_note.md"
-%}
-
-## MRIQC image processing pipeline
-
----
-
-MRIQC processes the participants and produces image quality metrics from T1w, T2w and BOLD data.
-
----
-
-### [MRIQC](https://mriqc.readthedocs.io/en/latest/)
-
-- Use [nipoppy/workflow/proc_pipe/mriqc/run_mriqc.py](https://github.com/neurodatascience/nipoppy/blob/main/nipoppy/workflow/proc_pipe/mriqc/run_mriqc.py) to run the MRIQC pipeline directly, or wrap the script in an SGE/Slurm script to run it on a cluster
-
-```bash
-python nipoppy/workflow/proc_pipe/mriqc/run_mriqc.py \
-    --global_config <global_configs_file> \
-    --participant_id <participant_id> \
-    --session_id <session_id> \
-    --modalities <modalities>
-```
-
-The required arguments are:
-
-- `--global_config`: path to the configuration containing the MRIQC container and data directory
-- `--participant_id`: participant/subject ID
-- `--session_id`: session ID
-- `--modalities`: modality/modalities to check (valid values: `T1w`, `T2w`, `bold`, `dwi`)
"nipoppy/workflow/dicom_org.md" - - "BIDS conversion": "nipoppy/workflow/bids_conv.md" - - Pipeline-specific instructions: - - fmriprep: "nipoppy/workflow/proc_pipe/fmriprep.md" - - mriqc: "nipoppy/workflow/proc_pipe/mriqc.md" - - Trackers: "nipoppy/trackers.md" - - Glossary: "nipoppy/glossary.md" - Contributing: - How to contribute: "contributing/CONTRIBUTING.md" - Our team: "contributing/team.md" diff --git a/requirements.txt b/requirements.txt index f81cf14f..16ef7825 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,5 @@ mkdocs-include-markdown-plugin[cache] mkdocs-material -mkdocs-table-reader-plugin +mkdocs-table-reader-plugin>=3.0.0 mkdocs-yamp pre-commit \ No newline at end of file