[ENH] Document short CLI option names, subject ID case sensitivity, and strict trailing slash handling #214

Merged: 7 commits, Aug 7, 2024

`docs/api.md`: 17 changes (12 additions, 5 deletions)

@@ -18,25 +18,32 @@ Neurobagel also hosts its own public instances of a node API and a federation AP
[https://api.neurobagel.org/](https://api.neurobagel.org/) is a public, Neurobagel-hosted node API that interfaces with Neurobagel's own running graph instance containing harmonized datasets from the [OpenNeuro](https://openneuro.org/) platform.

## Sending a request to a Neurobagel API directly
Cohort queries of a specific Neurobagel graph database can be submitted via direct requests to the corresponding node API using the `/query` endpoint, e.g., `https://api.neurobagel.org/query`.
Specific query parameters are passed as key-value pairs in a query string appended to the URL after `/query` (starting with `?`).

**Example: "I want to query for only female participants in the OpenNeuro graph."**

The URL for such a query would be `https://api.neurobagel.org/query?sex=snomed:248152002`, where `snomed:248152002` is a controlled term from the SNOMED CT vocabulary corresponding to female sex.

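Query parameters follow standard URL syntax: the part after `?` holds `key=value` pairs, and multiple pairs are joined with `&`. The following is a minimal sketch of building such a URL in a script. `build_query_url` is a hypothetical helper, not part of Neurobagel. It inserts values as-is (no URL encoding) and does not check parameter names, so consult the interactive API docs for the parameters a given API accepts.

```bash
# Hypothetical helper: build a /query URL from a base API URL and key=value pairs.
# Values are inserted as-is (no URL encoding).
build_query_url() {
  local base="${1%/}"   # drop one trailing slash from the base URL, if present
  shift
  local IFS='&'         # "$*" joins the remaining arguments with "&"
  printf '%s/query?%s\n' "$base" "$*"
}

# build_query_url https://api.neurobagel.org sex=snomed:248152002
# prints: https://api.neurobagel.org/query?sex=snomed:248152002
```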
### Example using a curl request
```bash
# To query for female participants in the graph
curl -X 'GET' \
  'https://api.neurobagel.org/query?sex=snomed:248152002' \
  -H 'accept: application/json'

# or
curl -L 'https://api.neurobagel.org/query?sex=snomed:248152002'
```

!!! warning "Avoid trailing slashes in API endpoint URLs"

    Neurobagel APIs are strict about trailing slashes.
    When sending `curl` requests to an instance of a Neurobagel API, make sure endpoint URLs do not end with a trailing slash.
    For example, requests to `https://api.neurobagel.org/query` will work, but requests to `https://api.neurobagel.org/query/` will not.

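If you assemble endpoint URLs in scripts (for example, from user input), you can normalize them before calling `curl`. The following is a minimal sketch, assuming a POSIX-compatible shell. `strip_trailing_slash` is a hypothetical helper that removes trailing slashes from the path part of a URL and leaves any query string intact.

```bash
# Hypothetical helper: remove trailing slashes from the path part of a URL,
# keeping any query string (everything from the first "?") unchanged.
strip_trailing_slash() {
  local url="$1" path query=""
  case "$url" in
    *\?*) path="${url%%\?*}"; query="?${url#*\?}" ;;
    *)    path="$url" ;;
  esac
  # Repeat so that several trailing slashes ("query//") are all removed.
  while [ "${path%/}" != "$path" ]; do
    path="${path%/}"
  done
  printf '%s%s\n' "$path" "$query"
}
```

For example, `curl -L "$(strip_trailing_slash 'https://api.neurobagel.org/query/?sex=snomed:248152002')"` sends the request to the slash-free URL.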

## Using the interactive Neurobagel API docs
Interactive documentation for a Neurobagel API (provided by [Swagger UI](https://github.com/swagger-api/swagger-ui)) is available at the `/docs` endpoint (e.g., [https://api.neurobagel.org/docs](https://api.neurobagel.org/docs)) and can also be used to run queries against the graph.

`docs/cli.md`: 63 changes (45 additions, 18 deletions)

@@ -3,19 +3,27 @@
The `bagel-cli` is a simple Python command-line tool to automatically parse and describe subject-level phenotypic and [BIDS](https://bids-specification.readthedocs.io/en/stable/) attributes in an annotated dataset for integration into the Neurobagel graph.

## Installation
=== "Docker"

    Option 1 (RECOMMENDED): Pull the Docker image for the CLI from DockerHub:
    ```bash
    docker pull neurobagel/bagelcli
    ```

    Option 2: Clone the repository and build the Docker image locally (if you use this option, replace `neurobagel/bagelcli` with `bagel` in the commands below):
    ```bash
    git clone https://github.com/neurobagel/bagel-cli.git
    cd bagel-cli
    docker build -t bagel .
    ```

=== "Singularity"

    Build a Singularity image for `bagel-cli` using the DockerHub image:
    ```bash
    singularity pull bagel.sif docker://neurobagel/bagelcli
    ```

## Running the CLI
CLI commands can be accessed using the Docker/Singularity image.
@@ -35,7 +43,11 @@ To run the CLI on a dataset you have annotated, you will need:

1. A valid BIDS dataset is needed for the CLI to automatically generate harmonized subject-level imaging metadata alongside harmonized phenotypic attributes.

### Viewing CLI commands and options

The `bagel-cli` has two commands, `pheno` and `bids`.

Information about each command can be found by running:

=== "Docker"

@@ -65,22 +77,30 @@ To view the command-line arguments for a specific command:
singularity run bagel.sif <command-name> --help
```


### Running the CLI on your data
1. `cd` into your local directory containing (1) your phenotypic TSV file, (2) your Neurobagel-annotated data dictionary, and (3) your BIDS directory (if available).
2. Run a `bagel-cli` container and include your CLI command and arguments at the end, in the following format:

=== "Docker"
    ```bash
    docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli <full CLI command here>
    ```

    !!! info "What is this command doing?"

        This combination of options `--volume=$PWD:$PWD -w $PWD` mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the _container's_ working directory to the mounted path (so it matches your location on your host machine).
        This allows you to pass paths to the containerized CLI that are written the same way as on your local machine. Both absolute and relative paths will work, as long as they point to locations within your working directory.

=== "Singularity"
    ```bash
    singularity run --no-home --bind $PWD --pwd $PWD /path/to/bagel.sif <CLI command here>
    ```

    !!! info "What is this command doing?"

        This combination of options `--bind $PWD --pwd $PWD` mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the _container's_ working directory to the mounted path (so it matches your location on your host machine).
        This allows you to pass paths to the containerized CLI that are written the same way as on your local machine. Both absolute and relative paths will work, as long as they point to locations within your working directory.


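If you run the CLI often, you can wrap the two container invocations above in shell functions. This is a sketch under assumptions: `bagel_docker`, `bagel_singularity`, and the `BAGEL_SIF` variable (the path to your `.sif` image, defaulting to `bagel.sif` in the current directory) are hypothetical names, not part of `bagel-cli`. Quoting `"$PWD"` keeps the mount working when your directory path contains spaces.

```bash
# Hypothetical convenience wrappers (not part of bagel-cli) around the two
# container commands above. "$PWD" is quoted so paths with spaces still work.
bagel_docker() {
  docker run --rm --volume="$PWD:$PWD" -w "$PWD" neurobagel/bagelcli "$@"
}

# BAGEL_SIF (assumed variable) points at your Singularity image; defaults to ./bagel.sif.
bagel_singularity() {
  singularity run --no-home --bind "$PWD" --pwd "$PWD" "${BAGEL_SIF:-bagel.sif}" "$@"
}

# Usage: pass the full CLI command (command name plus its options) as arguments, e.g.
# bagel_docker pheno --help
# BAGEL_SIF=/path/to/bagel.sif bagel_singularity bids --help
```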
### Example
If your dataset lives in `/home/data/Dataset1`:
@@ -138,11 +158,18 @@ You could run the CLI as follows:
--output "neurobagel/Dataset1_pheno_bids.jsonld"
```

!!! tip
    Short forms of the CLI command options are listed in the help output of each command:
    `docker run --rm neurobagel/bagelcli pheno --help`
    or
    `docker run --rm neurobagel/bagelcli bids --help`


!!! note "Speed of the `bids` command"
    The `bids` command of the `bagel-cli` (step 2) can currently take several minutes or longer for datasets with more than a few hundred subjects, due to the time pyBIDS needs to read the dataset structure.
    Once the slow initial dataset-reading step is complete, you should see the message:
    ```
    Parsing BIDS metadata to be merged with phenotypic annotations:
    BIDS parsing completed.
    ...
    ```

`docs/data_prep.md`: 26 changes (19 additions, 7 deletions)

@@ -10,17 +10,29 @@ please prepare the tabular data for your dataset as a single, tab-separated file

## General requirements for the phenotypic TSV

### All datasets

A valid dataset for Neurobagel **must** include a TSV file that describes participant attributes.
The TSV must contain a minimum of two columns: at least one column must contain subject IDs,
and at least one column must describe demographic or other phenotypic information
(for variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md)).

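You can check the two-column minimum before running the CLI. The following is a minimal sketch, assuming a POSIX shell with `awk`. `tsv_has_min_columns` is a hypothetical helper, not part of Neurobagel. It only inspects the header row, so it does not verify which columns hold subject IDs.

```bash
# Hypothetical pre-flight check (not part of Neurobagel): succeed only if the
# header row of a phenotypic TSV has at least two tab-separated columns.
tsv_has_min_columns() {
  # NF counts tab-separated fields in the header; an empty file leaves n at 0.
  awk -F'\t' 'NR == 1 { n = NF; exit } END { exit (n >= 2 ? 0 : 1) }' "$1"
}

# Usage:
# tsv_has_min_columns participants.tsv || echo "Header needs at least two tab-separated columns"
```

A comma-separated file saved with a `.tsv` extension fails this check, because its header parses as a single tab-separated field.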
### Datasets with imaging (BIDS) data

If a dataset has imaging data in [BIDS](https://bids-specification.readthedocs.io/en/stable/) format,
Neurobagel **additionally** requires that:

- At least one column in the phenotypic TSV contains subject IDs that match the names of [BIDS subject subdirectories](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#filesystem-structure).
If this condition is not met, you will encounter an error when [running the Neurobagel CLI](cli.md) on your dataset to generate Neurobagel graph-ready files.

    !!! note
        Subject IDs are case-sensitive and must be exact string matches. For example, BIDS subject `sub-MNI001` does not match a subject with ID `sub-mni001` or `mni001` in a phenotypic TSV.

- The subjects in the phenotypic TSV must be the same as, or a superset of, the subjects found in the BIDS directory. Neurobagel does not currently allow for datasets where subjects have BIDS data but are not represented in the phenotypic TSV.

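You can check both BIDS requirements before running the CLI. The following sketch lists BIDS subjects that have no exact, case-sensitive match in the phenotypic TSV. `missing_bids_subjects` is a hypothetical helper, not part of Neurobagel. It assumes the subject ID column is named `participant_id`, as in a BIDS `participants.tsv` (pass a different name as the third argument). It also assumes that IDs in the TSV include the `sub-` prefix, as the BIDS subdirectory names do.

```bash
# Hypothetical pre-flight check (not part of Neurobagel): print BIDS subject
# directories (sub-*) whose names have no exact, case-sensitive match in the
# ID column of a phenotypic TSV. Returns 2 if the ID column is missing.
missing_bids_subjects() {
  local bids_dir="$1" tsv="$2" col="${3:-participant_id}"
  # Confirm that the header contains the ID column (exact, whole-field match).
  if ! head -n 1 "$tsv" | tr -d '\r' | tr '\t' '\n' | grep -qxF "$col"; then
    echo "ID column not found: $col" >&2
    return 2
  fi
  # List top-level sub-* directories by name only (-name is case-sensitive).
  find "$bids_dir" -mindepth 1 -maxdepth 1 -type d -name 'sub-*' -exec basename {} \; |
    awk -v tsv="$tsv" -v col="$col" '
      BEGIN {
        # Read the TSV first and store every ID from the chosen column.
        n = 0; c = 0
        while ((getline line < tsv) > 0) {
          sub(/\r$/, "", line)                # tolerate Windows line endings
          nf = split(line, f, "\t")
          if (++n == 1) {                     # header row: locate the ID column
            for (i = 1; i <= nf; i++) if (f[i] == col) c = i
            continue
          }
          ids[f[c]] = 1                       # stored verbatim: no case folding
        }
        close(tsv)
      }
      # Each input line is one BIDS subject directory name.
      !($0 in ids) { print }' |
    LC_ALL=C sort
}
```

Empty output means every BIDS subject is represented in the TSV. Extra subjects in the TSV are fine, since the TSV may be a superset of the BIDS subjects.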
## Examples of valid phenotypic TSVs

Depending on your dataset, your tabular data may look like one of the following:

### A BIDS `participants.tsv` file
