HRA workflows for annotating h5ad files using different annotation tools.
Requirements:
- cwl-runner
- Docker
Docker images can be built locally by running ./scripts/build-containers.sh. By default, the script builds all containers. To build an individual image or a subset of images, pass the container names as arguments to the build script, e.g. ./scripts/build-containers.sh azimuth gene-expression.
Download the model data by running cwl-runner download-models.cwl.
The first step is to create a job file that specifies the inputs to the pipeline. The file can be written as either JSON or YAML.
matrix:
  class: File
  path: path/to/data.h5ad
organ: UBERON:0002048 # Uberon id for lung
algorithms:
  # Algorithm specific options are documented in the container's options.yml
  - azimuth:
      referenceDataDir:
        class: Directory
        path: path/to/models/directory
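Since the job file may also be JSON, the same inputs can be generated programmatically. The following is a minimal sketch using only the Python standard library; the helper name `make_job` is a hypothetical illustration and not part of this repository.

```python
import json


def make_job(matrix_path, organ, algorithms):
    """Build a job description mirroring the YAML example (hypothetical helper)."""
    return {
        "matrix": {"class": "File", "path": str(matrix_path)},
        "organ": organ,
        "algorithms": algorithms,
    }


job = make_job(
    "path/to/data.h5ad",
    "UBERON:0002048",  # Uberon id for lung
    [
        {
            # Algorithm specific options are documented in the container's options.yml
            "azimuth": {
                "referenceDataDir": {
                    "class": "Directory",
                    "path": "path/to/models/directory",
                }
            }
        }
    ],
)

# Serialize the job as JSON text and print it.
job_text = json.dumps(job, indent=2)
print(job_text)
```

Redirecting the printed output to a file such as my-job.json (an assumed name) gives a job file equivalent to the YAML version.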
After creating a job file, run the annotation tools with cwl-runner pipeline.cwl my-job.yml (replace my-job.yml with your job file).
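When the pipeline is launched from a script rather than a shell, the same command can be issued through Python's `subprocess` module. This is only a sketch: `pipeline_command` and `run_pipeline` are hypothetical helper names, and the code assumes it is run from the repository root with cwl-runner on the PATH.

```python
import shutil
import subprocess


def pipeline_command(job_file, pipeline="pipeline.cwl", runner="cwl-runner"):
    """Return the argument list for running the pipeline on a job file."""
    return [runner, pipeline, str(job_file)]


def run_pipeline(job_file):
    """Run the pipeline and raise if the runner is missing or the run fails."""
    cmd = pipeline_command(job_file)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True)
```

Calling `run_pipeline("my-job.yml")` would then be equivalent to the command line shown above.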
An annotation tool generally has the following file structure:
containers/
  my-annotation-tool/
    Dockerfile
    options.yml
    pipeline.cwl
    download-data.cwl (optional)
    context/*
      code and assets...
Each file serves the following purpose:
Dockerfile
- Instructions for building a Docker image.
options.yml
- CWL definition of tool-specific options.
pipeline.cwl
- Main CWL pipeline for running the tool.
  - 3 inputs: "matrix", "organ", and "options".
  - 3 outputs: "annotations", "annotated_matrix", and "report".
download-data.cwl (optional)
- Downloads models and other data required for running the tool.
- Implement this pipeline when the model data is too large to embed directly in the Docker image.
context/*
- Directory containing the code and assets implementing the tool.
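The layout and interface described above can be checked mechanically before wiring a new tool into the main pipeline. The sketch below is an illustration only: the function names are hypothetical, `context` is assumed to be a required directory, and `check_interface` expects the contents of pipeline.cwl to have already been parsed into a dictionary (for example with a YAML library of your choice).

```python
from pathlib import Path

# Files every tool directory is expected to contain; download-data.cwl is
# optional and therefore not checked.
REQUIRED_FILES = ("Dockerfile", "options.yml", "pipeline.cwl")
EXPECTED_INPUTS = {"matrix", "organ", "options"}
EXPECTED_OUTPUTS = {"annotations", "annotated_matrix", "report"}


def check_tool_layout(tool_dir):
    """Return a list of problems found in an annotation tool directory."""
    tool_dir = Path(tool_dir)
    problems = []
    for name in REQUIRED_FILES:
        if not (tool_dir / name).is_file():
            problems.append(f"missing file: {name}")
    if not (tool_dir / "context").is_dir():
        problems.append("missing directory: context")
    return problems


def _ids(section):
    """Collect parameter ids from a CWL inputs/outputs section (map or list form)."""
    if isinstance(section, dict):
        return set(section)
    return {item["id"] for item in section}


def check_interface(cwl_doc):
    """Return (missing_inputs, missing_outputs) for a parsed pipeline.cwl."""
    missing_in = EXPECTED_INPUTS - _ids(cwl_doc.get("inputs", []))
    missing_out = EXPECTED_OUTPUTS - _ids(cwl_doc.get("outputs", []))
    return sorted(missing_in), sorted(missing_out)
```

For example, `check_tool_layout("containers/my-annotation-tool")` returns an empty list when all expected files and the context directory are present.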
After implementing a new algorithm, a few changes are needed to enable the tool in the main pipeline. The files to update are pipeline.cwl, ./steps/annotate.cwl, and ./steps/run-one.cwl. Once the new tool has been added to the top-level pipeline, it can be used by specifying it in a job file.
matrix:
  class: File
  path: path/to/data.h5ad
organ: UBERON:0002048 # Uberon id for lung
algorithms:
  - my-annotation-tool:
      # Options specific to my-annotation-tool
      option1: value1
      option2: value2
      ...