
Snakemake workflow: SPIMprep

A Snakemake workflow for pre-processing single plane illumination microscopy (SPIM, aka lightsheet microscopy).

SPIMprep takes TIFF images (tiled or prestitched) and outputs a validated BIDS Microscopy dataset. Each scan gets a multi-channel, multi-scale OME-Zarr file, and downsampled NIfTI images are written to a derivatives folder.

Requirements

  • Linux system with Singularity/Apptainer installed
    • (Note: the container is pulled automatically when you run the workflow)
  • Python >= 3.11
  • Lightsheet data:
    • Raw Ultramicroscope Blaze OME TIFF files (include blaze in the acquisition tag)
    • Prestitched TIFF files (include prestitched in the acquisition tag)

Usage

  1. Clone this repository into the folder where you want to run the workflow:
git clone https://github.com/khanlab/spimprep
  2. Create and activate a virtual environment, then, from inside the cloned folder, install the dependencies with:
pip install .

Note: to create the venv on the CBS server, use:

python3.11 -m venv venv
source venv/bin/activate
  3. Update the config/datasets.tsv spreadsheet to point to your dataset(s). Each dataset's TIF files should be in their own folder or tar file, with no other TIF files. Enter the path to each dataset in the dataset_path column. The first three columns identify the subject, sample, and acquisition, which become part of the resulting filenames (BIDS naming). The stain_0 and stain_1 columns identify the stain used for each channel; use autof to indicate the autofluorescence channel. If you have a different number of stains, you can add or remove these columns. If your samples have different numbers of stains, you can leave values blank or use n/a to indicate that a sample does not have a particular stain.

Note: the acquisition value must contain either blaze or prestitched, because it determines which workflow is used. For example, LifeCanvas data that is already stitched needs prestitched in its acquisition value.
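As a minimal sketch of the spreadsheet rules above, assuming a POSIX shell with awk, the snippet below writes a small, hypothetical datasets.tsv to a temporary file. It then checks that every acquisition value contains blaze or prestitched and that dataset_path is filled in, and prints the sub-/sample-/acq- prefix each row would contribute to BIDS filenames. The header names (subject, sample, acq, stain_0, stain_1, dataset_path), the values, and the paths are assumptions for illustration, so compare them with the header of the config/datasets.tsv shipped with the repository. The filenames the workflow actually writes may also carry more entities than this prefix.

```shell
# Write a small, hypothetical datasets.tsv to a temporary file.
# Header names and all values below are illustrative assumptions.
tsv=$(mktemp)
printf 'subject\tsample\tacq\tstain_0\tstain_1\tdataset_path\n' > "$tsv"
printf 'mouse1\tbrain\tblaze4x\tautof\tabeta\t/data/mouse1_blaze.tar\n' >> "$tsv"
printf 'mouse2\tbrain\tprestitched\tautof\tn/a\t/data/mouse2_stitched\n' >> "$tsv"

# Check each data row of a datasets.tsv and print the BIDS entity prefix
# (sub-, sample-, acq-) it would contribute. Columns are looked up by header
# name, so their order does not matter. Exits non-zero if any row is invalid.
check_datasets() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      acq = $(col["acq"])
      if (acq !~ /blaze|prestitched/) {
        print "row " NR ": acq must contain blaze or prestitched (got " acq ")"
        bad = 1
      }
      if ($(col["dataset_path"]) == "") {
        print "row " NR ": dataset_path is empty"
        bad = 1
      }
      print "sub-" $(col["subject"]) "_sample-" $(col["sample"]) "_acq-" acq
    }
    END { exit bad }
  ' "$1"
}

check_datasets "$tsv"
```

Looking columns up by header name rather than by position keeps the check valid when you add or remove stain columns.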

New: writing output directly to cloud storage is now supported. To enable it, set the root variable to an s3:// or gcs:// path that points to a bucket you have write access to.
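As a sketch of the cloud-storage note above, the shell functions below classify a root value by its prefix and print, without running, a Snakemake call that overrides root for a single run. Both function names are hypothetical, and the bucket name in the usage line is a placeholder. Overriding root with Snakemake's --config flag assumes the workflow reads root from the standard Snakemake config object, so you would not need to edit config/config.yml.

```shell
# Classify a root value by prefix: s3:// and gcs:// mean a cloud bucket,
# anything else is treated as a local path.
root_backend() {
  case "$1" in
    s3://*)  echo "s3" ;;
    gcs://*) echo "gcs" ;;
    *)       echo "local" ;;
  esac
}

# Print (do not run) a Snakemake call that overrides root for one run.
# Assumes root is read from the standard Snakemake config object.
print_run_cmd() {
  echo "snakemake -c all --sdm apptainer --config root=$1"
}
```

For example, print_run_cmd s3://my-lab-bucket/bids prints a command you can inspect before running it.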

  4. Edit config/config.yml to customize any workflow parameters. The most important ones are the root and work variables. The root path is where the results end up; by default this is a subfolder called bids. The work path is where intermediate scratch files are written. By default, files in work are deleted once the workflow no longer needs them, unless you use the --notemp command-line option. The workflow writes a large number of small files in parallel to the work folder. For best performance, work should therefore be on a fast local disk, not on a networked (shared) file system.

Note: you can use environment variables when specifying root or work. For example, work: '$SLURM_TMPDIR' can be used on HPC servers.
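One way to act on the advice about the work folder is a quick pre-flight check of where it lives. The sketch below assumes Linux with GNU coreutils stat, whose -f -c %T options print a filesystem's type name. The function name is_network_fstype and its list of network filesystem names are hypothetical and not exhaustive. Mirroring the HPC example above, it uses $SLURM_TMPDIR when that variable is set and falls back to /tmp otherwise.

```shell
# Return success if a filesystem type name (as printed by GNU `stat -f -c %T`)
# looks like a network or shared filesystem. The list is illustrative only.
is_network_fstype() {
  case "$1" in
    nfs*|cifs|smb*|lustre|gpfs|fuse*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prefer $SLURM_TMPDIR (defined on some HPC clusters), otherwise use /tmp.
work_dir=${SLURM_TMPDIR:-/tmp}

# Look up the filesystem type of the work directory and warn if it is remote.
fstype=$(stat -f -c %T "$work_dir" 2>/dev/null) || fstype=unknown
if is_network_fstype "$fstype"; then
  echo "warning: $work_dir is on $fstype; prefer a fast local disk for work" >&2
else
  echo "$work_dir is on $fstype"
fi
```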

  5. From the SPIMprep folder, perform a dry run to check that the workflow is configured properly. This only prints the jobs the workflow would run; nothing is executed.
snakemake -np 
  6. To run the workflow in parallel on all cores, using Singularity (aka Apptainer) for dependencies, use:
snakemake -c all --sdm apptainer 

or, for Snakemake < 8.0, use:

snakemake -c all --use-singularity 

Note: if you run the workflow on a system with a large amount of memory, you will need to set the heap size for the stitching and fusion rules. You can do this with, e.g.: --set-resources bigstitcher:mem_mb=60000 fuse_dataset:mem_mb=100000
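A sketch of deriving these flags from the machine's memory, assuming Linux (where /proc/meminfo reports MemTotal in kB). The function names and the percentage arguments are hypothetical. The workflow does not prescribe how memory should be split between the bigstitcher and fuse_dataset rules, so choose values that suit your data.

```shell
# Approximate total physical memory in MB (Linux only: /proc/meminfo
# reports MemTotal in kB).
total_mem_mb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Build --set-resources flags that give the stitching and fusion rules a
# percentage of total memory. Usage: resource_flags TOTAL_MB STITCH_PCT FUSE_PCT
resource_flags() {
  total=$1
  stitch_mb=$((total * $2 / 100))
  fuse_mb=$((total * $3 / 100))
  printf '%s\n' "--set-resources bigstitcher:mem_mb=$stitch_mb fuse_dataset:mem_mb=$fuse_mb"
}
```

With a total of 100000 MB and percentages 60 and 100, resource_flags prints exactly the flags from the note. A full run could then look like snakemake -c all --sdm apptainer $(resource_flags "$(total_mem_mb)" 40 70), where 40 and 70 are arbitrary example values. The outer command substitution is left unquoted on purpose so the flags split into separate arguments.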

  7. To run the workflow through a batch job scheduler (e.g. SLURM), see the Snakemake executor plugins: https://snakemake.github.io/snakemake-plugin-catalog/
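As one example of using an executor plugin, you can write a minimal Snakemake profile with the sketch below. This assumes Snakemake 8 or newer and the SLURM executor plugin (snakemake-executor-plugin-slurm) from the catalog above. The function name, the profile directory, and the limit of 100 concurrent jobs are placeholders. Clusters commonly also need account or partition settings, which are described in the plugin's own documentation.

```shell
# Write a minimal Snakemake profile (config.yaml keys mirror long CLI
# options) that submits jobs through the SLURM executor plugin and uses
# Apptainer for dependencies. Directory and job count are placeholders.
make_slurm_profile() {
  dir=$1
  mkdir -p "$dir"
  printf '%s\n' \
    'executor: slurm' \
    'jobs: 100' \
    'software-deployment-method: apptainer' > "$dir/config.yaml"
}
```

After running pip install snakemake-executor-plugin-slurm and make_slurm_profile profiles/slurm, you could launch the workflow with snakemake --profile profiles/slurm.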

An alternative way to use this workflow (with conda) is described in the Snakemake Workflow Catalog.