A Snakemake workflow for pre-processing single plane illumination microscopy (SPIM, aka lightsheet microscopy).
It takes TIF images (tiled or prestitched) and outputs a validated BIDS Microscopy dataset, with a multi-channel, multi-scale OME-Zarr file for each scan, along with downsampled NIfTI images (in a derivatives folder).
- Linux system with Singularity/Apptainer installed
- (Note: container will be automatically pulled when you run the workflow)
- Python >= 3.11
- Lightsheet data:
  - Raw Ultramicroscope Blaze OME TIFF files (include `blaze` in the acquisition tag)
  - Prestitched TIFF files (include `prestitched` in the acquisition tag)
- Clone this repository to the folder you want to run the workflow in
git clone https://github.com/khanlab/spimprep
- Create and activate a virtual environment, then install dependencies with:
pip install .
Note: to make a venv on the CBS server use:
python3.11 -m venv venv
source venv/bin/activate
- Update the `config/datasets.tsv` spreadsheet to point to your dataset(s). Each dataset's TIF files should be in its own folder or tar file, with no other TIF files. Enter the path to each dataset in the `dataset_path` column. The first three columns identify the subject, sample, and acquisition, which become part of the resulting filenames (BIDS naming). The `stain_0` and `stain_1` columns identify which stain was used for each channel. Use `autof` to indicate the autofluorescence channel. If you have a different number of stains, you can add or remove these columns. If your samples have different numbers of stains, you can leave values blank or use `n/a` to indicate that a sample does not have a particular stain.
Note: The acquisition value must contain either `blaze` or `prestitched`, and this defines which workflow will be used. E.g. for LifeCanvas data that is already stitched, you need to include `prestitched` in the acquisition value.
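As a rough illustration of the table layout and the acquisition rule described above, the following sketch parses a small in-memory example and reports which workflow each row would select. The header names `subject`, `sample`, and `acquisition`, the example rows, and all helper names are assumptions made for this illustration; this is not the workflow's own parsing code.

```python
import csv
import io

# Hypothetical example table. The header names of the first three columns
# and all row values are assumptions made for illustration only.
EXAMPLE_TSV = (
    "subject\tsample\tacquisition\tstain_0\tstain_1\tdataset_path\n"
    "mouse1\tbrain\tblaze\tautof\tabeta\t/data/mouse1_blaze\n"
    "mouse2\tbrain\tprestitched\tautof\tn/a\t/data/mouse2_stitched.tar\n"
)

WORKFLOW_KEYWORDS = ("blaze", "prestitched")


def select_workflow(acquisition):
    """Return the keyword ('blaze' or 'prestitched') found in the acquisition value."""
    matches = [kw for kw in WORKFLOW_KEYWORDS if kw in acquisition]
    if len(matches) != 1:
        # Rejecting values that contain both keywords is a choice made in this sketch.
        raise ValueError(
            f"acquisition {acquisition!r} must contain exactly one of {WORKFLOW_KEYWORDS}"
        )
    return matches[0]


def list_stains(row):
    """Collect the stain_N values in column order, skipping blank and n/a entries."""
    keys = sorted(
        (k for k in row if k and k.startswith("stain_")),
        key=lambda k: int(k.split("_")[1]),
    )
    return [row[k] for k in keys if row[k] and row[k].strip().lower() != "n/a"]


def read_datasets(text):
    """Parse tab-separated text into one summary dict per dataset row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "path": row["dataset_path"],
            "workflow": select_workflow(row["acquisition"]),
            "stains": list_stains(row),
        }
        for row in reader
    ]


if __name__ == "__main__":
    for dataset in read_datasets(EXAMPLE_TSV):
        print(dataset)
```

A check like this can be run on your own table before the dry-run to catch an acquisition value that lacks both keywords.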
New: Writing output directly to cloud storage is now supported; enable this by using `s3://` or `gcs://` in the `root` variable, to point to a bucket you have write access to.
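To make the distinction between a local and a cloud `root` concrete, here is a minimal sketch that classifies a `root` value by its URI scheme. The function name, the returned dictionary layout, and the example bucket name are hypothetical; how the workflow itself interprets `root` is not shown here.

```python
from urllib.parse import urlparse

# URI schemes named in the text above for cloud output.
CLOUD_SCHEMES = ("s3", "gcs")


def describe_root(root):
    """Classify a root value as cloud storage or a local path."""
    parsed = urlparse(root)
    if parsed.scheme in CLOUD_SCHEMES:
        return {
            "kind": "cloud",
            "scheme": parsed.scheme,
            "bucket": parsed.netloc,
            "prefix": parsed.path.lstrip("/"),
        }
    # Anything without an s3:// or gcs:// prefix is treated as a local path.
    return {"kind": "local", "path": root}


if __name__ == "__main__":
    print(describe_root("bids"))
    print(describe_root("s3://example-bucket/spim/bids"))  # hypothetical bucket
```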
- The `config/config.yml` file can be edited to customize any workflow parameters. The most important ones are the `root` and `work` variables. The `root` path is where the results will end up; by default this is a subfolder called `bids`. The `work` path is where any intermediate scratch files are produced. By default the files in `work` are deleted after they are no longer needed in the workflow, unless you use the `--notemp` command-line option. The workflow writes a large number of small files in parallel to the `work` folder, so for optimum performance this should be a fast local disk, and not a networked file system (i.e. shared disk).
Note: you can use environment variables when specifying `root` or `work`, e.g. `work: '$SLURM_TMPDIR'` can be used on HPC servers.
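The effect of an environment variable in a path can be previewed with Python's standard library. The sketch below uses `os.path.expandvars`; whether the workflow performs the expansion in exactly this way is an assumption, and the helper name and the example value for `SLURM_TMPDIR` are hypothetical.

```python
import os


def expand_config_path(value):
    """Expand $VAR / ${VAR} references in a config path using the current environment."""
    expanded = os.path.expandvars(value)
    if "$" in expanded:
        # os.path.expandvars leaves references to unset variables unchanged,
        # so a remaining "$" is treated here as an unresolved variable.
        raise KeyError(f"unresolved environment variable in {value!r}")
    return expanded


if __name__ == "__main__":
    # Hypothetical value; on a SLURM cluster the scheduler sets this variable.
    os.environ.setdefault("SLURM_TMPDIR", "/tmp/example_job")
    print(expand_config_path("$SLURM_TMPDIR"))
```

Failing early on an unset variable avoids scratch files being written to a literal `$SLURM_TMPDIR` folder when the workflow is started outside a job allocation.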
- Go to the SPIMprep folder and perform a dry-run to make sure the workflow is configured properly. This will only print what the workflow will run, and will not run anything.
snakemake -np
- To run the workflow, parallelizing on all cores, using Singularity (aka Apptainer) for dependencies, use:
snakemake -c all --sdm apptainer
or for snakemake<8.0, use:
snakemake -c all --use-singularity
Note: if you run the workflow on a system with large memory, you will need to set the heap size for the stitching and fusion rules. This can be done with e.g.: `--set-resources bigstitcher:mem_mb=60000 fuse_dataset:mem_mb=100000`
- If you want to run the workflow using a batch job submission server, please see the executor plugins here: https://snakemake.github.io/snakemake-plugin-catalog/
Alternate usage of this workflow (making use of conda) is described in the Snakemake Workflow Catalog.
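The run commands shown earlier differ only in the container flag (by Snakemake version) and in the optional memory overrides. As a sketch, a small helper can assemble the matching command line from those two inputs. It uses only the flags quoted in this README; the function name and its parameters are hypothetical.

```python
import shlex


def build_run_command(snakemake_major, mem_mb=None):
    """Assemble the run command, picking the container flag by Snakemake major version."""
    cmd = ["snakemake", "-c", "all"]
    if snakemake_major >= 8:
        cmd += ["--sdm", "apptainer"]
    else:
        cmd += ["--use-singularity"]
    if mem_mb:
        # One "rule:mem_mb=value" argument per rule, all after a single flag.
        cmd.append("--set-resources")
        cmd += [f"{rule}:mem_mb={value}" for rule, value in mem_mb.items()]
    return cmd


if __name__ == "__main__":
    overrides = {"bigstitcher": 60000, "fuse_dataset": 100000}
    print(shlex.join(build_run_command(8, overrides)))
    print(shlex.join(build_run_command(7)))
```

The list form can be passed directly to `subprocess.run`, while `shlex.join` gives a string suitable for pasting into a shell or a job script.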