Tutorials

Michael Innerberger edited this page Jul 11, 2024 · 17 revisions

To get started please follow the Installation instructions to install STIM either through Conda or by building it from source. There are two different examples based on the storage layout, a single slice one and one with multiple slices. Therefore, we first explain the basics of our storage layout.

For the tutorials, please download the example Visium data by clicking here and navigate to the folder where the data is stored. We assume you installed STIM using Conda and have the appropriate Conda environment active. If you compiled STIM from source, the executables may not be in your $PATH. In this case, call them with the full path (e.g., ./st-explorer if you installed them in the current directory). Note: your browser might automatically unzip the data; we cover both cases during the resaving step in the tutorials below.
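If you are unsure whether the STIM executables are on your $PATH, a quick check along the following lines may help. This is a minimal sketch and not part of STIM: the helper name `resolve_stim_tool` and the fallback directory are assumptions made for illustration.

```python
import os
import shutil


def resolve_stim_tool(name, fallback_dir="."):
    """Return a command string for a STIM executable.

    Prefers the copy found on $PATH (e.g., from an active Conda environment).
    Otherwise falls back to an explicit path inside `fallback_dir`, which is
    where we ASSUME a from-source build placed the executables.
    """
    found = shutil.which(name)
    if found is not None:
        return found
    return os.path.join(fallback_dir, name)


if __name__ == "__main__":
    for tool in ("st-explorer", "st-resave"):
        print(tool, "->", resolve_stim_tool(tool))
```

The returned string can be used in place of the bare command name in the examples below.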

Data layout

A spatial transcriptomics dataset can consist of a single 2-dimensional (2d) slice, or a container that contains several 2d slices and thereby forms a 3d volume. Note that for any 3d volume (container-dataset), each 2d slice can also be addressed as an individual dataset (slice-dataset). Most commands support both types of datasets, while some require a container (e.g. alignment).

Slice-datasets can either be saved in an anndata-conforming layout, where the expression values, locations and annotations are stored in /X, /obsm/spatial and /obs, respectively; or in a generic hierarchical layout, where the arrays are stored in /expressionValues, /locations and /annotations, respectively. The N5 API is used to read and write these layouts using the N5, Zarr, or HDF5 backend. If your slice(s) are stored in .csv files, you can use the st-resave command (see below) to resave your data into one of the supported formats by specifying the extension of the output as .h5 (generic HDF5), .n5 (generic N5), or .zarr (generic Zarr); an additional suffix ad is used to indicate the AnnData-conforming layout (e.g. h5ad for HDF5-backed AnnData).
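The naming convention above can be summarized in a few lines of Python. This is only an illustrative sketch, not code from STIM: the function `describe_output` is hypothetical, and the extensions `n5ad` and `zarrad` are assumed by analogy with `h5ad`, which is the only AnnData extension named explicitly above.

```python
# Array locations per layout, as described in the text above.
LAYOUT_PATHS = {
    "anndata": {
        "expression": "/X",
        "locations": "/obsm/spatial",
        "annotations": "/obs",
    },
    "generic": {
        "expression": "/expressionValues",
        "locations": "/locations",
        "annotations": "/annotations",
    },
}

# Storage backend implied by the base extension.
BACKENDS = {"h5": "HDF5", "n5": "N5", "zarr": "Zarr"}


def describe_output(filename):
    """Return (backend, layout) implied by the output file extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    layout = "generic"
    # A trailing "ad" marks the AnnData-conforming layout (ASSUMED for n5/zarr).
    if ext.endswith("ad") and ext[:-2] in BACKENDS:
        ext, layout = ext[:-2], "anndata"
    if ext not in BACKENDS:
        raise ValueError(f"unsupported extension in {filename!r}")
    return BACKENDS[ext], layout


if __name__ == "__main__":
    for name in ("slice1.h5ad", "slice2.n5"):
        backend, layout = describe_output(name)
        print(name, backend, layout, LAYOUT_PATHS[layout]["locations"])
```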

For a slice-dataset, you can:

  • interactively view it using st-explorer (explore all genes & annotations) or st-bdv-view (view multiple genes in parallel);
  • render the dataset in ImageJ/Fiji and save the rendering, e.g., as TIFF, using st-render;
  • normalize the dataset using st-normalize;
  • add annotations, e.g., cell types, using st-add-annotations;
  • create a container-dataset from one or more slice-datasets (see below).

For alignment of several slices, slices have to be grouped into an N5-container to allow additional annotations to be stored. In addition to all commands listed above for slice-datasets, the subsequent commands can be used for container-datasets:

  • create a container-dataset containing one or more existing slice-datasets using st-add-slice;
  • add a slice-dataset to a pre-existing container-dataset using st-add-slice;
  • perform pairwise alignment of slices using st-align-pairs (pre-processing);
  • visualize aligned pairs of slices using st-align-pairs-view (optional user verification);
  • perform global alignment of all slices using st-align-global (yielding the actual transformation for each slice-dataset);
  • visualize globally aligned data in BigDataViewer using st-bdv-view.

Tutorial: interactively exploring a single slice-dataset

  1. First, we need to convert the data we just downloaded as CSV into one of the supported formats for efficient storage and access to the dataset. We want the first slice of the data to be saved in an anndata file called slice1.h5ad. Assuming the data are in the downloaded visium.zip file in the same directory as the executables, execute the following:
st-resave -i visium.zip/section1_locations.csv,visium.zip/section1_reads.csv,slice1.h5ad

This will automatically load the *.csv files from within the zipped file and create a slice1.h5ad file in the current directory (alternatively, you can extract the *.csv files and point to them directly). The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change visium.zip to the respective folder name, most likely visium.
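Since the input argument differs only in its prefix between the zipped and the unzipped case, it can be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper names `pick_source` and `resave_input` are hypothetical, and the unzipped folder is assumed to be the archive name without its `.zip` extension.

```python
import os


def pick_source(zip_name="visium.zip"):
    """Prefer the zip file; otherwise ASSUME the browser unzipped it into a
    folder named like the archive without the .zip extension."""
    if os.path.isfile(zip_name):
        return zip_name
    return os.path.splitext(zip_name)[0]


def resave_input(source, section, output):
    """Build the comma-separated value passed to `st-resave -i`:
    locations file, reads file, and output dataset name."""
    locations = f"{source}/{section}_locations.csv"
    reads = f"{source}/{section}_reads.csv"
    return f"{locations},{reads},{output}"


if __name__ == "__main__":
    arg = resave_input(pick_source(), "section1", "slice1.h5ad")
    print("st-resave -i", arg)
```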

  2. Next, we will take a look at the slice-dataset directly:
st-explorer -i slice1.h5ad

First, type calm2 into the 'search gene' box. If you start st-explorer with -c '0,110', the display range is already set to roughly match this dataset. You can also change it manually by clicking in the BigDataViewer window and pressing s to bring up the brightness dialog. Feel free to play with the Visualization Options in the explorer, e.g., move Gauss Rendering to 0.5 to get a sharper image and then adjust the Median Filter radius to filter the data.

  3. Now, we will create TIFF images for the genes Calm2 and Mbp:
st-render -i slice1.h5ad -g 'Calm2,Mbp' -rf 0.5

You can now, for example, overlay both images into a two-channel image using Image > Color > Merge Channels, selecting Calm2 as magenta and Mbp as green. You could then convert this image to RGB via Image > Type > RGB Color and save it as TIFF, JPEG, or AVI (e.g., with JPEG compression). These can be added to your presentation or paper; check out our beautiful AVI here (you need to click download at the top right). You can render a bigger image by setting -s 0.1. Note: please check the documentation of ImageJ and Fiji for help on how to further process images.
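To make the magenta/green overlay concrete, here is a minimal sketch of what such a merge does to pixel values. It is not how Fiji is implemented; the function names, the nested-list image representation, and the linear display-range mapping are assumptions for illustration. Magenta is red plus blue, so the first image drives both of those channels and the second image drives green.

```python
def to_8bit(value, lo, hi):
    """Linearly map `value` from the display range [lo, hi] to 0..255, clamped."""
    if hi <= lo:
        raise ValueError("display range must satisfy lo < hi")
    scaled = (value - lo) / (hi - lo)
    return int(round(255 * min(1.0, max(0.0, scaled))))


def merge_magenta_green(magenta_img, green_img, lo=0.0, hi=1.0):
    """Combine two equally sized grayscale images (lists of rows)
    into one image of (R, G, B) tuples."""
    merged = []
    for m_row, g_row in zip(magenta_img, green_img):
        row = []
        for m, g in zip(m_row, g_row):
            m8, g8 = to_8bit(m, lo, hi), to_8bit(g, lo, hi)
            row.append((m8, g8, m8))  # magenta = red + blue
        merged.append(row)
    return merged


if __name__ == "__main__":
    calm2 = [[1.0, 0.0]]  # toy 1x2 image standing in for the Calm2 rendering
    mbp = [[0.0, 1.0]]    # toy 1x2 image standing in for the Mbp rendering
    print(merge_magenta_green(calm2, mbp))
```

Pixels where both genes are strongly expressed come out white, which is why this color pair is convenient for judging overlap.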

Tutorial: aligning a multi-slice container-dataset

  1. Make sure you followed the previous tutorial such that you've already resaved the first slice of the visium dataset as anndata file slice1.h5ad.

  2. In order to perform the alignment of the whole dataset (this works identically for more than two slices), we need to create a container-dataset containing the already resaved slice-dataset:

st-add-slice -c visium.n5 -i slice1.h5ad

This will create an N5 container visium.n5 and link the first slice to it. If you don't want the slice to be linked but moved instead, you can use the -m flag. Also, custom storage locations for the location, expression values, and annotations arrays within the slice can be given by -l, -e, and -a, respectively.

  3. Now we resave the second slice of the data as an N5 slice-dataset. Assuming the data are in the downloaded visium.zip file in the same directory as the executables:
st-resave \
   -i visium.zip/section2_locations.csv,visium.zip/section2_reads.csv,slice2.n5 \
   -c visium.n5

This will automatically load the *.csv files from within the zipped file and add the resulting slice-dataset to the visium.n5 container-dataset, which already contains the first slice. The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change visium.zip to the respective folder name, most likely visium.

  4. Next, we can again take a look at the data, which now includes two slice-datasets. We can do this interactively
st-explorer -i visium.n5

by rendering images for all desired genes

st-render -i visium.n5 -g 'Calm2,Mbp' -rf 0.5

or by looking at one of the datasets in the container

st-bdv-view -i visium.n5 -g 'Calm2,Mbp' -rf 0.5 -d slice1.h5ad
st-bdv-view -i visium.n5 -g 'Calm2,Mbp' -rf 0.5 -d slice2.n5

Selecting genes and adjusting visualization options works exactly as in the first tutorial. We can again overlay both images into a two-channel image using Image > Color > Merge Channels, selecting Calm2 as magenta and Mbp as green. By flipping through the slices (slice1 and slice2) you will see that they are not aligned.

  5. To remedy this, we will perform alignment of the two slices. We will use 15 automatically selected genes (-n), a maximum error of 100 (--maxEpsilon, in units of the sequenced locations), and require at least 30 inliers per gene (--minNumInliersGene); this dataset is more robust than the SlideSeq one. The alignment process takes around 1-2 minutes on a modern notebook. Note: at this point no transformations are stored within the container-dataset, but only the list of corresponding points between all pairs of slices.
st-align-pairs -c visium.n5 -n 15 -rf 0.5 --maxEpsilon 100 --minNumInliersGene 30
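To illustrate how a maximum error and a minimum inlier count interact, here is a minimal sketch of inlier counting for a candidate transformation. It is a generic illustration and not STIM's implementation: the function names, the representation of matches as point pairs, and the acceptance rule are assumptions.

```python
import math


def count_inliers(matches, transform, max_epsilon):
    """Count matches (p, q) whose transformed point transform(p)
    lies within `max_epsilon` of q."""
    count = 0
    for p, (qx, qy) in matches:
        tx, ty = transform(p)
        if math.hypot(tx - qx, ty - qy) <= max_epsilon:
            count += 1
    return count


def gene_is_accepted(matches, transform, max_epsilon=100.0, min_inliers=30):
    """ASSUMED rule: keep a gene only if enough of its matches are inliers."""
    return count_inliers(matches, transform, max_epsilon) >= min_inliers


if __name__ == "__main__":
    shift = lambda p: (p[0] + 10.0, p[1])  # toy model: translate by 10 in x
    matches = [((0.0, 0.0), (10.0, 0.0)),
               ((5.0, 5.0), (15.0, 5.0)),
               ((1.0, 2.0), (11.0, 2.0)),
               ((0.0, 0.0), (510.0, 0.0))]  # last match is an outlier
    print(count_inliers(matches, shift, max_epsilon=100.0))
```

A larger error tolerance admits more matches as inliers, while a larger minimum inlier count rejects genes with few consistent matches.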

For your dataset, the optimal choice of parameters may vary. A good baseline for the --maxEpsilon parameter is ten times the average distance between the sequenced points. If the --maxEpsilon option is not given, this value is computed and used automatically. For the number of selected genes -n, higher values yield better results but slow down the alignment. Increasing the minimal number of inliers per gene --minNumInliersGene can also improve alignment quality, but can cause the alignment to fail.
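If you want to estimate such a baseline yourself, the following sketch shows one way to do it. It interprets "average distance between the sequenced points" as the mean nearest-neighbor distance, which is an assumption; STIM's own computation may differ. The function names are hypothetical, and the brute-force search is only suitable for small point sets.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average, over all points, of the distance to the closest other point.
    Brute force (quadratic time); fine for a sketch."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def baseline_max_epsilon(points, factor=10.0):
    """Ten times the average point spacing, following the rule of thumb above."""
    return factor * mean_nearest_neighbor_distance(points)


if __name__ == "__main__":
    # Toy 3x3 grid with a spacing of 2 units between neighboring locations.
    grid = [(2.0 * x, 2.0 * y) for x in range(3) for y in range(3)]
    print(baseline_max_epsilon(grid))
```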

The st-align-pairs command precomputes and stores the per-gene standard deviation values as gene annotations in the container. You can also compute these values separately beforehand with the tool st-add-entropy:

st-add-entropy -i visium.n5/slice1.h5ad
st-add-entropy -i visium.n5/slice2.n5
# then, compute the pairwise alignment
st-align-pairs -c visium.n5 -n 15 -rf 0.5 --maxEpsilon 100 --minNumInliersGene 30 --entropyPath "stdev"

  6. Now we will visualize this pair of slices before and after alignment. To this end, we create two independent images, one using st-render (see above) and one using st-align-pairs-view on the automatically selected gene mt-Nd4. st-render will display the slices unaligned, while st-align-pairs-view will show them aligned.
st-render -i visium.n5 -rf 0.5 -g mt-Nd4
st-align-pairs-view -c visium.n5 -rf 0.5 -g mt-Nd4

Note: to create the example alignment GIF, I saved both images independently, opened them in Fiji, cropped them, combined them, converted them to 8-bit color, set the frame rate to 1 fps, and saved the result as one GIF.


  7. Finally, we perform the global alignment. In this particular case, it is identical to the pairwise alignment process as we only have two slices. However, we still need to do it so the final transformations for the slices are stored in the slice-datasets. After that, st-explorer, st-bdv-view and st-render will take these transformations into account when displaying the data. This final processing step usually only takes a few seconds.
st-align-global -c visium.n5 --absoluteThreshold 100 -rf 0.5 --lambda 0.0 --skipICP

  8. The final dataset can, for example, be visualized and interactively explored using BigDataViewer. To do so, we specify three genes (-g Calm2,Mbp,mt-Nd4), a crisper rendering (-rf 0.5), and a relative z-spacing between the two planes that shows them close to each other (-z 2). Of course, the same data can be visualized using st-explorer and st-render, and visualization options such as color or contrast per gene can be adjusted manually. This will display all sections in the container:
st-bdv-view3d -i visium.n5 -g Calm2,Mbp,mt-Nd4 -rf 0.5 -z 2

It is also possible to visualize a single slice, with interactive controls for rendering strategy, render factor, and filters:

st-bdv-view -i visium.n5 -d slice1.h5ad -g Calm2,Mbp,mt-Nd4 -rf 0.5
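The non-interactive steps of this tutorial can also be chained in a small script. The sketch below only reuses the commands and flags shown in this tutorial; the `run_pipeline` helper and its `dry_run` switch are assumptions added for illustration. With `dry_run=True` (the default) the commands are printed but not executed.

```python
import subprocess

# Commands exactly as used in the tutorial above, split into argument lists.
PIPELINE = [
    ["st-resave", "-i",
     "visium.zip/section1_locations.csv,visium.zip/section1_reads.csv,slice1.h5ad"],
    ["st-add-slice", "-c", "visium.n5", "-i", "slice1.h5ad"],
    ["st-resave", "-i",
     "visium.zip/section2_locations.csv,visium.zip/section2_reads.csv,slice2.n5",
     "-c", "visium.n5"],
    ["st-align-pairs", "-c", "visium.n5", "-n", "15", "-rf", "0.5",
     "--maxEpsilon", "100", "--minNumInliersGene", "30"],
    ["st-align-global", "-c", "visium.n5", "--absoluteThreshold", "100",
     "-rf", "0.5", "--lambda", "0.0", "--skipICP"],
]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.
    Stops at the first failing command (check=True raises on a non-zero exit).
    Returns the names of the tools in the order they were processed."""
    processed = []
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
        processed.append(cmd[0])
    return processed


if __name__ == "__main__":
    run_pipeline(PIPELINE)  # pass dry_run=False to actually run the tools
```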

We encourage you to use this small two-slice dataset as a starting point for playing with and extending STIM. If you have any questions, feature requests, or concerns, please open an issue here on GitHub. Thanks so much!

Tutorial: interactive alignment with GUI

You can align spatial transcriptomics data interactively using st-align-interactive, which provides a GUI based on BigDataViewer. Here you'll learn more about the GUI, how to navigate data, and how to perform alignment manually, with SIFT, or with ICP.

First, launch the interactive alignment tool:

st-align-interactive -c visium.n5 -d1 slice1.h5ad -d2 slice2.n5 -n 10 -rf 1.5

This will load the datasets slice1.h5ad and slice2.n5 from the visium.n5 container, compute the standard deviation of each gene, store it in the datasets (so it is not recomputed later), and select the 10 genes (-n 10) with the highest standard deviation for plotting. Upon loading, you will see the pair of ST datasets rendered in one color per section. Initially, the two sections are shown in their unaligned coordinates, and the first of the 10 automatically selected genes is used for rendering.
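The idea of ranking genes by their standard deviation can be sketched in a few lines. This is an illustration only: the function name, the dictionary-based input, and the use of the population standard deviation are assumptions, and STIM's actual computation may differ in detail.

```python
import statistics


def top_variable_genes(expression, n):
    """Return the names of the `n` genes with the highest standard deviation.

    `expression` maps a gene name to its expression values across locations.
    Ties are broken alphabetically to keep the result deterministic.
    """
    stdevs = {gene: statistics.pstdev(values) for gene, values in expression.items()}
    ranked = sorted(stdevs, key=lambda gene: (-stdevs[gene], gene))
    return ranked[:n]


if __name__ == "__main__":
    toy = {
        "GeneA": [0, 0, 0, 0],     # constant, carries no spatial contrast
        "GeneB": [0, 10, 0, 10],   # strongly varying
        "GeneC": [1, 2, 1, 2],     # mildly varying
    }
    print(top_variable_genes(toy, 2))
```

Genes with high variation across locations tend to show spatial structure, which makes them useful candidates for both display and alignment.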

The GUI consists of two main parts, labeled in the GUI overview figure: the viewport (A) and the sidebar (B). The viewport is a standard BigDataViewer, where you can zoom, translate and rotate the data. In the sidebar, you will find the following cards (more on cards 6-8 below):

  1. Display Modes: how the data is visualized (e.g., single/fused, type of interpolation...)
  2. Sources: e.g., change colors for the sources
  3. Groups: to select what is displayed in BDV
  4. STIM Display Options: some rendering settings (factor, brightness limits)
  5. STIM Filtering Options: to apply on-the-fly filters to the rendered image (Gaussian, Median...)
  6. Manual Alignment: one can perform pairwise alignment by dragging and scrolling with the mouse over the viewport
  7. SIFT Alignment: automatic interactive alignment using SIFT/RANSAC
  8. ICP Alignment: once SIFT alignment is performed, an additional round of ICP refinement is possible.

You can change the gene used for visualization under Groups by selecting the (circular) radio button for the gene of interest.

You can add more genes to BDV in the STIM Display Options card by pressing Genes(+). A window opens in which you can select any gene in the data (tip: you can search for it using the text box at the top). Click on the gene, then press the Add & Close button.

If you are not familiar with the controls in BigDataViewer, here are some basic navigation instructions that you can follow to zoom, drag, or rotate the displayed data.

Interactive manual alignment

To perform manual alignment, go to the Manual Alignment card and press Start. Then, you can scale, translate, and rotate one section (moving) with respect to the other (fixed) using the mouse. Refer to the basic navigation instructions for BigDataViewer. In general, you can:

  1. scale with the mouse wheel
  2. rotate with left drag anywhere on the canvas
  3. translate with right or middle drag

The transformation matrix, displayed above the Reset/Cancel/Start buttons, is updated dynamically as you rotate, translate, or scale the moving image. Once you are done, press Finish to keep the transformation. It will be used when rendering any other gene (thus, all genes will be aligned upon display).
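To see how scaling, rotation, and translation combine into a single matrix, consider the following sketch of a 2D similarity transform. The 2x3 row-major layout and the function names are assumptions for illustration; the matrix shown in the GUI may be laid out differently.

```python
import math


def similarity_matrix(scale, angle_deg, tx, ty):
    """2x3 matrix [[a, -b, tx], [b, a, ty]] that first scales and rotates
    about the origin and then translates."""
    angle = math.radians(angle_deg)
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    return [[a, -b, tx], [b, a, ty]]


def apply_transform(matrix, point):
    """Map a 2D point through the 2x3 matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


if __name__ == "__main__":
    # Scale by 2, rotate by 90 degrees counter-clockwise, shift by 1 in x.
    m = similarity_matrix(2.0, 90.0, 1.0, 0.0)
    print(apply_transform(m, (1.0, 0.0)))  # approximately (1.0, 2.0)
```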

Interactive alignment with SIFT

To perform alignment with SIFT, go to the SIFT Alignment card. We provide several presets for SIFT matching (from fast to very thorough), and more advanced SIFT options that can be used to tweak these presets and improve the automatic alignment. These parameters are briefly described in the GUI and correspond to those of st-align-pairs. One possible use case of this card is to interactively explore and build an intuition for the parameters that are most suitable for the data at hand, i.e., before proceeding with pairwise alignments of all the sections (when more than two are available). Once you have chosen some parameters, click Run SIFT alignment; a progress bar will be updated in real time.

Once SIFT alignment has finished, the alignment is previewed along with all the matches per gene. You can navigate all genes by going back to the Groups card and selecting the gene of interest as described above. Newly added genes will appear aligned, as they are automatically transformed using the estimated model. You can store the transformation in the container by clicking on the save button (floppy disk icon), or reset the transformation.

Interactive alignment with ICP

Optionally, it is possible to refine the result from SIFT alignment using ICP. Navigate to the ICP Alignment card, adjust the parameters (similarly to SIFT alignment), and click Run ICP alignment. It is possible to store the transformation to the container by clicking on the save button (floppy disk icon), or reset the transformation.

Tutorial: aligning from Python (notebooks)

As an alternative to the command-line interface, we also provide the stimwrap Python package, an API for calling STIM programs from Python, e.g., via Jupyter Notebooks. Together with the support for AnnData-backed n5 containers, this allows seamless integration of STIM into Python-based workflows (or, more specifically, scverse-based workflows).

In practice, this means that you can have a single notebook where you can run preliminary data QC, 3D alignment, and any other downstream analysis such as cell typing, neighborhood analysis, differential expression, among others - without needing to convert data formats or use different languages.

In a nutshell, you can install the stimwrap package via pip:

pip install stimwrap

From Python, the workflow above (for aligning a multi-slice dataset) can be replicated as:

import stimwrap as st

# convert visium expression matrix to AnnData and n5 (we can mix&match!)
st.resave(input="visium.zip/section1_locations.csv,visium.zip/section1_reads.csv,slice1.h5ad", container="visium.n5")
st.resave(input="visium.zip/section2_locations.csv,visium.zip/section2_reads.csv,slice2.n5", container="visium.n5")

# pairwise and global alignment
st.align_pairs(container="visium.n5", num_genes=15, rendering_factor=0.5, max_epsilon=100, min_num_inliers_gene=30)
st.align_global(container="visium.n5", absolute_threshold=100, rendering_factor=0.5, lmbda=0.0, skip_icp=True)

# you can visualize the results using BDV or any other function from STIM
# you will need to run Python from a session with a window server (e.g., local or remote X11)
st.bdv_view3d(input="visium.n5", genes=['Calm2,Mbp,mt-Nd4'], rendering_factor=0.5, z_spacing_factor=2)

Above, we saved one slice as h5ad and another as n5. Storing both as h5ad would ensure that we do not need to convert the data more than once to perform later downstream analysis, e.g., with scverse tools.

You can refer to the documentation or the example notebooks for more information about the general workflow. For instance, we provide Jupyter Notebooks for the 3D registration of Open-ST data, plus some additional cases of downstream analysis, to showcase the interoperability of STIM with the Python ecosystem.