The goal of this project is to build a semi-automated single-cell RNA-seq (scRNA-seq) analysis workflow in the cloud. Tabula Muris Senis will be used as the reference database for the annotations.
Per the Chan Zuckerberg Biohub website, “Tabula Muris is a compendium of single cell transcriptome data from the model organism Mus musculus, containing nearly 100,000 cells from 20 organs and tissues. The data allow for direct and controlled comparison of gene expression in cell types shared between tissues, such as immune cells from distinct anatomical locations.” More information on Tabula Muris Senis can be found here.
There are three related Python projects here:
- webapp: a simple Flask app that uses the Docker containers defined in the other two projects
- context_processing
- context_annotations
To download sample data:
./download-data.sh
To run the flask app:
cd webapp
pip install -r requirements.txt
./start.sh
The app uses images we have pushed to Docker Hub.
To rebuild the image locally and run it with the samples in data/:
./build-and-run-image.sh
This project was started at the Single Cell Hackathon, NYGC, January 15-17, 2020. It can run in a local development environment, but it is a long way from being something that could be deployed in the cloud. We've created issues for some of the next steps.
- Input gene counts and metadata .h5ad
- Preprocessing
  - Process data using Scanpy
  - Minimum number of reads
  - Minimum number of genes
  - Minimum number of cells
- Visualization
  - Utilizing CZ Biohub cellxgene tool - https://tabula-muris-senis.ds.czbiohub.org/all/scVI-UMAP/
- Annotations
  - Label Propagation
  - SCVI & OnClass
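
The three preprocessing thresholds in the outline above can be illustrated with a small dependency-free sketch. This is not the code used in context_processing; the function name filter_counts, the toy matrix, and the threshold values are all hypothetical. In Scanpy, the corresponding calls would be sc.pp.filter_cells (with min_counts or min_genes, one per call) and sc.pp.filter_genes (with min_cells).

```python
def filter_counts(counts, min_reads, min_genes, min_cells):
    """Apply the three thresholds to a count matrix (rows = cells, columns = genes).

    Returns the indices of the cells and genes that pass.
    Hypothetical helper for illustration only.
    """
    # Keep cells with enough total reads and enough detected genes.
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(row) >= min_reads and sum(1 for c in row if c > 0) >= min_genes
    ]
    # Keep genes detected in enough of the remaining cells.
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes


# Toy data: 4 cells x 4 genes (made-up values).
counts = [
    [5, 0, 3, 1],  # cell 0: 9 reads, 3 genes detected
    [0, 0, 1, 0],  # cell 1: 1 read, 1 gene detected
    [2, 4, 0, 6],  # cell 2: 12 reads, 3 genes detected
    [1, 0, 2, 0],  # cell 3: 3 reads, 2 genes detected
]

# Threshold values are arbitrary and chosen only to make the toy example filter something.
cells, genes = filter_counts(counts, min_reads=3, min_genes=2, min_cells=2)
print("cells kept:", cells)
print("genes kept:", genes)
```

Filtering cells before genes means a gene is only counted as "detected" in cells that survived the first two thresholds, so the order of the steps affects the result.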