Tatsuya Tsukahara¹*, David H. Brann¹*, Stan L. Pashkovski¹, Grigori Guitchounts¹, Thomas Bozza² and Sandeep Robert Datta¹
¹Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
²Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
*These authors contributed equally
For more details, please see our Open Access manuscript here.
- Make a new conda env, e.g. `conda create -n osn python=3.8`.
- Activate that env with `conda activate osn`.
- Clone and enter this repo:
  `git clone git@github.com:dattalab/Tsukahara_Brann_OSN.git && cd Tsukahara_Brann_OSN`
- To install the specific versions of packages used in this repo, run `pip install -r requirements.txt`. The minimal requirements for running the scripts and notebooks in this repo can be installed with `pip install scanpy pysam numpy_groupies cmocean notebook`.
- Install the code in this directory from the `setup.py` file via `pip install -e .`.
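After installing, a quick check that the minimal packages are importable can catch a broken environment before a notebook fails midway. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical (it is not part of this repo), and it assumes each package's import name matches its pip name, which is the case for the five packages listed above.

```python
from importlib.util import find_spec

# Minimal requirements named in the install step above.
MINIMAL_PACKAGES = ["scanpy", "pysam", "numpy_groupies", "cmocean", "notebook"]


def missing_packages(names=MINIMAL_PACKAGES):
    """Return the subset of `names` that cannot be located for import.

    Hypothetical helper for illustration; not part of this repo.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All minimal packages are importable.")
```

Run it inside the activated `osn` env; any name it reports can be installed with `pip install <name>`.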
- Data is available on the NCBI GEO at Accession number GSE173947.
- Raw count files are provided for the mature OSNs for the 152 replicates used, grouped by experiment as described in GSE173947_Dataset_raw_file_names.csv, which can also be found on the GEO.
- Download the gene expression matrix and corresponding metadata (e.g. GSE173947_home_cage_umi_counts.csv.gz and GSE173947_home_cage_metadata.csv.gz for the home-cage dataset) to the data/raw folder. The examples in this repo use the `home_cage`, `ChronicOccl`, `ActSeq`, `ActSeq_conc_analog`, and `env_switch` datasets. Therefore, to run these notebooks, the raw counts and metadata for these five experiments should be added to the data/raw folder.
- Alternatively, run `python scripts/download_geo.py` to download the supplementary files from the GEO.
- The code and additional data files (such as extracted glomerular calcium traces) are hosted on Zenodo.
- The raw fastq files (generated from the 10x bam files) can be found on the SRA (accession SRP318630). These were further processed, as described in the methods, to remove ambiguously-mapped UMIs, using the run_dedup.sh script.
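Before opening the notebooks, it can help to confirm that the counts and metadata files for all five datasets are in data/raw. The sketch below assumes the other four datasets follow the same naming pattern as the home-cage example (`GSE173947_<dataset>_umi_counts.csv.gz` and `GSE173947_<dataset>_metadata.csv.gz`). That pattern is an assumption, so check the exact names against GSE173947_Dataset_raw_file_names.csv. The helpers `expected_files` and `missing_raw_files` are hypothetical and not part of this repo.

```python
from pathlib import Path

# Dataset names used by the example notebooks (from the list above).
DATASETS = ["home_cage", "ChronicOccl", "ActSeq", "ActSeq_conc_analog", "env_switch"]
# File suffixes taken from the home-cage example; assumed for the other datasets.
SUFFIXES = ["umi_counts.csv.gz", "metadata.csv.gz"]


def expected_files(datasets=DATASETS, prefix="GSE173947"):
    """Build the file names expected under the assumed naming pattern."""
    return [f"{prefix}_{d}_{s}" for d in datasets for s in SUFFIXES]


def missing_raw_files(raw_dir="data/raw", datasets=DATASETS):
    """Return the expected file names that are not present in `raw_dir`."""
    raw_dir = Path(raw_dir)
    return [name for name in expected_files(datasets) if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Not found in data/raw:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected files are present.")
```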
Code to replicate analyses in Tsukahara, Brann et al. 2021 Cell.
- Open a new jupyter notebook with `jupyter notebook`.
- Run the notebooks. The first notebook (00_make_adata_from_raw_counts.ipynb) converts the raw count matrices into AnnData objects used for downstream analyses. The subsequent notebooks show how to apply the cNMF gene loadings to get gene expression program (GEP) usages, how to identify activated ORs and calculate the activation score in the Act-seq experiments, and how to work with the data from the other experimental manipulations (e.g. environment switches).
- Run any of the stand-alone scripts. These perform classification of OR identity from the scRNA-seq data, as well as classification of environment and odors from the imaging data. Some of these require more computing resources. The results shown in the paper are typically based on 1,000 restarts, though the defaults in this repo use fewer.
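To illustrate why the number of restarts matters, here is a generic sketch of averaging classification accuracy over repeated random train/test splits. It is not the classifier used by the scripts in this repo: it swaps in a simple nearest-centroid classifier written in pure Python, and it assumes a "restart" means rerunning with a fresh random split. The function names and the `n_restarts`, `test_frac`, and `seed` parameters are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit one centroid per class, then score nearest-centroid predictions."""
    sums, counts = {}, {}
    for x, y in zip(X_train, y_train):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        # Pick the class whose centroid has the smallest squared distance to x.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    correct = sum(predict(x) == y for x, y in zip(X_test, y_test))
    return correct / len(y_test)


def accuracy_over_restarts(X, y, n_restarts=10, test_frac=0.25, seed=0):
    """Average held-out accuracy over `n_restarts` random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, int(len(X) * test_frac))
    scores = []
    for _ in range(n_restarts):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        scores.append(
            nearest_centroid_accuracy(
                [X[i] for i in train], [y[i] for i in train],
                [X[i] for i in test], [y[i] for i in test],
            )
        )
    return mean(scores), scores
```

More restarts give a more stable estimate of the mean at a proportionally higher compute cost, which is the trade-off behind running fewer restarts by default.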
For more details, please consult the methods in our manuscript, post an issue here, or contact the authors.