From 3411a0e22f1bb8baa083819f409c87b28a2403da Mon Sep 17 00:00:00 2001
From: Xi Chen
Date: Mon, 4 Mar 2024 00:59:29 +0800
Subject: [PATCH] cleaned files in data

---
 README.md | 3 +-
 .../41586_2019_969_MOESM3_ESM.xlsx | Bin
 .../Star_CB_UMI_Complex_sci-RNA-seq3.jpg | Bin
 .../sci-RNA-seq3_hp.txt | 0
 .../sci-RNA-seq3_p5.txt | 0
 .../sci-RNA-seq3_p7.txt | 0
 .../sci-RNA-seq3_rt.txt | 0
 ...ell_homodimer.svg => tn5_s7_homodimer.svg} | 0
 docs/source/ge/pip-seqv2.md | 377 ------------------
 docs/source/ge/sci-RNA-seq.md | 24 +-
 docs/source/ge/sci-RNA-seq3.md | 22 +-
 methods_html/Microwell-seq.html | 2 +-
 methods_html/Paired-seq.html | 2 +-
 methods_html/SHARE-seq.html | 2 +-
 methods_html/SureCell.html | 2 +-
 methods_html/itChIP-seq.html | 2 +-
 methods_html/sci-RNA-seq.html | 156 --------
 methods_html/sci-RNA-seq3.html | 162 --------
 methods_html/sci-RNA-seq_family.html | 311 +++++++++++++++
 methods_html/scifi-RNA-seq.html | 2 +-
 20 files changed, 341 insertions(+), 726 deletions(-)
 rename data/{ => sci-RNA-seq_family}/41586_2019_969_MOESM3_ESM.xlsx (100%)
 rename data/{ => sci-RNA-seq_family}/Star_CB_UMI_Complex_sci-RNA-seq3.jpg (100%)
 rename data/{ => sci-RNA-seq_family}/sci-RNA-seq3_hp.txt (100%)
 rename data/{ => sci-RNA-seq_family}/sci-RNA-seq3_p5.txt (100%)
 rename data/{ => sci-RNA-seq_family}/sci-RNA-seq3_p7.txt (100%)
 rename data/{ => sci-RNA-seq_family}/sci-RNA-seq3_rt.txt (100%)
 rename data/{tn5_surecell_homodimer.svg => tn5_s7_homodimer.svg} (100%)
 delete mode 100644 docs/source/ge/pip-seqv2.md
 delete mode 100644 methods_html/sci-RNA-seq.html
 delete mode 100644 methods_html/sci-RNA-seq3.html
 create mode 100644 methods_html/sci-RNA-seq_family.html

diff --git a/README.md b/README.md
index 1e2f635..1422a78 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ Click the following links to view the methods. Notes:
 
   - [SMART-seq family (including SMART-seq, SMART-seq2/3/3xpress and FLASH-seq)](https://teichlab.github.io/scg_lib_structs/methods_html/SMART-seq_family.html)
   - [STRT-seq family (including STRT-seq, STRT-seq-C1 and STRT-seq-2i)](https://teichlab.github.io/scg_lib_structs/methods_html/STRT-seq_family.html)
+  - [sci-RNA-seq family (including sci-RNA-seq and sci-RNA-seq3)](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html)
   - [Quartz-seq family (including Quartz-seq and Quartz-seq2)](https://teichlab.github.io/scg_lib_structs/methods_html/Quartz-seq_family.html)
   - [CEL-seq family (including CEL-seq and CEL-seq2)](https://teichlab.github.io/scg_lib_structs/methods_html/CEL-seq_family.html)
   - [10x Chromium Single Cell 3' V3 FeatureBarcoding](https://teichlab.github.io/scg_lib_structs/methods_html/10xChromium3fb.html)
@@ -33,8 +34,6 @@ Click the following links to view the methods. Notes:
   - [scifi-RNA-seq](https://teichlab.github.io/scg_lib_structs/methods_html/scifi-RNA-seq.html)
   - [Microwell-seq](https://teichlab.github.io/scg_lib_structs/methods_html/Microwell-seq.html)
   - [BD Rhapsody](https://teichlab.github.io/scg_lib_structs/methods_html/BD_Rhapsody.html)
-  - [sci-RNA-seq3](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html)
-  - [sci-RNA-seq](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq.html)
   - [HyDrop-RNA](https://teichlab.github.io/scg_lib_structs/methods_html/HyDrop_RNA.html)
   - [Seq-Well S3](https://teichlab.github.io/scg_lib_structs/methods_html/SeqWell_S3.html)
   - [Tang 2009](https://teichlab.github.io/scg_lib_structs/methods_html/tang2009.html)
diff --git a/data/41586_2019_969_MOESM3_ESM.xlsx b/data/sci-RNA-seq_family/41586_2019_969_MOESM3_ESM.xlsx
similarity index 100%
rename from data/41586_2019_969_MOESM3_ESM.xlsx
rename to data/sci-RNA-seq_family/41586_2019_969_MOESM3_ESM.xlsx
diff --git a/data/Star_CB_UMI_Complex_sci-RNA-seq3.jpg b/data/sci-RNA-seq_family/Star_CB_UMI_Complex_sci-RNA-seq3.jpg
similarity index 100%
rename from data/Star_CB_UMI_Complex_sci-RNA-seq3.jpg
rename to data/sci-RNA-seq_family/Star_CB_UMI_Complex_sci-RNA-seq3.jpg
diff --git a/data/sci-RNA-seq3_hp.txt b/data/sci-RNA-seq_family/sci-RNA-seq3_hp.txt
similarity index 100%
rename from data/sci-RNA-seq3_hp.txt
rename to data/sci-RNA-seq_family/sci-RNA-seq3_hp.txt
diff --git a/data/sci-RNA-seq3_p5.txt b/data/sci-RNA-seq_family/sci-RNA-seq3_p5.txt
similarity index 100%
rename from data/sci-RNA-seq3_p5.txt
rename to data/sci-RNA-seq_family/sci-RNA-seq3_p5.txt
diff --git a/data/sci-RNA-seq3_p7.txt b/data/sci-RNA-seq_family/sci-RNA-seq3_p7.txt
similarity index 100%
rename from data/sci-RNA-seq3_p7.txt
rename to data/sci-RNA-seq_family/sci-RNA-seq3_p7.txt
diff --git a/data/sci-RNA-seq3_rt.txt b/data/sci-RNA-seq_family/sci-RNA-seq3_rt.txt
similarity index 100%
rename from data/sci-RNA-seq3_rt.txt
rename to data/sci-RNA-seq_family/sci-RNA-seq3_rt.txt
diff --git a/data/tn5_surecell_homodimer.svg b/data/tn5_s7_homodimer.svg
similarity index 100%
rename from data/tn5_surecell_homodimer.svg
rename to data/tn5_s7_homodimer.svg
diff --git a/docs/source/ge/pip-seqv2.md b/docs/source/ge/pip-seqv2.md
deleted file mode 100644
index 9d942fe..0000000
--- a/docs/source/ge/pip-seqv2.md
+++ /dev/null
@@ -1,377 +0,0 @@
-# PIP-seq V2
-
-Check [this GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/PIP-seq.html) to see how __PIP-seq V2__ libraries are generated experimentally. This is a "droplet-free" droplet single-cell RNA-seq method. The method was based on a previous developed technology called [__particle-templated emulsification__](https://pubs.acs.org/doi/10.1021/acs.analchem.8b01759). In this system, barcoded hydrogel beads, single cell suspension, RT reagents and oil are put into a single tube. The encapsulation of single cells and beads into monodispersed emulsions (droplets) is achieved by simple vortexing on a bench top vortexer. Crazy, right? It is for real.
-
-The commercial version of this technology is developed by [FluentBio Inc.](https://www.fluentbio.com), and it is renamed to "PIPseq" without the hyphen. As all single-cell methods, the protocols are being frequently updated. At this time of writing (**02-Feb-2024**), the latest version is **PIPseq V4**. In this page, we are just documenting how to preprocess data from the **V2** chemistry, which has become obsolete. In the **V2** chemistry, the bead barcode is generated using a split-pool based strategy with three rounds of barcodes.
-
-## For Your Own Experiments
-
-The read configuration is the same as a standard library:
-
-| Order | Read | Cycle | Description |
-|-------|------------------|----------|-----------------------------------------------------------------------------|
-| 1 | Read 1 | >50 | This yields `R1_001.fastq.gz`, cell barcode and UMI |
-| 2 | Index 1 (__i7__) | 8 | This yields `I1_001.fastq.gz`, sample/library index |
-| 3 | Index 2 (__i5__) | Optional | This yields `I2_001.fastq.gz`, not really used in **V2** but can be present |
-| 4 | Read 2 | >50 | This yields `R2_001.fastq.gz`, cDNA reads |
-
-The content of __Read 1__ is like this:
-
-| Length | Sequence (5' -> 3') |
-|--------|-----------------------------------------------------------------------------------------------------------|
-| >50 | 8 bp __Barcode 1__ + ATGCATC + 8 bp __Barcode 2__ + CCTCGAG + 8 bp __Barcode 3__ + 12 bp __UMI__ + poly T |
-
-You can think of the 10 bp __RT barcode__ as the well barcode for the 1st plate, the __hairpin barcode__ as the well barcode for the 2nd plate and __i7 + i5__ are the well barcode for the 3rd plate. For a cell, it can go into a well in the 1st plate, then another well in the 2nd plate and finally a well in the 3rd plate. Different cells have very low chance of going through the same combination of wells in the three plates. Therefore, if reads have the same combination of well barcodes (__RT barcode + hairpin barcode + i7 + i5__), we can safely think they are from the same cell.
-
-If you sequence the library via your core facility or a company, you need to provide the `i5` and `i7` index sequence you used during the library PCR. Like mentioned previously, they are basically the well barcode for the 3rd plate. Then you will get two `fastq` files (`R1` and `R2`) per well. The total file number will depend on how many wells in the 3rd plate you are processing.
-
-If you sequence the library on your own, you need to get the `fastq` files by running `bcl2fastq` by yourself. In this case it is better to write a `SampleSheet.csv` with `i7` and `i5` indices for each well in the 3rd plate. This will yield the `fastq` files similar to those from your core facility or the company. Here is an example of the `SampleSheet.csv` from a NextSeq run with a full 96-well plate (3rd plate) using some standard Nextera indices:
-
-```text
-[Header],,,,,,,,,,,
-IEMFileVersion,5,,,,,,,,,,
-Date,17/12/2019,,,,,,,,,,
-Workflow,GenerateFASTQ,,,,,,,,,,
-Application,NextSeq FASTQ Only,,,,,,,,,,
-Instrument Type,NextSeq/MiniSeq,,,,,,,,,,
-Assay,AmpliSeq Library PLUS for Illumina,,,,,,,,,,
-Index Adapters,AmpliSeq CD Indexes (384),,,,,,,,,,
-Chemistry,Amplicon,,,,,,,,,,
-,,,,,,,,,,,
-[Reads],,,,,,,,,,,
-34,,,,,,,,,,,
-52,,,,,,,,,,,
-,,,,,,,,,,,
-[Settings],,,,,,,,,,,
-,,,,,,,,,,,
-[Data],,,,,,,,,,,
-Sample_ID,Sample_Name,Sample_Plate,Sample_Well,Index_Plate,Index_Plate_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
-A1,,,,,,N701,TAAGGCGA,S502,ATAGAGAG,,
-A2,,,,,,N702,CGTACTAG,S502,ATAGAGAG,,
-A3,,,,,,N703,AGGCAGAA,S502,ATAGAGAG,,
-A4,,,,,,N704,TCCTGAGC,S502,ATAGAGAG,,
-A5,,,,,,N705,GGACTCCT,S502,ATAGAGAG,,
-A6,,,,,,N706,TAGGCATG,S502,ATAGAGAG,,
-A7,,,,,,N707,CTCTCTAC,S502,ATAGAGAG,,
-A8,,,,,,N710,CGAGGCTG,S502,ATAGAGAG,,
-A9,,,,,,N711,AAGAGGCA,S502,ATAGAGAG,,
-A10,,,,,,N712,GTAGAGGA,S502,ATAGAGAG,,
-A11,,,,,,N714,GCTCATGA,S502,ATAGAGAG,,
-A12,,,,,,N715,ATCTCAGG,S502,ATAGAGAG,,
-B1,,,,,,N701,TAAGGCGA,S503,AGAGGATA,,
-B2,,,,,,N702,CGTACTAG,S503,AGAGGATA,,
-B3,,,,,,N703,AGGCAGAA,S503,AGAGGATA,,
-B4,,,,,,N704,TCCTGAGC,S503,AGAGGATA,,
-B5,,,,,,N705,GGACTCCT,S503,AGAGGATA,,
-B6,,,,,,N706,TAGGCATG,S503,AGAGGATA,,
-B7,,,,,,N707,CTCTCTAC,S503,AGAGGATA,,
-B8,,,,,,N710,CGAGGCTG,S503,AGAGGATA,,
-B9,,,,,,N711,AAGAGGCA,S503,AGAGGATA,,
-B10,,,,,,N712,GTAGAGGA,S503,AGAGGATA,,
-B11,,,,,,N714,GCTCATGA,S503,AGAGGATA,,
-B12,,,,,,N715,ATCTCAGG,S503,AGAGGATA,,
-C1,,,,,,N701,TAAGGCGA,S505,CTCCTTAC,,
-C2,,,,,,N702,CGTACTAG,S505,CTCCTTAC,,
-C3,,,,,,N703,AGGCAGAA,S505,CTCCTTAC,,
-C4,,,,,,N704,TCCTGAGC,S505,CTCCTTAC,,
-C5,,,,,,N705,GGACTCCT,S505,CTCCTTAC,,
-C6,,,,,,N706,TAGGCATG,S505,CTCCTTAC,,
-C7,,,,,,N707,CTCTCTAC,S505,CTCCTTAC,,
-C8,,,,,,N710,CGAGGCTG,S505,CTCCTTAC,,
-C9,,,,,,N711,AAGAGGCA,S505,CTCCTTAC,,
-C10,,,,,,N712,GTAGAGGA,S505,CTCCTTAC,,
-C11,,,,,,N714,GCTCATGA,S505,CTCCTTAC,,
-C12,,,,,,N715,ATCTCAGG,S505,CTCCTTAC,,
-D1,,,,,,N701,TAAGGCGA,S506,TATGCAGT,,
-D2,,,,,,N702,CGTACTAG,S506,TATGCAGT,,
-D3,,,,,,N703,AGGCAGAA,S506,TATGCAGT,,
-D4,,,,,,N704,TCCTGAGC,S506,TATGCAGT,,
-D5,,,,,,N705,GGACTCCT,S506,TATGCAGT,,
-D6,,,,,,N706,TAGGCATG,S506,TATGCAGT,,
-D7,,,,,,N707,CTCTCTAC,S506,TATGCAGT,,
-D8,,,,,,N710,CGAGGCTG,S506,TATGCAGT,,
-D9,,,,,,N711,AAGAGGCA,S506,TATGCAGT,,
-D10,,,,,,N712,GTAGAGGA,S506,TATGCAGT,,
-D11,,,,,,N714,GCTCATGA,S506,TATGCAGT,,
-D12,,,,,,N715,ATCTCAGG,S506,TATGCAGT,,
-E1,,,,,,N701,TAAGGCGA,S507,TACTCCTT,,
-E2,,,,,,N702,CGTACTAG,S507,TACTCCTT,,
-E3,,,,,,N703,AGGCAGAA,S507,TACTCCTT,,
-E4,,,,,,N704,TCCTGAGC,S507,TACTCCTT,,
-E5,,,,,,N705,GGACTCCT,S507,TACTCCTT,,
-E6,,,,,,N706,TAGGCATG,S507,TACTCCTT,,
-E7,,,,,,N707,CTCTCTAC,S507,TACTCCTT,,
-E8,,,,,,N710,CGAGGCTG,S507,TACTCCTT,,
-E9,,,,,,N711,AAGAGGCA,S507,TACTCCTT,,
-E10,,,,,,N712,GTAGAGGA,S507,TACTCCTT,,
-E11,,,,,,N714,GCTCATGA,S507,TACTCCTT,,
-E12,,,,,,N715,ATCTCAGG,S507,TACTCCTT,,
-F1,,,,,,N701,TAAGGCGA,S508,AGGCTTAG,,
-F2,,,,,,N702,CGTACTAG,S508,AGGCTTAG,,
-F3,,,,,,N703,AGGCAGAA,S508,AGGCTTAG,,
-F4,,,,,,N704,TCCTGAGC,S508,AGGCTTAG,,
-F5,,,,,,N705,GGACTCCT,S508,AGGCTTAG,,
-F6,,,,,,N706,TAGGCATG,S508,AGGCTTAG,,
-F7,,,,,,N707,CTCTCTAC,S508,AGGCTTAG,,
-F8,,,,,,N710,CGAGGCTG,S508,AGGCTTAG,,
-F9,,,,,,N711,AAGAGGCA,S508,AGGCTTAG,,
-F10,,,,,,N712,GTAGAGGA,S508,AGGCTTAG,,
-F11,,,,,,N714,GCTCATGA,S508,AGGCTTAG,,
-F12,,,,,,N715,ATCTCAGG,S508,AGGCTTAG,,
-G1,,,,,,N701,TAAGGCGA,S510,ATTAGACG,,
-G2,,,,,,N702,CGTACTAG,S510,ATTAGACG,,
-G3,,,,,,N703,AGGCAGAA,S510,ATTAGACG,,
-G4,,,,,,N704,TCCTGAGC,S510,ATTAGACG,,
-G5,,,,,,N705,GGACTCCT,S510,ATTAGACG,,
-G6,,,,,,N706,TAGGCATG,S510,ATTAGACG,,
-G7,,,,,,N707,CTCTCTAC,S510,ATTAGACG,,
-G8,,,,,,N710,CGAGGCTG,S510,ATTAGACG,,
-G9,,,,,,N711,AAGAGGCA,S510,ATTAGACG,,
-G10,,,,,,N712,GTAGAGGA,S510,ATTAGACG,,
-G11,,,,,,N714,GCTCATGA,S510,ATTAGACG,,
-G12,,,,,,N715,ATCTCAGG,S510,ATTAGACG,,
-H1,,,,,,N701,TAAGGCGA,S511,CGGAGAGA,,
-H2,,,,,,N702,CGTACTAG,S511,CGGAGAGA,,
-H3,,,,,,N703,AGGCAGAA,S511,CGGAGAGA,,
-H4,,,,,,N704,TCCTGAGC,S511,CGGAGAGA,,
-H5,,,,,,N705,GGACTCCT,S511,CGGAGAGA,,
-H6,,,,,,N706,TAGGCATG,S511,CGGAGAGA,,
-H7,,,,,,N707,CTCTCTAC,S511,CGGAGAGA,,
-H8,,,,,,N710,CGAGGCTG,S511,CGGAGAGA,,
-H9,,,,,,N711,AAGAGGCA,S511,CGGAGAGA,,
-H10,,,,,,N712,GTAGAGGA,S511,CGGAGAGA,,
-H11,,,,,,N714,GCTCATGA,S511,CGGAGAGA,,
-H12,,,,,,N715,ATCTCAGG,S511,CGGAGAGA,,
-```
-
-Simply run `bcl2fastq` like this:
-
-```console
-bcl2fastq --no-lane-splitting \
-          --ignore-missing-positions \
-          --ignore-missing-controls \
-          --ignore-missing-filter \
-          --ignore-missing-bcls \
-          -r 4 -w 4 -p 4
-```
-
-After this, you will have `R1_001.fastq.gz` and `R2_001.fastq.gz` for each well:
-
-```bash
-A1_S1_R1_001.fastq.gz # 34 bp: hairpin barcode + CAGAGC + UMI + RT barcode
-A1_S1_R2_001.fastq.gz # 52 bp: cDNA
-A2_S2_R1_001.fastq.gz # 34 bp: hairpin barcode + CAGAGC + UMI + RT barcode
-A2_S2_R2_001.fastq.gz # 52 bp: cDNA
-...
-...
-...
-H11_S95_R1_001.fastq.gz # 34 bp: hairpin barcode + CAGAGC + UMI + RT barcode
-H11_S95_R2_001.fastq.gz # 52 bp: cDNA
-H12_S96_R1_001.fastq.gz # 34 bp: hairpin barcode + CAGAGC + UMI + RT barcode
-H12_S96_R2_001.fastq.gz # 52 bp: cDNA
-```
-
-That's it. You are ready to go from here using `starsolo`. You can and should treat each well as separate experiments, and single cell can be identified by the combination of the __hairpin barcode__ and the __RT barcode__ in `R1`. Each well needs to be processed independently as if they are from different experiments. For example, the __hairpin + RT barcode__ `ACTTGATTGT + ACGTTCAACC` in the well __A1__ and the same barcode `ACTTGATTGT + ACGTTCAACC` in the well __A2__ represent different cells. Therefore, we need to generate count matrix for each well separately, and combine them in the downstream analysis.
-
-The advantage of doing this is that we actually divide each experiment into small chunks, and use the exact the same procedures for each chunk independently. In addition, the whitelist will simply be the combination of the 9 or 10 bp __hairpin barcode__ and the 10 bp __RT barcode__ for all the analysis.
-
-## Public Data
-
-For the purpose of demonstration, we will use the __sci-RNA-seq3__ data from the following paper:
-
-```{eval-rst}
-.. note::
-
-    Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, Zhang F, Mundlos S, Christiansen L, Steemers FJ, Trapnell C, Shendure J (2019) **The single-cell transcriptional landscape of mammalian organogenesis.** *Nature* 566:496–502. https://doi.org/10.1038/s41586-019-0969-x
-
-```
-
-where the authors developed an improved version of sci-RNA-seq, which they called __sci-RNA-seq3__. They used the technology to generate a comprehensive single cell atlas during mouse organogenesis, with > 2 million cells covering E9.5 - E13.5. The data is in GEO under the accession code [GSE119945](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE119945). You can get the `fastq` files directly from [__this ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA490754?show=reads). As you can see, there are a total of 760 accessions. Each accession represents the data from a well in the 3rd plate. This means the authors already demultiplexed the data based on `i7 + i5` index for us. We could just download each accession and process independently. Single cells can be identified by the combination of the 9 or 10 bp __hairpin barcode__ and the 10 bp __RT barcode__.
-
-I'm not going to do all 760 wells. Let's just use the data `SRR7827206` for the demonstration:
-
-```console
-# get fastq files
-mkdir -p sci-rna-seq3/data
-wget -P sci-rna-seq3/data \
-     ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR782/006/SRR7827206/SRR7827206_1.fastq.gz \
-     ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR782/006/SRR7827206/SRR7827206_2.fastq.gz
-```
-
-## Prepare Whitelist
-
-The full oligo sequences can be found in the [Supplementary Table S11](https://teichlab.github.io/scg_lib_structs/data/41586_2019_969_MOESM3_ESM.xlsx) from the __sci-RNA-seq3__ paper. As you can see, there are a total of 384 different 10 bp __RT barcodes__, 384 different 9 or 10 bp __hairpin barcodes__, 96 different 10 bp __i7__ and 96 different 10 bp __i5__ barcodes. Theoretically, the full capacity of the combinatorial indices are __384 * 384 * 96 * 96 = 1,358,954,496__. Since the data are already demultiplexed by __i7 + i5__, we only need the hairpin barcode and RT barcode for the identification of single cells. I have collected the index table as follows, and the names of the oligos are directly taken from the paper to be consistent (showing only 5 records of each table):
-
-__RT Barcodes (10 bp)__
-
-| Name | Sequence | Reverse complement |
-|------------------|------------|:------------------:|
-| sc_ligation_RT_1 | TCCTACCAGT | ACTGGTAGGA |
-| sc_ligation_RT_2 | GCGTTGGAGC | GCTCCAACGC |
-| sc_ligation_RT_3 | GATCTTACGC | GCGTAAGATC |
-| sc_ligation_RT_4 | CTGATGGTCA | TGACCATCAG |
-| sc_ligation_RT_5 | CCGAGAATCC | GGATTCTCGG |
-
-__Hairpin Barcodes (9 or 10 bp)__
-
-| Name | Sequence | Reverse complement |
-|---------------|------------|:------------------:|
-| sc_ligation_1 | ACAATCAAGT | ACTTGATTGT |
-| sc_ligation_2 | AAGCTGATTA | TAATCAGCTT |
-| sc_ligation_3 | ACCATTCTTA | TAAGAATGGT |
-| sc_ligation_4 | AATAGGTTGT | ACAACCTATT |
-| sc_ligation_5 | ATCTAGGAAT | ATTCCTAGAT |
-
-I have put those two tables into `csv` files and you can download them to have a look:
-
-[sci-RNA-seq3_RT_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_RT_bc.csv)
-[sci-RNA-seq3_hairpin_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_hairpin_bc.csv)
-
-Let's download them:
-
-```console
-wget -P sci-rna-seq3/data \
-     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_RT_bc.csv \
-     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_hairpin_bc.csv
-```
-
-Now we need to generate the whitelist of the __RT barcode__ and the __hairpin barcode__. Those barcodes are sequenced in __Read 1__ using the bottom strand as the template. They are in the same direction of the Illumina TruSeq Read 1 sequence. Therefore, we should take their sequences as they are. In addition, if you check the [__sci-RNA-seq3 GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html), you will see that the __hairpin barcode__ is in front of the __RT barcode__ in the final library. Therefore, we should pass the whitelist to `starsolo` in that order. See the next section for more details.
-
-```bash
-# hairpin barcode whitelist
-tail -n +2 sci-rna-seq3/data/sci-RNA-seq3_hairpin_bc.csv | \
-    cut -f 2 -d, > sci-rna-seq3/data/hairpin_whitelist.txt
-
-# RT barcode whitelist
-tail -n +2 sci-rna-seq3/data/sci-RNA-seq3_RT_bc.csv | \
-    cut -f 2 -d, > sci-rna-seq3/data/RT_whitelist.txt
-```
-
-## From FastQ To Count Matrix
-
-The variable length (9 or 10 bp) of __hairpin barcode__ makes the situation a bit more complicated. We need to run `starsolo` in the following way (see explanation later):
-
-```console
-# map and generate the count matrix
-
-STAR --runThreadN 4 \
-     --genomeDir mm10/star_index \
-     --readFilesCommand zcat \
-     --outFileNamePrefix sci-rna-seq3/star_outs/ \
-     --readFilesIn sci-rna-seq3/data/SRR7827206_2.fastq.gz sci-rna-seq3/data/SRR7827206_1.fastq.gz \
-     --soloType CB_UMI_Complex \
-     --soloAdapterSequence CAGAGC \
-     --soloCBposition 0_0_2_-1 3_9_3_18 \
-     --soloUMIposition 3_1_3_8 \
-     --soloCBwhitelist sci-rna-seq3/data/hairpin_whitelist.txt sci-rna-seq3/data/RT_whitelist.txt \
-     --soloCBmatchWLtype 1MM \
-     --soloCellFilter EmptyDrops_CR \
-     --soloStrand Forward \
-     --outSAMattributes CB UB \
-     --outSAMtype BAM SortedByCoordinate
-```
-
-Once that is finished, you can do the exact the same thing with all the rest wells. In practice, you can do this via a loop or a pipeline. They can be run independently in parallel.
-
-## Explanation
-
-If you understand the __sci-RNA-seq3__ experimental procedures described in [this GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html), the command above should be straightforward to understand.
-
-`--runThreadN 4`
-
->> Use 4 cores for the preprocessing. Change accordingly if using more or less cores.
-
-`--genomeDir mm10/star_index`
-
->> Pointing to the directory of the star index. The public data from the above paper was produced using mouse embryos.
-
-`--readFilesCommand zcat`
-
->> Since the `fastq` files are in `.gz` format, we need the `zcat` command to extract them on the fly.
-
-`--outFileNamePrefix sci-rna-seq3/star_outs/`
-
->> We want to keep everything organised. This parameter directs all output files into the `sci-rna-seq3/star_outs/` directory.
-
-`--readFilesIn`
-
->> If you check the manual, we should put two files here. The first file is the reads that come from cDNA, and the second file should contain cell barcode and UMI. In __sci-RNA-seq3__, cDNA reads come from Read 2, and the cell barcode and UMI come from Read 1. Check [the sci-RNA-seq3 GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html) if you are not sure.
-
-`--soloType CB_UMI_Complex`
-
->> Since Read 1 not only has cell barcodes and UMI, the common linker sequences are also there. The cell barcodes are non-consecutive, separated by the linker sequences. In this case, we have to use the `CB_UMI_Complex` option. Of course, we could also use `UMI-tools` to extract the cell barcode and UMI, but that's slow. It is better to use this option.
-
-`--soloAdapterSequence CAGAGC`
-
->> The variable length (9 or 10 bp) of the __hairpin barcode__ at the beginning of __Read 1__ makes the situation complicated, because the absolute positions of the __RT barcode__ and __UMI__ in each read will vary. However, by specifying an adapter sequence, we could use this sequence as an anchor, and tell the program where cell barcodes and UMI are located relatively to the anchor. `CAGAGC` is the constant linker sequence in the middle, separating the __hairpin barcode__ and __UMI__.
-
-`--soloCBposition` and `--soloUMIposition`
-
->> These options specify the locations of cell barcode and UMI in the 2nd fastq files we passed to `--readFilesIn`. In this case, it is __Read 1__. Read the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) for more details. I have drawn a picture to help myself decide the exact parameters. There are some freedom here depending on what you are using as anchors. Due to the 9 or 10 bp __hairpin barcode__, the absolute positions of __RT barcodes__ and __UMI__ in the middle are variable. Therefore, using Read start as anchor will not work for them. We need to use the adaptor as the anchor, and specify the positions relative to the anchor. See the image:
-
-![](https://teichlab.github.io/scg_lib_structs/data/Star_CB_UMI_Complex_sci-RNA-seq3.jpg)
-
-```{eval-rst}
-.. important::
-
-    This option seems to work for me. Normally, we would choose an adapter sequence with decent length. In this case, we only have a short 6-bp constant linker as the adapter: ``CAGAGC``. If you look at the sequence in the **hairpin barcode** and the **RT barcode**, ``CAGAGC`` does not exist there. In the random 8-bp UMI, it might appear. When this happens, ``starsolo`` will only use the first appearance as the anchor, which is good here.
-```
-
-`--soloCBwhitelist`
-
->> Since the real cell barcodes consists of two non-consecutive parts: the __hairpin barcode__ and the __RT barcode__, the whitelist here is the combination of the two sub-lists. We should provide them separately and `star` will take care of the combinations.
-
-`--soloCBmatchWLtype 1MM`
-
->> How stringent we want the cell barcode reads to match the whitelist. The default option (`1MM_Multi`) does not work here. We choose this one here for simplicity, but you might want to experimenting different parameters to see what the difference is.
-
-`--soloCellFilter EmptyDrops_CR`
-
->> Experiments are never perfect. Even for barcodes that do not capture the molecules inside the cells, you may still get some reads due to various reasons, such as ambient RNA or DNA and leakage. In general, the number of reads from those cell barcodes should be much smaller, often orders of magnitude smaller, than those barcodes that come from real cells. In order to identify true cells from the background, you can apply different algorithms. Check the `star` manual for more information. We use `EmptyDrops_CR` which is the most frequently used parameter.
-
-`--soloStrand Forward`
-
->> The choice of this parameter depends on where the cDNA reads come from, i.e. the reads from the first file passed to `--readFilesIn`. You need to check the experimental protocol. If the cDNA reads are from the same strand as the mRNA (the coding strand), this parameter will be `Forward` (this is the default). If they are from the opposite strand as the mRNA, which is often called the first strand, this parameter will be `Reverse`. In the case of __sci-RNA-seq3__, the cDNA reads are from the Read 2 file. During the experiment, the mRNA molecules are captured by barcoded oligo-dT primer containing UMI and the Illumina Read 1 sequence. Therefore, Read 1 consists of RT barcodes and UMI. They come from the first strand, complementary to the coding strand. Read 2 comes from the coding strand. Therefore, use `Forward` for __sci-RNA-seq3__ data. This `Forward` parameter is the default, because many protocols generate data like this, but I still specified it here to make it clear. Check [the sci-RNA-seq3 GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html) if you are not sure.
-
-`--outSAMattributes CB UB`
-
->> We want the cell barcode and UMI sequences in the `CB` and `UB` attributes of the output, respectively. The information will be very helpful for downstream analysis.
-
-`--outSAMtype BAM SortedByCoordinate`
-
->> We want sorted `BAM` for easy handling by other programs.
-
-If everything goes well, your directory should look the same as the following:
-
-```console
-scg_prep_test/sci-rna-seq3/
-├── data
-│   ├── hairpin_whitelist.txt
-│   ├── RT_whitelist.txt
-│   ├── sci-RNA-seq3_hairpin_bc.csv
-│   ├── sci-RNA-seq3_RT_bc.csv
-│   ├── SRR7827206_1.fastq.gz
-│   └── SRR7827206_2.fastq.gz
-└── star_outs
-    ├── Aligned.sortedByCoord.out.bam
-    ├── Log.final.out
-    ├── Log.out
-    ├── Log.progress.out
-    ├── SJ.out.tab
-    └── Solo.out
-        ├── Barcodes.stats
-        └── Gene
-            ├── Features.stats
-            ├── filtered
-            │   ├── barcodes.tsv
-            │   ├── features.tsv
-            │   └── matrix.mtx
-            ├── raw
-            │   ├── barcodes.tsv
-            │   ├── features.tsv
-            │   └── matrix.mtx
-            ├── Summary.csv
-            └── UMIperCellSorted.txt
-
-6 directories, 21 files
-```
\ No newline at end of file
diff --git a/docs/source/ge/sci-RNA-seq.md b/docs/source/ge/sci-RNA-seq.md
index f5f2416..bb089b7 100644
--- a/docs/source/ge/sci-RNA-seq.md
+++ b/docs/source/ge/sci-RNA-seq.md
@@ -1,6 +1,6 @@
 # sci-RNA-seq
 
-Check [this GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq.html) to see how __sci-RNA-seq__ libraries are generated experimentally. This is a split-pool based combinatorial indexing strategy, where fixed cells are used as the reaction chamber. mRNA molecules are marked by oligo-dT primer with distinct barcodes in 96 or 384 minibulk reactions in the plate format (the first plate). Then all cells are pooled and randomly distributed into a new 96- or 384-well plate (the second plate). Library preparation is performed using the Tn5-based Illumina Nextera strategy to add __i5__ and __i7__ indices. Single cells can be identified by the combination of the RT barcode and __i5 + i7__. In addition, another level of barcode can be added during the tagmentation by barcoded Tn5, but this documentation will just focus on two-level barcodes, without the Tn5 index.
+Check [this GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html) to see how __sci-RNA-seq__ libraries are generated experimentally. This is a split-pool based combinatorial indexing strategy, where fixed cells are used as the reaction chamber. mRNA molecules are marked by oligo-dT primer with distinct barcodes in 96 or 384 minibulk reactions in the plate format (the first plate). Then all cells are pooled and randomly distributed into a new 96- or 384-well plate (the second plate). Library preparation is performed using the Tn5-based Illumina Nextera strategy to add __i5__ and __i7__ indices. Single cells can be identified by the combination of the RT barcode and __i5 + i7__. In addition, another level of barcode can be added during the tagmentation by barcoded Tn5, but this documentation will just focus on two-level barcodes, without the Tn5 index.
 
 ## For Your Own Experiments
 
@@ -283,7 +283,7 @@ As you can see, those reads are 18 bp in length. The first 8 bp are UMI and the
 
 To generate the whitelist, you need the 10-bp RT barcodes, the __i7__ and __i5__ indices. Generate a combination of them as the pool of all possible cell barcodes.
 
-Unfortunately, in the [__sci-RNA-seq paper__](http://science.sciencemag.org/content/357/6352/661), I cannot seem to find the information of those oligos. However, in the [__sci-RNA-seq3 paper__](https://www.nature.com/articles/s41586-019-0969-x) which is an updated version of the original one, I can find 384 different 10-bp RT barcodes, 96 different 10-bp `i5` index and 96 different 10-bp `i7` index from the [Supplementary Table S11](https://teichlab.github.io/scg_lib_structs/data/41586_2019_969_MOESM3_ESM.xlsx) of the paper. The __sci-RNA-seq__ seem to use the same barcodes. We could collect the index sequences as tables as follows, and the names of the oligos are directly taken from the paper to be consistent (showing only 5 of the table to save space):
+Unfortunately, in the [__sci-RNA-seq paper__](http://science.sciencemag.org/content/357/6352/661), I cannot seem to find the information of those oligos. However, in the [__sci-RNA-seq3 paper__](https://www.nature.com/articles/s41586-019-0969-x) which is an updated version of the original one, I can find 384 different 10-bp RT barcodes, 96 different 10-bp `i5` index and 96 different 10-bp `i7` index from the [Supplementary Table S11](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/41586_2019_969_MOESM3_ESM.xlsx) of the paper. The __sci-RNA-seq__ seem to use the same barcodes. We could collect the index sequences as tables as follows, and the names of the oligos are directly taken from the paper to be consistent (showing only 5 of the table to save space):
 
 __RT Barcodes (10 bp)__
 
@@ -317,17 +317,17 @@ __i5 Barcodes (10 bp)__
 
 I have put those three tables into `csv` files and you can download them to have a look:
 
-[sci-RNA-seq3_RT_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_RT_bc.csv)
-[sci-RNA-seq3_p7.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_p7.csv)
-[sci-RNA-seq3_p5.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_p5.csv)
+[sci-RNA-seq3_RT_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_RT_bc.csv)
+[sci-RNA-seq3_p7.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_p7.csv)
+[sci-RNA-seq3_p5.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_p5.csv)
 
 Let's download them:
 
 ```console
 wget -P sci-rna-seq/data \
-     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_RT_bc.csv \
-     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_p7.csv \
-     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_p5.csv
+     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_RT_bc.csv \
+     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_p7.csv \
+     https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_p5.csv
 ```
 
 If you use the full capacity of those oligos, you could have a capacity of __384 * 96 * 96 = 3,538,944__ barcodes.
@@ -343,7 +343,7 @@ tail -n +2 sci-rna-seq/data/sci-RNA-seq3_RT_bc.csv | \
 
 ### Whitelist For Strategy 2
 
-In this strategy, you are going to process the data for all wells in an experiment or multiple experiments. The cells will be identified by the combination of __RT barcode + i7 + i5__. The sequence of `i7` and `i5` depends on the primers you used. In this case for the public data, we only need the `i7`, because that is the index used to index each well. Therefore, we need to generate all combinations of __RT barcode + i7__ for this specific data set. Again, the RT barcode is in the same direction of the Illumina TruSeq Read 1 sequence, so we should take the sequences as they are. However, the `i7` index is always sequenced using the bottom strand as the template, so we need to take the reverse complement of the sequence. Check the [__sci-RNA-seq GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq.html) if you are still confused:
+In this strategy, you are going to process the data for all wells in an experiment or multiple experiments. The cells will be identified by the combination of __RT barcode + i7 + i5__. The sequence of `i7` and `i5` depends on the primers you used. In this case for the public data, we only need the `i7`, because that is the index used to index each well. Therefore, we need to generate all combinations of __RT barcode + i7__ for this specific data set. Again, the RT barcode is in the same direction of the Illumina TruSeq Read 1 sequence, so we should take the sequences as they are. However, the `i7` index is always sequenced using the bottom strand as the template, so we need to take the reverse complement of the sequence. Check the [__sci-RNA-seq GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html) if you are still confused:
 
 ```bash
 for x in $(tail -n +2 sci-rna-seq/data/sci-RNA-seq3_RT_bc.csv | cut -f 2 -d,); do
@@ -425,7 +425,7 @@ STAR --runThreadN 4 \
 
 ## Explanation
 
-If you understand the __sci-RNA-seq__ experimental procedures described in [this GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq.html), the command above should be straightforward to understand.
+If you understand the __sci-RNA-seq__ experimental procedures described in [this GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html), the command above should be straightforward to understand.
 
 `--runThreadN 4`
 
@@ -445,7 +445,7 @@ If you understand the __sci-RNA-seq__ experimental procedures described in [this
 
 `--readFilesIn`
 
->> If you check the manual, we should put two files here. The first file is the reads that come from cDNA, and the second file should contain cell barcode and UMI. In __sci-RNA-seq__, cDNA reads come from Read 2, and the cell barcode and UMI come from Read 1 or the `CB_UMI` file you just prepared. Check [the sci-RNA-seq GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq.html) if you are not sure.
+>> If you check the manual, we should put two files here. The first file is the reads that come from cDNA, and the second file should contain cell barcode and UMI. In __sci-RNA-seq__, cDNA reads come from Read 2, and the cell barcode and UMI come from Read 1 or the `CB_UMI` file you just prepared.
Check [the sci-RNA-seq GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html) if you are not sure. `--soloType CB_UMI_Simple` @@ -471,7 +471,7 @@ If you understand the __sci-RNA-seq__ experimental procedures described in [this `--soloStrand Forward` ->> The choice of this parameter depends on where the cDNA reads come from, i.e. the reads from the first file passed to `--readFilesIn`. You need to check the experimental protocol. If the cDNA reads are from the same strand as the mRNA (the coding strand), this parameter will be `Forward` (this is the default). If they are from the opposite strand as the mRNA, which is often called the first strand, this parameter will be `Reverse`. In the case of __sci-RNA-seq__, the cDNA reads are from the Read 2 file. During the experiment, the mRNA molecules are captured by barcoded oligo-dT primer containing UMI and the Illumina Read 1 sequence. Therefore, Read 1 consists of RT barcodes and UMI. They come from the first strand, complementary to the coding strand. Read 2 comes from the coding strand. Therefore, use `Forward` for __sci-RNA-seq__ data. This `Forward` parameter is the default, because many protocols generate data like this, but I still specified it here to make it clear. Check [the sci-RNA-seq GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq.html) if you are not sure. +>> The choice of this parameter depends on where the cDNA reads come from, i.e. the reads from the first file passed to `--readFilesIn`. You need to check the experimental protocol. If the cDNA reads are from the same strand as the mRNA (the coding strand), this parameter will be `Forward` (this is the default). If they are from the opposite strand as the mRNA, which is often called the first strand, this parameter will be `Reverse`. In the case of __sci-RNA-seq__, the cDNA reads are from the Read 2 file. 
During the experiment, the mRNA molecules are captured by barcoded oligo-dT primer containing UMI and the Illumina Read 1 sequence. Therefore, Read 1 consists of RT barcodes and UMI. They come from the first strand, complementary to the coding strand. Read 2 comes from the coding strand. Therefore, use `Forward` for __sci-RNA-seq__ data. This `Forward` parameter is the default, because many protocols generate data like this, but I still specified it here to make it clear. Check [the sci-RNA-seq GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html) if you are not sure. `--outSAMattributes CB UB` diff --git a/docs/source/ge/sci-RNA-seq3.md b/docs/source/ge/sci-RNA-seq3.md index d9f01c4..1fc2326 100644 --- a/docs/source/ge/sci-RNA-seq3.md +++ b/docs/source/ge/sci-RNA-seq3.md @@ -1,6 +1,6 @@ # sci-RNA-seq3 -Check [this GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html) to see how __sci-RNA-seq3__ libraries are generated experimentally. This is a split-pool based combinatorial indexing strategy, where fixed cells are used as the reaction chamber. mRNA molecules are marked by oligo-dT primer with distinct barcodes in minibulk reactions in the plate format (the first plate). Then all cells are pooled and randomly distributed into a new plate (the second plate), where barcoded hairpin adaptor is ligated to add a second level barcode. After that, all cells are pooled again and 2000 - 4000 cells are randomly distributed into the well of a new plate (the third plate). Library preparation is performed in the third plate to add __i5__ and __i7__ indices. Single cells can be identified by the combination of the __RT barcode__, the __hairpin barcode__ and __i5 + i7__. It is an updated and improved version of the [__sci-RNA-seq__](./sci-RNA-seq.md) method. 
+Check [this GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html) to see how __sci-RNA-seq3__ libraries are generated experimentally. This is a split-pool based combinatorial indexing strategy, where fixed cells are used as the reaction chamber. mRNA molecules are marked by oligo-dT primer with distinct barcodes in minibulk reactions in the plate format (the first plate). Then all cells are pooled and randomly distributed into a new plate (the second plate), where barcoded hairpin adaptor is ligated to add a second level barcode. After that, all cells are pooled again and 2000 - 4000 cells are randomly distributed into the well of a new plate (the third plate). Library preparation is performed in the third plate to add __i5__ and __i7__ indices. Single cells can be identified by the combination of the __RT barcode__, the __hairpin barcode__ and __i5 + i7__. It is an updated and improved version of the [__sci-RNA-seq__](./sci-RNA-seq.md) method. ## For Your Own Experiments @@ -198,7 +198,7 @@ wget -P sci-rna-seq3/data \ ## Prepare Whitelist -The full oligo sequences can be found in the [Supplementary Table S11](https://teichlab.github.io/scg_lib_structs/data/41586_2019_969_MOESM3_ESM.xlsx) from the __sci-RNA-seq3__ paper. As you can see, there are a total of 384 different 10 bp __RT barcodes__, 384 different 9 or 10 bp __hairpin barcodes__, 96 different 10 bp __i7__ and 96 different 10 bp __i5__ barcodes. Theoretically, the full capacity of the combinatorial indices are __384 * 384 * 96 * 96 = 1,358,954,496__. Since the data are already demultiplexed by __i7 + i5__, we only need the hairpin barcode and RT barcode for the identification of single cells. 
I have collected the index table as follows, and the names of the oligos are directly taken from the paper to be consistent (showing only 5 records of each table): +The full oligo sequences can be found in the [Supplementary Table S11](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/41586_2019_969_MOESM3_ESM.xlsx) from the __sci-RNA-seq3__ paper. As you can see, there are a total of 384 different 10 bp __RT barcodes__, 384 different 9 or 10 bp __hairpin barcodes__, 96 different 10 bp __i7__ and 96 different 10 bp __i5__ barcodes. Theoretically, the full capacity of the combinatorial indices are __384 * 384 * 96 * 96 = 1,358,954,496__. Since the data are already demultiplexed by __i7 + i5__, we only need the hairpin barcode and RT barcode for the identification of single cells. I have collected the index table as follows, and the names of the oligos are directly taken from the paper to be consistent (showing only 5 records of each table): __RT Barcodes (10 bp)__ @@ -222,18 +222,18 @@ __Hairpin Barcodes (9 or 10 bp)__ I have put those two tables into `csv` files and you can download them to have a look: -[sci-RNA-seq3_RT_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_RT_bc.csv) -[sci-RNA-seq3_hairpin_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_hairpin_bc.csv) +[sci-RNA-seq3_RT_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_RT_bc.csv) +[sci-RNA-seq3_hairpin_bc.csv](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_hairpin_bc.csv) Let's download them: ```console wget -P sci-rna-seq3/data \ - https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_RT_bc.csv \ - https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq3_hairpin_bc.csv + https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_RT_bc.csv \ + https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/sci-RNA-seq3_hairpin_bc.csv ``` 
-Now we need to generate the whitelist of the __RT barcode__ and the __hairpin barcode__. Those barcodes are sequenced in __Read 1__ using the bottom strand as the template. They are in the same direction of the Illumina TruSeq Read 1 sequence. Therefore, we should take their sequences as they are. In addition, if you check the [__sci-RNA-seq3 GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html), you will see that the __hairpin barcode__ is in front of the __RT barcode__ in the final library. Therefore, we should pass the whitelist to `starsolo` in that order. See the next section for more details. +Now we need to generate the whitelist of the __RT barcode__ and the __hairpin barcode__. Those barcodes are sequenced in __Read 1__ using the bottom strand as the template. They are in the same direction of the Illumina TruSeq Read 1 sequence. Therefore, we should take their sequences as they are. In addition, if you check the [__sci-RNA-seq3 GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html), you will see that the __hairpin barcode__ is in front of the __RT barcode__ in the final library. Therefore, we should pass the whitelist to `starsolo` in that order. See the next section for more details. ```bash # hairpin barcode whitelist @@ -273,7 +273,7 @@ Once that is finished, you can do the exact the same thing with all the rest wel ## Explanation -If you understand the __sci-RNA-seq3__ experimental procedures described in [this GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html), the command above should be straightforward to understand. +If you understand the __sci-RNA-seq3__ experimental procedures described in [this GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html), the command above should be straightforward to understand. 
`--runThreadN 4` @@ -293,7 +293,7 @@ If you understand the __sci-RNA-seq3__ experimental procedures described in [thi `--readFilesIn` ->> If you check the manual, we should put two files here. The first file is the reads that come from cDNA, and the second file should contain cell barcode and UMI. In __sci-RNA-seq3__, cDNA reads come from Read 2, and the cell barcode and UMI come from Read 1. Check [the sci-RNA-seq3 GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html) if you are not sure. +>> If you check the manual, we should put two files here. The first file is the reads that come from cDNA, and the second file should contain cell barcode and UMI. In __sci-RNA-seq3__, cDNA reads come from Read 2, and the cell barcode and UMI come from Read 1. Check [the sci-RNA-seq3 GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html) if you are not sure. `--soloType CB_UMI_Complex` @@ -307,7 +307,7 @@ If you understand the __sci-RNA-seq3__ experimental procedures described in [thi >> These options specify the locations of cell barcode and UMI in the 2nd fastq files we passed to `--readFilesIn`. In this case, it is __Read 1__. Read the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) for more details. I have drawn a picture to help myself decide the exact parameters. There are some freedom here depending on what you are using as anchors. Due to the 9 or 10 bp __hairpin barcode__, the absolute positions of __RT barcodes__ and __UMI__ in the middle are variable. Therefore, using Read start as anchor will not work for them. We need to use the adaptor as the anchor, and specify the positions relative to the anchor. See the image: -![](https://teichlab.github.io/scg_lib_structs/data/Star_CB_UMI_Complex_sci-RNA-seq3.jpg) +![](https://teichlab.github.io/scg_lib_structs/data/sci-RNA-seq_family/Star_CB_UMI_Complex_sci-RNA-seq3.jpg) ```{eval-rst} .. 
important:: @@ -329,7 +329,7 @@ If you understand the __sci-RNA-seq3__ experimental procedures described in [thi `--soloStrand Forward` ->> The choice of this parameter depends on where the cDNA reads come from, i.e. the reads from the first file passed to `--readFilesIn`. You need to check the experimental protocol. If the cDNA reads are from the same strand as the mRNA (the coding strand), this parameter will be `Forward` (this is the default). If they are from the opposite strand as the mRNA, which is often called the first strand, this parameter will be `Reverse`. In the case of __sci-RNA-seq3__, the cDNA reads are from the Read 2 file. During the experiment, the mRNA molecules are captured by barcoded oligo-dT primer containing UMI and the Illumina Read 1 sequence. Therefore, Read 1 consists of RT barcodes and UMI. They come from the first strand, complementary to the coding strand. Read 2 comes from the coding strand. Therefore, use `Forward` for __sci-RNA-seq3__ data. This `Forward` parameter is the default, because many protocols generate data like this, but I still specified it here to make it clear. Check [the sci-RNA-seq3 GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq3.html) if you are not sure. +>> The choice of this parameter depends on where the cDNA reads come from, i.e. the reads from the first file passed to `--readFilesIn`. You need to check the experimental protocol. If the cDNA reads are from the same strand as the mRNA (the coding strand), this parameter will be `Forward` (this is the default). If they are from the opposite strand as the mRNA, which is often called the first strand, this parameter will be `Reverse`. In the case of __sci-RNA-seq3__, the cDNA reads are from the Read 2 file. During the experiment, the mRNA molecules are captured by barcoded oligo-dT primer containing UMI and the Illumina Read 1 sequence. Therefore, Read 1 consists of RT barcodes and UMI. 
They come from the first strand, complementary to the coding strand. Read 2 comes from the coding strand. Therefore, use `Forward` for __sci-RNA-seq3__ data. This `Forward` parameter is the default, because many protocols generate data like this, but I still specified it here to make it clear. Check [the sci-RNA-seq3 GitHub Page](https://teichlab.github.io/scg_lib_structs/methods_html/sci-RNA-seq_family.html) if you are not sure. `--outSAMattributes CB UB` diff --git a/methods_html/Microwell-seq.html b/methods_html/Microwell-seq.html index 6ce8ca6..5f91f96 100644 --- a/methods_html/Microwell-seq.html +++ b/methods_html/Microwell-seq.html @@ -132,7 +132,7 @@

(5) Purify amplified full-length double stranded cDNAs and they look like t

(6) Tagmentation using a transposase Tn5 with two identical insertion sequences (it was not explicitly mentioned what sequences were used in the original paper, but based on the PCR oligo sequences, it is highly likely a Tn5 homodimer with s7-ME oligo):

-Tn5 dimer +Tn5 dimer
 Product 1: right hand side of above cDNA (5'-end transcript), not amplifiable due to primer used (see next step):
 
diff --git a/methods_html/Paired-seq.html b/methods_html/Paired-seq.html
index 6cc55df..937f41f 100644
--- a/methods_html/Paired-seq.html
+++ b/methods_html/Paired-seq.html
@@ -9,7 +9,7 @@
 
 

Paired-seq

-

The Paired-seq method is developed based on the idea of the combinatorial indexing strategy that is used in sci-RNA-seq and SPLiT-seq to simultaneously tag both the open chromatin fragments generated by the Tn5 transposases and the cDNA molecules generated from reverse transcription.

+

The Paired-seq method is developed based on the idea of the combinatorial indexing strategy that is used in sci-RNA-seq and SPLiT-seq to simultaneously tag both the open chromatin fragments generated by the Tn5 transposases and the cDNA molecules generated from reverse transcription.


diff --git a/methods_html/SHARE-seq.html b/methods_html/SHARE-seq.html index 74fc28a..9454855 100644 --- a/methods_html/SHARE-seq.html +++ b/methods_html/SHARE-seq.html @@ -9,7 +9,7 @@

SHARE-seq

-

The SHARE-seq method is developed based on the idea of the combinatorial indexing strategy that is used in sci-RNA-seq and SPLiT-seq. The method introduced three rounds of barcodes by ligating barcoded adaptors to both RNA (gene expression) and tagmented DNA (chromatin accessibility) to achieve multiomic profiling of the same single cells.

+

The SHARE-seq method is developed based on the idea of the combinatorial indexing strategy that is used in sci-RNA-seq and SPLiT-seq. The method introduced three rounds of barcodes by ligating barcoded adaptors to both RNA (gene expression) and tagmented DNA (chromatin accessibility) to achieve multiomic profiling of the same single cells.


diff --git a/methods_html/SureCell.html b/methods_html/SureCell.html index 1d0c9bf..d7c4648 100644 --- a/methods_html/SureCell.html +++ b/methods_html/SureCell.html @@ -48,7 +48,7 @@

(2) Break emulsion, clean up and RNaseH and DNA Pol I based second strand sy

(3) Clean double stranded cDNA, and tagmentation using Nextera SureCell transposome (highly likely a Tn5 homodimer with s7-ME oligo):

-Tn5 dimer +Tn5 dimer
 
 5'- AAGCAGTGGTATCAACGCAGAGTAC[6-bp barcode1]TAGCCATCGCATTGC[6-bp barcode2]TACCTCTGAGCTGAA[6-bp barcode3]ACG[8-bp UMI]GAC(dT)VXXX...XXX         CTGTCTCTTATACACATCT
diff --git a/methods_html/itChIP-seq.html b/methods_html/itChIP-seq.html
index 085a843..fbd9feb 100644
--- a/methods_html/itChIP-seq.html
+++ b/methods_html/itChIP-seq.html
@@ -7,7 +7,7 @@
 
 
 
-

itChIP-seq

+

itChIP-seq

The indexing and tagmentation-based ChIP-seq (itChIP-seq) method uses a similar strategy to Drop-ChIP. Instead of using MNase and barcoding in droplets, itChIP-seq uses barcoded Tn5 to tag sorted single nuclei. After pooling all indexed nuclei, an immunoprecipitation is performed using an antibody against a protein of interest.

diff --git a/methods_html/sci-RNA-seq.html b/methods_html/sci-RNA-seq.html deleted file mode 100644 index fbe064b..0000000 --- a/methods_html/sci-RNA-seq.html +++ /dev/null @@ -1,156 +0,0 @@ - - - - - -sci-RNA-seq - - - -

sci-RNA-seq

- -

sci-RNA-seq uses combinatorial indexing to identify single cells without single-cell isolation. Two-level indexing (RT barcode + PCR barcodes (i5 + i7)) or three-level indexing (RT barcode + PCR barcodes (i5 + i7) + Tn5 barcodes) can be used. Three-level indexing is a bit more difficult since you need to assemble many indexed Tn5 transposomes. Here, the two-level indexing strategy is demonstrated.

- -
- -

Adapter and primer sequences:

- -

Barcoded RT primer: 5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode]TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN -3'

-

Nextera Tn5 binding site (19-bp Mosaic End (ME)): 5'- AGATGTGTATAAGAGACAG -3'

-

Nextera N/S5xx primer entry point (s5): 5'- TCGTCGGCAGCGTC -3'

-

Nextera N7xx primer entry point (s7): 5'- GTCTCGTGGGCTCGG -3'

-

Illumina P5 Primer: 5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

-

Illumina P7 Primer: 5'- CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG -3'

-

Read 1 sequencing primer: 5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

-

Index 1 sequencing primer (i7): 5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'

-

Index 2 sequencing primer (i5): 5'- AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -3'

-

Read 2 sequencing primer: 5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG -3'

-
- - -
- -

Step-by-step library generation

-

(1) Anneal Barcoded RT primer to mRNA in fixed cells and reverse transcription using MMLV in situ:

-
-
-5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](T)30VN---------->
-                                                  (A)n BXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -5'
-
-
- -

(2) Pool all wells, and re-distribute into wells in a new plate, and perform RNaseH and DNA Pol I based second strand synthesis:

-
-
-5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXXXXXXXXXXXXX -3'
-3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXXXXXXXXXXXXXXXXXXXXXX -5'
-
-
- -

(3) Add 5ng genomic DNA as carrier, and use Illumina standard Nextera tagmentation on double stranded cDNA plus genomic DNA (will create 9-bp gap):

-Tn5 dimer -
-
-Product 1 (s5 at both ends, not amplifiable due to the use of Illumina P5/P7 Primer, see the next step):
-
-5'- TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
-                  TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGACTGCGACGGCTGCT -5'
-
-
-
-Product 2 (s7 at both ends, not amplifiable due to the use of Illumina P5/P7 Primer, see the next step):
-
-5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
-                   TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
-
-Product 3 (different s5 and s7 at both ends, not amplifiable, due to the use of Illumina P5/P7 Primer, see the next step):
-
-5'- TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
-                  TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
-
-Product 4 (s5 at one end, 3' of cDNA at the other end, not amplifiable, due to the use of Illumina P5/P7 Primer, see the next step):
-
-5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT -3'
-3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXXXXXXXXXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGACTGCGACGGCTGCT -5'
-
-
-
-Product 5 (s7 at one end, 3' of cDNA at the other end, the only amplifiable product, see the next step):
-
-5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
-3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
-
- -

(4) 72 degree gap fill-in (the first cycle in Nextera PCR):

-
-
-5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXX...XXXXXXXXXXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'
-3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
- -

(5) Adding Illumina P5/P7 Primers for library amplification:

-
-
-5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT------>
-                                                5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'
-                                                3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-                                                                                                                          <---------GGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC -5'
-
-
- -

(6) Final library structure:

-
-
-5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
-3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
-             Illumina P5              i5     This bit is Truseq adapter     8bp UMI   10bp RT        cDNA             ME              s7           i7        Illumina P7
-                                                                                      barcode
-
-
- - -

Library sequencing:

- -

(1) Add read 1 sequencing primer to sequence the first read (bottom strand as template, these are the UMI and RT barcodes, 18 cycles):

-
-
-                                       5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT----------------->
-3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
-
-
- -
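As the Read 1 step above shows, the 18 sequenced bases start right after the TruSeq Read 1 primer site, so they are the 8-bp UMI followed by the 10-bp RT barcode. The sketch below splits such a read in Python; it assumes the read is a plain string that starts exactly at the first UMI base, and the function name `split_sci_rna_seq_read1` is hypothetical rather than part of any tool.

```python
UMI_LEN = 8     # 8-bp UMI right after the TruSeq Read 1 primer site
RT_BC_LEN = 10  # 10-bp RT barcode right after the UMI


def split_sci_rna_seq_read1(read1: str) -> tuple[str, str]:
    """Split an sci-RNA-seq Read 1 into (UMI, RT barcode).

    Assumes the read starts at the first UMI base, so bases 1-8 are the UMI
    and bases 9-18 are the RT barcode. Extra bases are ignored.
    """
    if len(read1) < UMI_LEN + RT_BC_LEN:
        raise ValueError(f"Read 1 is shorter than {UMI_LEN + RT_BC_LEN} bp")
    umi = read1[:UMI_LEN]
    rt_barcode = read1[UMI_LEN:UMI_LEN + RT_BC_LEN]
    return umi, rt_barcode
```

For example, `split_sci_rna_seq_read1("AAAACCCCGGGGTTTTAC")` returns `("AAAACCCC", "GGGGTTTTAC")`. Longer reads are accepted, and any bases after position 18 (the start of the oligo-dT stretch) are ignored.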

(2) Add Index 1 sequencing primer to sequence i7 index (bottom strand as template, 10 cycles):

-
-
-                                                                                                         5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC--------->
-3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
-
-
- -
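Because Index 1 is read with the bottom strand as the template, the i7 bases reported by the sequencer are the reverse complement of the i7 as written in the P7 primer (5'- CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG -3'). The Python sketch below takes that reverse complement and pairs every RT barcode with every observed i7, as in the RT barcode + i7 whitelist described for the public sci-RNA-seq data. The helper names are hypothetical, and concatenating the RT barcode before the i7 is an illustrative assumption; the order must match however your own cell-barcode reads are assembled.

```python
from itertools import product

# Complement table for uppercase DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def observed_i7(i7_in_p7_primer: str) -> str:
    """i7 as seen in the Index 1 read (bottom strand used as template)."""
    return reverse_complement(i7_in_p7_primer)


def rt_i7_whitelist(rt_barcodes, i7_in_p7_primers):
    """Every RT barcode + observed i7 combination (RT-first order is assumed)."""
    observed = [observed_i7(i7) for i7 in i7_in_p7_primers]
    return [rt + i7 for rt, i7 in product(rt_barcodes, observed)]
```

With the 384 RT barcodes and 96 i7 indices from Supplementary Table S11, `rt_i7_whitelist` would return 384 * 96 = 36,864 candidate barcodes.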

(3) Cluster regeneration, add Index 2 sequencing primer to sequence the second index (i5 index) (top strand as template, 10 cycles):

-
-
-5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
-                                  <--------TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'
-
-
- -

(4) Add Read 2 sequencing primer to sequence the second read (top strand as template, this is the cDNA read, 52 cycles):

-
-
-5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
-                                                                                                       <-----GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
- -
- - - diff --git a/methods_html/sci-RNA-seq3.html b/methods_html/sci-RNA-seq3.html deleted file mode 100644 index cc71d62..0000000 --- a/methods_html/sci-RNA-seq3.html +++ /dev/null @@ -1,162 +0,0 @@ - - - - - -sci-RNA-seq3 - - - -

sci-RNA-seq3

- -

The sci-RNA-seq3 is an updated version of sci-RNA-seq. The major improvements are:

-

(1) nuclei are extracted directly from fresh tissues without enzymatic treatment;

-

(2) hairpin ligation for the third level indexing (barcoded Tn5 tagmentation was used in the previous version);

-

(3) individually optimised enzymatic reactions;

-

(4) FACS was replaced by dilution, and sonication and filtration steps were added to minimize aggregation.

- -
- -

Adapter and primer sequences:

- -

Barcoded RT primer: 5'- /Phos/CAGAGC[8-bp UMI][10-bp RT barcode]TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN -3'

-

* There are 384 barcoded RT primers, click here to see the full sequence.

-

Barcoded hairpin adapters: 5'- GCTCTG[reverse complement of barcode A]/ddU/ACGACGCTCTTCCGATCT[9-bp or 10-bp barcode A] -3'

-

* There are 384 barcoded hairpin adapters, click here to see the full sequence. The structure of these adapters is like this:

-
-
-         CTTCCGATCT
-        /          NNNNNNNNNN -3'
-        |          NNNNNNNNNNGTCTCG -5'
-        TCGCAGCAddU
-
-
-

Nextera Tn5 binding site (19-bp Mosaic End (ME)): 5'- AGATGTGTATAAGAGACAG -3'

-

Nextera N7xx primer entry point (s7): 5'- GTCTCGTGGGCTCGG -3'

-

PCR P5 Primer: 5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

-

* There are 96 barcoded P5 primers, click here to see the full sequence.

-

PCR P7 Primer: 5'- CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG -3'

-

* There are 96 barcoded P7 primers, click here to see the full sequence.

-

Read 1 sequencing primer: 5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

-

Index 1 sequencing primer (i7): 5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'

-

Index 2 sequencing primer (i5): 5'- AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -3'

-

Read 2 sequencing primer: 5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG -3'

-
- - -
- -

Step-by-step library generation

-

(1) Anneal Barcoded RT primer to mRNA in fixed cells and reverse transcription using MMLV in situ:

-
-
-5'- CAGAGC[8-bp UMI][10-bp RT barcode](T)30VN---------->
-                                      (A)n BXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -5'
-
-
- -

(2) Pool all wells, and re-distribute into wells in a new plate, and ligate barcoded hairpin adapters:

-
-
- CTTCCGATCT
-/          NNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXXXXXXXXXXXXX -3'
-|          NNNNNNNNNNGTCTCG -5'                        (pA)BXXXXXXXXXXXXXXXXXXXXXXX -5'
-TCGCAGCAddU
-
-
- -
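In the ligation drawing above, the hairpin's 5' end (GCTCTG) pairs with the CAGAGC at the 5' end of the barcoded RT primer, and the barcode A at the hairpin's 3' end pairs with its own reverse complement to close the stem. The Python sketch below assembles an adapter from barcode A following the listed layout 5'- GCTCTG[reverse complement of barcode A]/ddU/ACGACGCTCTTCCGATCT[barcode A] -3' and checks both pairings. Writing ddU as a plain `U` and the helper names are assumptions made for illustration.

```python
# Complement table for uppercase DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

RT_PRIMER_5P = "CAGAGC"              # 5' end of the barcoded RT primer
TRUSEQ_R1_3P = "ACGACGCTCTTCCGATCT"  # 3' part of the TruSeq Read 1 sequence


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hairpin_adapter(barcode_a: str) -> str:
    """Assemble the adapter 5'->3'; 'U' stands in for the ddU position."""
    return "GCTCTG" + reverse_complement(barcode_a) + "U" + TRUSEQ_R1_3P + barcode_a


def overhang_pairs_with_rt_primer(adapter: str) -> bool:
    """True if the adapter's 5' end is the reverse complement of CAGAGC."""
    return adapter[:len(RT_PRIMER_5P)] == reverse_complement(RT_PRIMER_5P)


def stem_pairs(adapter: str, barcode_len: int) -> bool:
    """True if the 3'-terminal barcode A pairs with the stem after GCTCTG."""
    stem_5p = adapter[6:6 + barcode_len]
    return reverse_complement(adapter[-barcode_len:]) == stem_5p
```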

(3) Pool all wells again, and re-distribute into wells in a new plate, and perform RNaseH and DNA Pol I based second strand synthesis. DNA Pol I has strand displacement activity, so the hairpin structure is destroyed during the second strand synthesis:

-
-
-5'- GCTCTGNNNNNNNNNN/ddU/ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXXXXXXXXXXXXX -3'
-3'- CGAGACNNNNNNNNNN  A  TGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXXXXXXXXXXXXXXXXXXXXXX -5'
-
-
- -

(4) Perform tagmentation using a Tn5 homodimer with s7-ME oligo (will create 9-bp gap):

-Tn5 dimer -
-
-Product 1 (s7 at both ends, not amplifiable due to the use of Illumina P5/P7 Primer, see the next step):
-
-5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
-                   TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
-
-Product 2 (s7 at one end, 3' of cDNA at the other end, the only amplifiable product, see the next step):
-
-5'- GCTCTGNNNNNNNNNN/ddU/ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
-3'- CGAGACNNNNNNNNNN  A  TGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
-
- -

(5) NEB USER Enzyme treatment to destroy the uracil base (ddU):

-
-
-5'- GCTCTGNNNNNNNNNN ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
-3'- CGAGACNNNNNNNNNNATGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
- -

(6) Adding PCR P5/P7 Primers for library amplification:

-
-
-5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTAC
-                                                    ACGACGCTCTTCCGATCT------>
-                               5'- GCTCTGNNNNNNNNNN ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
-                               3'- CGAGACNNNNNNNNNNATGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-                                                                                                                                                  <---------GGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC -5'
-
-
- -

(7) Final library structure:

-
-
-5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGCNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
-3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCGNNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
-             Illumina P5              i5     This bit is Truseq adapter     9bp or 10bp     8bp UMI   10bp RT        cDNA             ME              s7           i7        Illumina P7
-                                                                          hairpin barcode             barcode
-
-
- - -

Library sequencing:

- -

(1) Add read 1 sequencing primer to sequence the first read (bottom strand as template, these are the hairpin barcode + GTCTCG + UMI + RT barcodes, 34 cycles):

-
-
-                                       5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT--------------------------------->
-3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCGNNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
-
-
- -

(2) Add Index 1 sequencing primer to sequence i7 index (bottom strand as template, 10 cycles):

-
-
-                                                                                                                         5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC--------->
-3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCGNNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
-
-
- -

(3) Cluster regeneration, add Index 2 sequencing primer to sequence the second index (i5 index) (top strand as template, 10 cycles):

-
-
-5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGCNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
-                                 <---------TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'
-
-
- -

(4) Add Read 2 sequencing primer to sequence the second read (top strand as template, this is the cDNA read, 52 cycles):

-
-
-5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGCNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
-                                                                                                                       <-----GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
-
-
- -
- - - diff --git a/methods_html/sci-RNA-seq_family.html b/methods_html/sci-RNA-seq_family.html new file mode 100644 index 0000000..0f1c25a --- /dev/null +++ b/methods_html/sci-RNA-seq_family.html @@ -0,0 +1,311 @@ + + + + + +sci-RNA-seq/sci-RNA-seq3 + + + + +

sci-RNA-seq + / sci-RNA-seq3

+ +
+ +

sci-RNA-seq

+ +

sci-RNA-seq uses combinatorial indexing to identify single cells without isolating them. Either two-level indexing (RT barcode + PCR barcodes (i5 + i7)) or three-level indexing (RT barcode + PCR barcodes (i5 + i7) + Tn5 barcodes) can be used. Three-level indexing is more demanding because many indexed Tn5 transposomes have to be assembled. The two-level indexing strategy is shown here.
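As a rough illustration of why combinatorial indexing works, the sketch below estimates how often a cell shares its barcode combination with another cell. It assumes each cell lands in a uniformly random combination of RT well and PCR well. The plate sizes in the example are placeholders, not values taken from the protocol, and the function names are illustrative.

```python
import math


def barcode_combinations(*wells_per_round: int) -> int:
    """Distinct barcode combinations across all indexing rounds."""
    return math.prod(wells_per_round)


def unique_fraction(n_cells: int, n_combinations: int) -> float:
    """Expected fraction of cells whose barcode combination is shared with no
    other cell, assuming each cell falls into a uniformly random combination."""
    return (1 - 1 / n_combinations) ** (n_cells - 1)


if __name__ == "__main__":
    # Placeholder numbers: 384 RT wells x 96 PCR wells (i5 + i7 pairs).
    combos = barcode_combinations(384, 96)
    for cells in (1_000, 10_000):
        print(cells, f"{unique_fraction(cells, combos):.3f}")
```

Under this model the collision rate grows with the ratio of cells to combinations, so each extra indexing round (as in three-level indexing) enlarges the barcode space multiplicatively.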

+ +
+ +

Adapter and primer sequences:

+ +

Barcoded RT primer: 5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode]TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN -3'

+

Nextera Tn5 binding site (19-bp Mosaic End (ME)): 5'- AGATGTGTATAAGAGACAG -3'

+

Nextera N/S5xx primer entry point (s5): 5'- TCGTCGGCAGCGTC -3'

+

Nextera N7xx primer entry point (s7): 5'- GTCTCGTGGGCTCGG -3'

+

Illumina P5 Primer: 5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

+

Illumina P7 Primer: 5'- CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG -3'

+

Read 1 sequencing primer: 5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

+

Index 1 sequencing primer (i7): 5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'

+

Index 2 sequencing primer (i5): 5'- AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -3'

+

Read 2 sequencing primer: 5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG -3'

+
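Several of the primers above are simple reverse complements of each other. The Read 2 primer is s7 followed by the ME, the Index 1 primer is the reverse complement of that same s7-ME stretch, and the Index 2 primer is the reverse complement of the Read 1 primer. A short Python check makes these relationships explicit (the helper name `revcomp` is just for illustration):

```python
# Reverse-complement helper; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


# Sequences copied from the list above (5' -> 3').
ME = "AGATGTGTATAAGAGACAG"
S7 = "GTCTCGTGGGCTCGG"
READ1_PRIMER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEX1_PRIMER = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
INDEX2_PRIMER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
READ2_PRIMER = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

checks = {
    "Read 2 primer = s7 + ME": READ2_PRIMER == S7 + ME,
    "Index 1 primer = revcomp(s7 + ME)": INDEX1_PRIMER == revcomp(S7 + ME),
    "Index 2 primer = revcomp(Read 1 primer)": INDEX2_PRIMER == revcomp(READ1_PRIMER),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```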
+ + +
+ +

Step-by-step library generation

+

(1) Anneal the barcoded RT primer to mRNA in fixed cells and perform in situ reverse transcription with MMLV:

+
+
+5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](T)30VN---------->
+                                                  (A)n BXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -5'
+
+
+ +

(2) Pool all wells, redistribute into wells of a new plate, and perform RNase H and DNA Pol I based second-strand synthesis:

+
+
+5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXXXXXXXXXXXXX -3'
+3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXXXXXXXXXXXXXXXXXXXXXX -5'
+
+
+ +

(3) Add 5 ng of genomic DNA as a carrier, and perform standard Illumina Nextera tagmentation on the double-stranded cDNA plus genomic DNA (this creates a 9-bp gap):

+Tn5 dimer +
+
+Product 1 (s5 at both ends, not amplifiable due to the use of Illumina P5/P7 Primers, see the next step):
+
+5'- TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
+                  TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGACTGCGACGGCTGCT -5'
+
+
+
+Product 2 (s7 at both ends, not amplifiable due to the use of Illumina P5/P7 Primers, see the next step):
+
+5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
+                   TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+
+Product 3 (s5 at one end and s7 at the other, not amplifiable due to the use of Illumina P5/P7 Primers, see the next step):
+
+5'- TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
+                  TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+
+Product 4 (s5 at one end, 3' of cDNA at the other end, not amplifiable due to the use of Illumina P5/P7 Primers, see the next step):
+
+5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT -3'
+3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXXXXXXXXXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGACTGCGACGGCTGCT -5'
+
+
+
+Product 5 (s7 at one end, 3' of cDNA at the other end, the only amplifiable product, see the next step):
+
+5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
+3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+
+ +
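The product list above follows one rule: a fragment ends up in the final library only if one end can be primed by the P5 primer (whose 3' end carries the `ACGACGCTCTTCCGATCT` handle introduced by the RT primer) and the other end by the P7 primer (whose 3' end is s7). A minimal sketch of that rule, using the shorthand labels `R1`, `s5` and `s7` (labels chosen here for illustration, not part of the protocol) for the end types:

```python
# Hypothetical shorthand for the adapter found at each fragment end:
#   "R1" = ACGACGCTCTTCCGATCT handle from the RT primer (cDNA 3' end)
#   "s5" = Nextera s5 entry point, "s7" = Nextera s7 entry point
P5_SITE = "R1"  # the P5 primer's 3' end matches the RT-primer handle
P7_SITE = "s7"  # the P7 primer's 3' end is s7

TAGMENTATION_PRODUCTS = {
    1: ("s5", "s5"),
    2: ("s7", "s7"),
    3: ("s5", "s7"),
    4: ("R1", "s5"),
    5: ("R1", "s7"),
}


def gets_p5_and_p7(end_a: str, end_b: str) -> bool:
    """True if one end can be primed by P5 and the other by P7, i.e. the
    fragment can become a P5...P7 library molecule."""
    return {end_a, end_b} == {P5_SITE, P7_SITE}


for product, ends in TAGMENTATION_PRODUCTS.items():
    print(product, ends, gets_p5_and_p7(*ends))
```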

(4) Gap fill-in at 72 °C (the first step of the Nextera PCR):

+
+
+5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXX...XXXXXXXXXXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'
+3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+ +
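Gap fill-in matters because the P7 primer can only anneal once the top strand carries the complement of s7. The sketch below models the duplex as two plain strings and treats "can anneal" as simple substring containment of the reverse complement. Both are illustrative simplifications: the annealed 19-bp ME oligo, which is displaced during extension, is ignored, and the example sequences are made up.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


ME = "AGATGTGTATAAGAGACAG"
S7 = "GTCTCGTGGGCTCGG"


def gap_fill(top: str, bottom: str) -> str:
    """Extend the top strand (5'->3') to the end of its template.

    `bottom` is written 5'->3'; the top strand is assumed to pair with the
    3' part of `bottom`, so full extension yields revcomp(bottom).
    """
    template_copy = revcomp(bottom)
    if not template_copy.startswith(top):
        raise ValueError("top strand does not pair with the 3' part of bottom")
    return template_copy


def primer_can_anneal(primer_3prime: str, template: str) -> bool:
    """A primer can prime on a strand that contains its reverse complement."""
    return revcomp(primer_3prime) in template


# Made-up example: RT-primer handle + poly(T) + a short cDNA stretch.
top_before = "ACGACGCTCTTCCGATCT" + "T" * 30 + "GCATGCAAGCTTGGCACTG"
gap = "GATTACAGA"  # the 9-nt single-stranded gap left by Tn5
bottom = S7 + ME + gap + revcomp(top_before)

top_after = gap_fill(top_before, bottom)
print(primer_can_anneal(S7, top_before))  # before fill-in
print(primer_can_anneal(S7, top_after))   # after fill-in
```

After fill-in the top strand ends in `CTGTCTCTTATACACATCTCCGAGCCCACGAGAC`, matching the 3' end drawn in the diagram above.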

(5) Add Illumina P5/P7 primers for library amplification:

+
+
+5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT------>
+                                                5'- ACGACGCTCTTCCGATCT[8-bp UMI][10-bp RT barcode](dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'
+                                                3'- TGCTGCGAGAAGGCTAGA[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+                                                                                                                          <---------GGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC -5'
+
+
+ +

(6) Final library structure:

+
+
+5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
+3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
+             Illumina P5              i5     This bit is Truseq adapter     8bp UMI   10bp RT        cDNA             ME              s7           i7        Illumina P7
+                                                                                      barcode
+
+
+ + +

Library sequencing:

+ +

(1) Add the Read 1 sequencing primer to sequence the first read (bottom strand as template; the read covers the UMI and RT barcode, 18 cycles):

+
+
+                                       5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT----------------->
+3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
+
+
+ +
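Because Read 1 is only 18 cycles here, splitting it is a matter of fixed offsets: the first 8 bases are the UMI and the next 10 are the RT barcode. The sketch below does that and then assigns the RT barcode to a whitelist allowing at most one mismatch. The one-mismatch tolerance, the example whitelist and the function names are assumptions for illustration, not part of the published protocol.

```python
from typing import Iterable, Optional, Tuple

UMI_LEN = 8
RT_BC_LEN = 10


def split_read1(seq: str) -> Tuple[str, str]:
    """Return (UMI, RT barcode) from an sci-RNA-seq Read 1."""
    if len(seq) < UMI_LEN + RT_BC_LEN:
        raise ValueError("Read 1 is shorter than UMI + RT barcode")
    return seq[:UMI_LEN], seq[UMI_LEN:UMI_LEN + RT_BC_LEN]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed: str, whitelist: Iterable[str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the single whitelist barcode within `max_mismatches`, else None."""
    hits = [bc for bc in whitelist
            if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


umi, rt = split_read1("ACGTACGT" + "AACCGGTTAC")
print(umi, match_barcode(rt, ["AACCGGTTAA", "GGTTAACCGG"]))
```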

(2) Add Index 1 sequencing primer to sequence i7 index (bottom strand as template, 10 cycles):

+
+
+                                                                                                         5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC--------->
+3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
+
+
+ +

(3) After cluster regeneration, add the Index 2 sequencing primer to sequence the second index (i5) (top strand as template, 10 cycles):

+
+
+5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
+                                  <--------TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'
+
+
+ +

(4) Add Read 2 sequencing primer to sequence the second read (top strand as template, this is the cDNA read, 52 cycles):

+
+
+5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
+                                                                                                       <-----GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+ +
+ +

sci-RNA-seq3

+ +

sci-RNA-seq3 is an updated version of sci-RNA-seq. The major improvements are:

+

(1) nuclei are extracted directly from fresh tissues without enzymatic treatment;

+

(2) hairpin ligation for third-level indexing (barcoded Tn5 tagmentation was used in the previous version);

+

(3) individually optimised enzymatic reactions;

+

(4) FACS was replaced by dilution, and sonication and filtration steps were added to minimize aggregation.

+ +
+ +

Adapter and primer sequences:

+ +

Barcoded RT primer: 5'- /Phos/CAGAGC[8-bp UMI][10-bp RT barcode]TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN -3'

+

* There are 384 barcoded RT primers, click here to see the full sequence.

+

Barcoded hairpin adapters: 5'- GCTCTG[reverse complement of barcode A]/ddU/ACGACGCTCTTCCGATCT[9-bp or 10-bp barcode A] -3'

+

* There are 384 barcoded hairpin adapters, click here to see the full sequence. The structure of these adapters is like this:

+
+
+         CTTCCGATCT
+        /          NNNNNNNNNN -3'
+        |          NNNNNNNNNNGTCTCG -5'
+        TCGCAGCAddU
+
+
+
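The adapter layout above can be written as a small builder. Given barcode A, the adapter (5' to 3') is the `GCTCTG` overhang, the reverse complement of A, the `/ddU/` position, the `ACGACGCTCTTCCGATCT` loop sequence, and A itself at the 3' end, where it pairs back with its reverse complement to form the stem. The overhang is the reverse complement of the `CAGAGC` at the 5' end of the RT primer, which is how it pairs in the ligation diagram. `/ddU/` is kept as a literal token, and the function name is illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


OVERHANG = "GCTCTG"          # 5' overhang; revcomp is CAGAGC
LOOP = "ACGACGCTCTTCCGATCT"  # loop sequence 3' of the ddU
RT_PRIMER_5PRIME = "CAGAGC"  # 5' end of the barcoded RT primer


def hairpin_adapter(barcode_a: str) -> str:
    """Assemble a barcoded hairpin adapter (5'->3') as drawn above."""
    if len(barcode_a) not in (9, 10):
        raise ValueError("hairpin barcode A must be 9 or 10 nt")
    return OVERHANG + revcomp(barcode_a) + "/ddU/" + LOOP + barcode_a


adapter = hairpin_adapter("ACGTTGCAAC")
print(adapter)
print(revcomp(OVERHANG) == RT_PRIMER_5PRIME)
```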

Nextera Tn5 binding site (19-bp Mosaic End (ME)): 5'- AGATGTGTATAAGAGACAG -3'

+

Nextera N7xx primer entry point (s7): 5'- GTCTCGTGGGCTCGG -3'

+

PCR P5 Primer: 5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

+

* There are 96 barcoded P5 primers, click here to see the full sequence.

+

PCR P7 Primer: 5'- CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG -3'

+

* There are 96 barcoded P7 primers, click here to see the full sequence.

+

Read 1 sequencing primer: 5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

+

Index 1 sequencing primer (i7): 5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3'

+

Index 2 sequencing primer (i5): 5'- AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -3'

+

Read 2 sequencing primer: 5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG -3'

+
+ + +
+ +

Step-by-step library generation

+

(1) Anneal the barcoded RT primer to mRNA in fixed nuclei and perform in situ reverse transcription with MMLV:

+
+
+5'- CAGAGC[8-bp UMI][10-bp RT barcode](T)30VN---------->
+                                      (A)n BXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -5'
+
+
+ +

(2) Pool all wells, redistribute into wells of a new plate, and ligate the barcoded hairpin adapters:

+
+
+ CTTCCGATCT
+/          NNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXXXXXXXXXXXXX -3'
+|          NNNNNNNNNNGTCTCG -5'                        (pA)BXXXXXXXXXXXXXXXXXXXXXXX -5'
+TCGCAGCAddU
+
+
+ +

(3) Pool all wells again, redistribute into wells of a new plate, and perform RNase H and DNA Pol I based second-strand synthesis. DNA Pol I has strand displacement activity, so the hairpin structure is destroyed during second-strand synthesis:

+
+
+5'- GCTCTGNNNNNNNNNN/ddU/ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXXXXXXXXXXXXXXXXXXXXXX -3'
+3'- CGAGACNNNNNNNNNN  A  TGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXXXXXXXXXXXXXXXXXXXXXX -5'
+
+
+ +

(4) Perform tagmentation using a Tn5 homodimer loaded with the s7-ME oligo (this creates a 9-bp gap):

+Tn5 dimer +
+
+Product 1 (s7 at both ends, not amplifiable due to the use of Illumina P5/P7 Primers, see the next step):
+
+5'- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGXXXXXXXXXXXX...XXX         CTGTCTCTTATACACATCT
+                   TCTACACATATTCTCTGTC         XXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+
+Product 2 (s7 at one end, 3' of cDNA at the other end, the only amplifiable product, see the next step):
+
+5'- GCTCTGNNNNNNNNNN/ddU/ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
+3'- CGAGACNNNNNNNNNN  A  TGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+
+ +

(5) Treat with NEB USER Enzyme to excise the uracil base (ddU):

+
+
+5'- GCTCTGNNNNNNNNNN ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
+3'- CGAGACNNNNNNNNNNATGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+ +

(6) Add PCR P5/P7 primers for library amplification:

+
+
+5'- AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTAC
+                                                    ACGACGCTCTTCCGATCT------>
+                               5'- GCTCTGNNNNNNNNNN ACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGC[8-bp UMI][10-bp RT barcode](dT)VXXX...XXX         CTGTCTCTTATACACATCT -3'
+                               3'- CGAGACNNNNNNNNNNATGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCG[8-bp UMI][10-bp RT barcode](pA)BXXX...XXXXXXXXXXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+                                                                                                                                                  <---------GGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC -5'
+
+
+ +

(7) Final library structure:

+
+
+5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGCNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
+3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCGNNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
+             Illumina P5              i5     This bit is Truseq adapter     9bp or 10bp     8bp UMI   10bp RT        cDNA             ME              s7           i7        Illumina P7
+                                                                          hairpin barcode             barcode
+
+
+ + +

Library sequencing:

+ +

(1) Add the Read 1 sequencing primer to sequence the first read (bottom strand as template; the read covers the hairpin barcode + CAGAGC + UMI + RT barcode, 34 cycles):

+
+
+                                       5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT--------------------------------->
+3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCGNNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
+
+
+ +
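Because hairpin barcode A is either 9 or 10 nt, the UMI and RT barcode do not sit at fixed offsets in this Read 1. One simple way to parse it, sketched below under the assumption of an exact match to the constant `CAGAGC` anchor, is to test whether the anchor starts at position 10 or at position 9. The anchor cannot match at both offsets for the same read, so the order of the checks does not change the result. Real pipelines may instead match against the barcode whitelists and tolerate sequencing errors; the function name here is illustrative.

```python
from typing import Optional, Tuple

ANCHOR = "CAGAGC"  # constant 5' end of the sci-RNA-seq3 RT primer
UMI_LEN = 8
RT_BC_LEN = 10


def parse_sci3_read1(seq: str) -> Optional[Tuple[str, str, str]]:
    """Return (hairpin barcode, UMI, RT barcode), or None if the anchor is not
    found at either expected offset."""
    for hp_len in (10, 9):
        if seq[hp_len:hp_len + len(ANCHOR)] == ANCHOR:
            start = hp_len + len(ANCHOR)
            umi = seq[start:start + UMI_LEN]
            rt = seq[start + UMI_LEN:start + UMI_LEN + RT_BC_LEN]
            if len(rt) < RT_BC_LEN:
                return None  # read too short to contain the full RT barcode
            return seq[:hp_len], umi, rt
    return None


read_10 = "ACGTTGCAAC" + ANCHOR + "AAAAAAAA" + "CCCCCCCCCC"       # 34 nt
read_9 = "ACGTTGCAA" + ANCHOR + "GGGGGGGG" + "TTTTTTTTTT" + "T"  # 34 nt
print(parse_sci3_read1(read_10))
print(parse_sci3_read1(read_9))
```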

(2) Add Index 1 sequencing primer to sequence i7 index (bottom strand as template, 10 cycles):

+
+
+                                                                                                                         5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC--------->
+3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNGTCTCGNNNNNNNNNNNNNNNNNN(pA)BXXX...XXXXGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
+
+
+ +

(3) After cluster regeneration, add the Index 2 sequencing primer to sequence the second index (i5) (top strand as template, 10 cycles):

+
+
+5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGCNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
+                                 <---------TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'
+
+
+ +

(4) Add Read 2 sequencing primer to sequence the second read (top strand as template, this is the cDNA read, 52 cycles):

+
+
+5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNCAGAGCNNNNNNNNNNNNNNNNNN(dT)VXXX...XXXXCTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
+                                                                                                                       <-----GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
+
+
+ + + diff --git a/methods_html/scifi-RNA-seq.html b/methods_html/scifi-RNA-seq.html index d997f36..25afaec 100644 --- a/methods_html/scifi-RNA-seq.html +++ b/methods_html/scifi-RNA-seq.html @@ -8,7 +8,7 @@

scifi-RNA-seq

-

The single-cell combinatorial fluidic indexing RNA-seq (scifi-RNA-seq) uses similar strategy as the sci-RNA-seq, where combinatorial indexing strategy is used. The difference is after in situ reverse transcription with barcoded oligo-dT primers, cells are loaded onto the 10x Chromium system. Cells are overloaded so that >95% of the droplets contain at least one cells. Single cells can be identified by the combination of the RT barcodes and the 10x barcodes. The interesting thing is that the author uses the 10x Chromium scATAC-seq kit for the experiments. This page is basically a recreation of the Supplementary Figure 2 from their manuscript.

+

Single-cell combinatorial fluidic indexing RNA-seq (scifi-RNA-seq) uses a combinatorial indexing strategy similar to that of sci-RNA-seq. The difference is that, after in situ reverse transcription with barcoded oligo-dT primers, cells are loaded onto the 10x Chromium system. Cells are overloaded so that >95% of the droplets contain at least one cell. Single cells are identified by the combination of the RT barcode and the 10x barcode. Interestingly, the authors used the 10x Chromium scATAC-seq kit for the experiments. This page is essentially a recreation of Supplementary Figure 2 from their manuscript.