reorganised the data folder
dbrg77 committed Mar 8, 2024
1 parent 8eaf6be commit adf45d3
Showing 114 changed files with 164 additions and 164 deletions.
File renamed without changes.
Binary file added data/HyDrop/elife-73971-supp1-v4.docx
Binary file not shown.
Binary file added data/HyDrop/elife-73971-supp2-v4.docx
Binary file added data/HyDrop/elife-73971-supp3-v4.docx
Binary file added data/MARS-seq/41596_2019_164_MOESM4_ESM.xlsx
Binary file added data/MARS-seq/41596_2019_164_MOESM5_ESM.xlsx
Binary file added data/MARS-seq/jaitin-sm.pdf
12 changes: 6 additions & 6 deletions docs/source/ge/inDrop.md
@@ -213,7 +213,7 @@ wget -P mereu2020/indrop -c \

## Prepare Whitelist

The full oligo sequences can be found in the [Supplementary Table S2](https://teichlab.github.io/scg_lib_structs/data/41596_2017_BFnprot2016154_MOESM456_ESM.xlsx) and [Supplementary Table S3](https://teichlab.github.io/scg_lib_structs/data/41596_2017_BFnprot2016154_MOESM457_ESM.xlsx) from the [__inDrop Nature Protocols paper__](https://www.nature.com/articles/nprot.2016.154). As you can see, there are a total of 384 different __Barcode1__ with 8 - 11 bp in length, and 384 different __Barcode2__. The oligos are added to the gel beads by primer extension during the split-pool procedures. The cell barcodes are basically the combination of __Barcode1__ and __Barcode2__. There will be a total of __384 * 384 = 147456__ possible cell barcodes. I have organised the oligo information into two tables here (only showing 5 records):
The full oligo sequences can be found in the [Supplementary Table S2](https://teichlab.github.io/scg_lib_structs/data/inDrop/41596_2017_BFnprot2016154_MOESM456_ESM.xlsx) and [Supplementary Table S3](https://teichlab.github.io/scg_lib_structs/data/inDrop/41596_2017_BFnprot2016154_MOESM457_ESM.xlsx) from the [__inDrop Nature Protocols paper__](https://www.nature.com/articles/nprot.2016.154). As you can see, there are a total of 384 different __Barcode1__ with 8 - 11 bp in length, and 384 different __Barcode2__. The oligos are added to the gel beads by primer extension during the split-pool procedures. The cell barcodes are basically the combination of __Barcode1__ and __Barcode2__. There will be a total of __384 * 384 = 147456__ possible cell barcodes. I have organised the oligo information into two tables here (only showing 5 records):

__Barcode1__

@@ -235,15 +235,15 @@ __Barcode2__

__You should notice that `Barcode1` has variable lengths, but the first 8 bp are exactly the same as `Barcode2`__. I have prepared the full tables in `csv` format for you to download:

[inDrop_Barcode1.csv](https://teichlab.github.io/scg_lib_structs/data/inDrop_Barcode1.csv)
[inDrop_Barcode2.csv](https://teichlab.github.io/scg_lib_structs/data/inDrop_Barcode2.csv)
[inDrop_Barcode1.csv](https://teichlab.github.io/scg_lib_structs/data/inDrop/inDrop_Barcode1.csv)
[inDrop_Barcode2.csv](https://teichlab.github.io/scg_lib_structs/data/inDrop/inDrop_Barcode2.csv)

Let's download them to generate the whitelist:

```console
wget -P mereu2020/indrop \
https://teichlab.github.io/scg_lib_structs/data/inDrop_Barcode1.csv \
https://teichlab.github.io/scg_lib_structs/data/inDrop_Barcode2.csv
https://teichlab.github.io/scg_lib_structs/data/inDrop/inDrop_Barcode1.csv \
https://teichlab.github.io/scg_lib_structs/data/inDrop/inDrop_Barcode2.csv
```

Now we need to generate the whitelist of those two sets of barcodes. Read the [__inDrop GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/inDrop.html) very carefully, and pay attention to the oligo orientation. The barcode sequences that we get from the [__inDrop Nature Protocols paper__](https://www.nature.com/articles/nprot.2016.154) are the sequences in the adaptors, which are used to generate the bead oligos. Therefore, the sequences on the bead oligos are the reverse complement of the actual barcodes. Now, you can see that in the __V1__ and __V2__ configurations, __Barcode1__ and __Barcode2__ are in the same read and in the same direction as the bead oligo. Therefore, we should use the reverse complement of the barcode sequences for the whitelists. In the __V3__ configuration, __Barcode1__ is sequenced in the opposite direction to the bead oligo with only 8 cycles, so we need to use the first 8 bp of __Barcode1__ as they are. __Barcode2__ is sequenced in the same direction as the bead oligo, so we should take the reverse complement of the barcode sequence. In addition, since we stitch __Barcode1__, __Barcode2__ and __UMI__ together into the `CB_UMI.fastq.gz`, we should generate all possible combinations of __Barcode1_8bp + Barcode2 rc__ as the whitelist. Here is how you could do this:
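As a minimal sketch of the __V3__ rule described above (first 8 bp of __Barcode1__ as they are, followed by the reverse complement of __Barcode2__), the snippet below uses made-up toy barcodes rather than the real `inDrop_Barcode1.csv` and `inDrop_Barcode2.csv` contents. The function names `revcomp` and `indrop_v3_whitelist` are assumptions chosen for illustration, and the column layout of the real `csv` files is not assumed here.

```bash
# Reverse-complement one DNA sequence (relies on the common `rev` and `tr` tools).
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Hypothetical toy barcodes for illustration only (NOT from the inDrop tables).
bc1_list="AAACGGTTCA ACGTACGTAC"   # stands in for the variable-length Barcode1 set
bc2_list="AAGGCCTA TTGACCGA"       # stands in for the 8-bp Barcode2 set

# V3-style whitelist: first 8 bp of Barcode1 as-is + reverse complement of Barcode2.
indrop_v3_whitelist() {
    for b1 in $bc1_list; do
        for b2 in $bc2_list; do
            printf '%s%s\n' "$(printf '%s\n' "$b1" | cut -c 1-8)" "$(revcomp "$b2")"
        done
    done
}

indrop_v3_whitelist
```

With the real tables, the two toy lists would be replaced by the 384 sequences of each barcode set, giving the 384 * 384 combinations mentioned earlier.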
@@ -331,7 +331,7 @@ If you understand the __inDrop__ experimental procedures described in [this GitH

>> These options specify the locations of the cell barcode and UMI in the 2nd fastq file we passed to `--readFilesIn`. In this case, it is __Read 2__. Read the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) for more details. I have drawn a picture to help myself decide the exact parameters. There is some freedom here, depending on what you use as anchors. In __inDrop V1 & 2__, __Barcode1__ has variable lengths, so the absolute positions of __Barcode2__ and __UMI__ vary. Therefore, using the read start as the anchor will not work for them. We need to use the adaptor as the anchor, and specify the positions relative to the anchor. See the image:
![](https://teichlab.github.io/scg_lib_structs/data/Star_CB_UMI_Complex_inDrop.jpg)
![](https://teichlab.github.io/scg_lib_structs/data/inDrop/Star_CB_UMI_Complex_inDrop.jpg)

`--soloCBwhitelist`

12 changes: 6 additions & 6 deletions docs/source/multi/Paired-seq.md
@@ -186,7 +186,7 @@ After that, we are ready to begin the preprocessing.

## Prepare Whitelist

Cells are barcoded for the first time by either barcoded (3 bp) Tn5 for DNA or oligo-dT primers, followed by three more rounds of ligation. Each round will add 7-bp __Ligation Barcode__ to the molecules. There are 96 different __Ligation Barcodes__ in each round. The same set of 96 __Ligation Barcodes__ are used in each round. Single cells can be identified by the combination of themselves. Here is the information from the [Supplementary Table 2](https://teichlab.github.io/scg_lib_structs/data/41594_2019_323_MOESM3_ESM.xlsx) from the [__Paired-seq__ paper](https://www.nature.com/articles/s41594-019-0323-x):
Cells are barcoded for the first time by either barcoded (3 bp) Tn5 for DNA or oligo-dT primers, followed by three more rounds of ligation. Each round will add 7-bp __Ligation Barcode__ to the molecules. There are 96 different __Ligation Barcodes__ in each round. The same set of 96 __Ligation Barcodes__ are used in each round. Single cells can be identified by the combination of themselves. Here is the information from the [Supplementary Table 2](https://teichlab.github.io/scg_lib_structs/data/Paired-seq/41594_2019_323_MOESM3_ESM.xlsx) from the [__Paired-seq__ paper](https://www.nature.com/articles/s41594-019-0323-x):

__Round 1 barcodes (eight 3-bp Tn5 or oligo-dT barcodes)__

@@ -304,15 +304,15 @@ __Round 2, 3 and 4 barcodes (7 bp)__

I have prepared the above two tables as `csv` files for you, and you can download them:

[paired-seq_bc01.csv](https://teichlab.github.io/scg_lib_structs/data/paired-seq_bc01.csv)
[paired-seq_bc02-03-04.csv](https://teichlab.github.io/scg_lib_structs/data/paired-seq_bc02-03-04.csv)
[paired-seq_bc01.csv](https://teichlab.github.io/scg_lib_structs/data/Paired-seq/paired-seq_bc01.csv)
[paired-seq_bc02-03-04.csv](https://teichlab.github.io/scg_lib_structs/data/Paired-seq/paired-seq_bc02-03-04.csv)

Since the same set of 96 __Ligation Barcodes__ is used in each ligation round, the whitelist is basically the combination of those 96 barcodes with themselves three times, together with the 8 barcodes from the first round: a total of __96 * 96 * 96 * 8 = 7,077,888__ barcodes. Since the barcodes are sequenced as __Read 2__, which uses the top strand as the template, we should use the barcode sequences as they are to construct the whitelist. In addition, the order of the barcodes in __Read 2__ is Round 4 -> Round 3 -> Round 2 -> Round 1. Therefore, we need to generate the whitelist in this order. Again, if you are confused, check the [Paired-seq GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/Paired-seq.html).
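To make the ordering concrete, here is a small sketch with made-up toy barcodes (two 7-bp ligation barcodes and two 3-bp round 1 barcodes, none taken from the Paired-seq tables). The function name `paired_seq_whitelist` is an assumption for illustration. It only shows the Round 4 -> Round 3 -> Round 2 -> Round 1 concatenation with the sequences used as they are, plus the size of the full whitelist.

```bash
# Hypothetical toy barcodes for illustration only (NOT from the Paired-seq tables).
ligation_bcs="AAACCGG TTTGGCA"   # stands in for the 96 7-bp ligation barcodes
round1_bcs="ACG TGC"             # stands in for the 8 3-bp round 1 barcodes

# Concatenate in Read 2 order: Round 4 -> Round 3 -> Round 2 -> Round 1,
# using each sequence as it is (no reverse complement).
paired_seq_whitelist() {
    for r4 in $ligation_bcs; do
        for r3 in $ligation_bcs; do
            for r2 in $ligation_bcs; do
                for r1 in $round1_bcs; do
                    printf '%s%s%s%s\n' "$r4" "$r3" "$r2" "$r1"
                done
            done
        done
    done
}

# Show the first three toy entries, then the size of the real whitelist.
paired_seq_whitelist | sed -n '1,3p'
echo "full-size whitelist: $((96 * 96 * 96 * 8)) barcodes"
```

The toy lists give 2 * 2 * 2 * 2 = 16 entries. The innermost loop runs over the round 1 barcodes, so they change fastest and sit at the end of each entry.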

```bash
# download the barcode files
wget -P paired-seq/data https://teichlab.github.io/scg_lib_structs/data/paired-seq_bc01.csv \
https://teichlab.github.io/scg_lib_structs/data/paired-seq_bc02-03-04.csv
wget -P paired-seq/data https://teichlab.github.io/scg_lib_structs/data/Paired-seq/paired-seq_bc01.csv \
https://teichlab.github.io/scg_lib_structs/data/Paired-seq/paired-seq_bc02-03-04.csv

# generate whitelist for chromap
for w in $(tail -n +2 paired-seq/data/paired-seq_bc02-03-04.csv | cut -f 2 -d,); do
@@ -424,7 +424,7 @@ If you understand the __Paired-seq__ experimental procedures described in [this

>> These options specify the locations of the cell barcode and UMI in the 2nd fastq file we passed to `--readFilesIn`. In this case, it is __Read 2__. Read the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) for more details. I have drawn a picture to help myself decide the exact parameters. There is some freedom here, depending on what you use as anchors. Due to the 3 random bases in the middle, using the read start as the anchor will not work for the barcodes in the middle. We need to use the adapter as the anchor, and specify the positions relative to the anchor. See the image:
![](https://teichlab.github.io/scg_lib_structs/data/Star_CB_UMI_Complex_Paired-seq.jpg)
![](https://teichlab.github.io/scg_lib_structs/data/Paired-seq/Star_CB_UMI_Complex_Paired-seq.jpg)

`--soloCBwhitelist`

8 changes: 4 additions & 4 deletions docs/source/multi/SHARE-seq.md
@@ -27,7 +27,7 @@ The 2nd read (__i7__) has the following information, which will be used to ident
|--------|---------------------------------------------------------------------------------------------------------------------|
| 99 bp | TCGGACGATCATGGG + 8 bp `LB` + CAAGTATGCAGCGCGCTCAAGCACGTGGAT + 8 bp `LB` AGTCGTACGCCGATGCGAAACATCGGCCAC + 8 bp `LB` |

After sequencing, you need to run `bcl2fastq` by yourself with a `SampleSheet.csv`. Here is an example of `SampleSheet.csv` of a NextSeq run with two different samples using the `Ad1.09` and `Ad1.17` primers from the [Supplementary Table 1](https://teichlab.github.io/scg_lib_structs/data/1-s2.0-S0092867420312538-mmc1.xlsx) from the [__SHARE-seq__ paper](https://www.sciencedirect.com/science/article/pii/S0092867420312538) as sample and modality index:
After sequencing, you need to run `bcl2fastq` by yourself with a `SampleSheet.csv`. Here is an example of `SampleSheet.csv` of a NextSeq run with two different samples using the `Ad1.09` and `Ad1.17` primers from the [Supplementary Table 1](https://teichlab.github.io/scg_lib_structs/data/SHARE-seq/1-s2.0-S0092867420312538-mmc1.xlsx) from the [__SHARE-seq__ paper](https://www.sciencedirect.com/science/article/pii/S0092867420312538) as sample and modality index:

```text
[Header],,,,,,,,,,,
@@ -186,7 +186,7 @@ After that, we are ready to begin the preprocessing.

## Prepare Whitelist

There are three rounds of ligation. Each round will add 8-bp __Ligation Barcode__ to the molecules. There are 96 different __Ligation Barcodes__ in each round. The same set of 96 __Ligation Barcodes__ are used in each round. Single cells can be identified by the combination of themselves. Here is the information from the [Supplementary Table 1](https://teichlab.github.io/scg_lib_structs/data/1-s2.0-S0092867420312538-mmc1.xlsx) from the [__SHARE-seq__ paper](https://www.sciencedirect.com/science/article/pii/S0092867420312538):
There are three rounds of ligation. Each round will add 8-bp __Ligation Barcode__ to the molecules. There are 96 different __Ligation Barcodes__ in each round. The same set of 96 __Ligation Barcodes__ are used in each round. Single cells can be identified by the combination of themselves. Here is the information from the [Supplementary Table 1](https://teichlab.github.io/scg_lib_structs/data/SHARE-seq/1-s2.0-S0092867420312538-mmc1.xlsx) from the [__SHARE-seq__ paper](https://www.sciencedirect.com/science/article/pii/S0092867420312538):

| WellPosition | Name | Sequence | Reverse complement |
|--------------|---------------|----------|:------------------:|
@@ -287,11 +287,11 @@ There are three rounds of ligation. Each round will add 8-bp __Ligation Barcode_
| G12 | Round1/2/3_95 | GATGAATC | GATTCATC |
| H12 | Round1/2/3_96 | GCCAAGAC | GTCTTGGC |

Since the same set of 96 __Ligation Barcodes__ is used in each ligation round, the whitelist is basically the combination of those 96 barcodes with themselves three times: a total of __96 * 96 * 96 = 884736__ barcodes. Since the barcodes are sequenced as the `i7` index, which uses the bottom strand as the template, we should use the reverse complement to construct the whitelist. Again, if you are confused, check the [SHARE-seq GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/SHARE-seq.html). I have put the above table into a `csv` file, which you can download by [__clicking here__](https://teichlab.github.io/scg_lib_structs/data/share-seq_ligationBC.csv).
Since the same set of 96 __Ligation Barcodes__ is used in each ligation round, the whitelist is basically the combination of those 96 barcodes with themselves three times: a total of __96 * 96 * 96 = 884736__ barcodes. Since the barcodes are sequenced as the `i7` index, which uses the bottom strand as the template, we should use the reverse complement to construct the whitelist. Again, if you are confused, check the [SHARE-seq GitHub page](https://teichlab.github.io/scg_lib_structs/methods_html/SHARE-seq.html). I have put the above table into a `csv` file, which you can download by [__clicking here__](https://teichlab.github.io/scg_lib_structs/data/SHARE-seq/share-seq_ligationBC.csv).
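As a quick sanity check on the reverse-complement column, the sketch below recomputes two rows of the table (G12 and H12) and the whitelist size. The helper name `revcomp` is an assumption for illustration, and it relies on the `rev` and `tr` tools being available.

```bash
# Reverse-complement one DNA sequence (relies on the common `rev` and `tr` tools).
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Two rows from the table: Sequence -> Reverse complement.
revcomp GATGAATC   # G12, Round1/2/3_95, table lists GATTCATC
revcomp GCCAAGAC   # H12, Round1/2/3_96, table lists GTCTTGGC

# Whitelist size: three rounds drawing from the same 96 barcodes.
echo $((96 * 96 * 96))
```

If the output of `revcomp` disagrees with the fourth column of the `csv` file for any row, the file or the orientation assumption is worth re-checking before building the whitelist.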

```bash
# download the ligation barcode file
wget -P share-seq/data https://teichlab.github.io/scg_lib_structs/data/share-seq_ligationBC.csv
wget -P share-seq/data https://teichlab.github.io/scg_lib_structs/data/SHARE-seq/share-seq_ligationBC.csv

# generate whitelist
for x in $(tail -n +2 share-seq/data/share-seq_ligationBC.csv | cut -f 4 -d,); do
4 changes: 2 additions & 2 deletions methods_html/BD_Rhapsody.html
@@ -9,7 +9,7 @@

<h1><a href="https://www.nature.com/articles/s41592-018-0255-0" target="_blank">BD Rhapsody WTA</a></h1>

<p><span style="font-size:1.1em;">BD Rhapsody WTA is a nanowell-based commercial system like <a href="https://teichlab.github.io/scg_lib_structs/methods_html/Microwell-seq.html" target="_blank">Microwell-seq</a>. They use a similar split-pool approach to generate their oligos on magnetic beads. The first publication (V1 version) is in <i>Nature Methods</i> 16, 75-78 (2019). In 2022, BD introduced a new version of beads, called Enhanced Beads, which have a slightly different oligo design. The barcodes are the same, but the linker sequences are much shorter than in the V1 version. Note: I don't have the full sequence details from each step of the protocol. Sequences presented here are based on educated guesses from the sequencing data. The final library structure should be accurate though. Click <a href="../data/GMX_BD-Rhapsody-Single-Cell-Analysis-System-Instrument_UG_EN.pdf" target="_blank">here</a> for the guide to use the machine, and click <a href="../data/GMX_BD-Rhapsody-WTA-alpha-Protocol_UG_EN.pdf" target="_blank">here</a> to see the off-machine protocol.</span></p>
<p><info>BD Rhapsody WTA is a nanowell-based commercial system like <a href="https://teichlab.github.io/scg_lib_structs/methods_html/Microwell-seq.html" target="_blank">Microwell-seq</a>. They use a similar split-pool approach to generate their oligos on magnetic beads. The first publication (V1 version) is in <i>Nature Methods</i> 16, 75-78 (2019). In 2022, BD introduced a new version of beads, called Enhanced Beads, which have a slightly different oligo design. The barcodes are the same, but the linker sequences are much shorter than in the V1 version. Note: I don't have the full sequence details from each step of the protocol. Sequences presented here are based on educated guesses from the sequencing data. The final library structure should be accurate though. Click <a href="../data/BD/GMX_BD-Rhapsody-Single-Cell-Analysis-System-Instrument_UG_EN.pdf" target="_blank">here</a> for the guide to use the machine, and click <a href="../data/BD/GMX_BD-Rhapsody-WTA-alpha-Protocol_UG_EN.pdf" target="_blank">here</a> to see the off-machine protocol.</info></p>

<br>

@@ -20,7 +20,7 @@ <h3>Sequence used during the experiment:</h3>
<p><b>Enhanced Bead (introduced in 2022):</b> |--5'-CCCCCCTCTCTCTCT<s5>ACACGACGCTCTTCCGATCT</s5>[VB]<cbc>[CLS1]</cbc><pe1>GTGA</pe1><cbc>[CLS2]</cbc><pe2>GACA</pe2><cbc>[CLS3]</cbc><umi>[8-bp UMI]</umi>(T)<sub>18</sub> -3'</p>
<p>-- Cells are identified by different combinations of <cbc>[CLS1]</cbc>, <cbc>[CLS2]</cbc> and <cbc>[CLS3]</cbc>, each of which is 9 bp long.</p>
<p>-- There are 97 different sequences each, so you have a total of 97x97x97 = 912,673 different combinations.</p>
<p>-- Click here to see the sequences of <a href="../data/BD_CLS1.txt" target="_blank">CLS1</a>, <a href="../data/BD_CLS2.txt" target="_blank">CLS2</a>, and <a href="../data/BD_CLS3.txt" target="_blank">CLS3</a>.</p>
<p>-- Click here to see the sequences of <a href="../data/BD/BD_CLS1.txt" target="_blank">CLS1</a>, <a href="../data/BD/BD_CLS2.txt" target="_blank">CLS2</a>, and <a href="../data/BD/BD_CLS3.txt" target="_blank">CLS3</a>.</p>
<p>-- VB means "Variable Bases" which only exists in the Enhanced Beads. It has four possible bases: <b>None</b>, <b>A</b>, <b>GT</b> or <b>TCA</b>.</p>
<p>Randomer in the kit: 5'- <s7>TCAGACGTGTGCTCTTCCGATCT</s7>NNNNNNNNN -3'</p>
<p>Pre-amp forward primer: 5'- <s5>GACGCTCTTCCGATCT</s5> -3'</p>
4 changes: 2 additions & 2 deletions methods_html/CEL-seq_family.html
@@ -9,10 +9,10 @@

<h1><a href="#CEL-seq" target="_self">CEL-seq</a> / <a href="#CEL-seq2" target="_self">CEL-seq2</a></h1>

<span style="font-size:1.1em;"><p>CEL-seq2 is an improved version of CEL-seq, the main differences are:</p>
<info><p>CEL-seq2 is an improved version of CEL-seq, the main differences are:</p>
<p> (1) CEL-seq2 uses UMI; CEL-seq does not.</p>
<p> (2) CEL-seq2 uses random priming for reverse transcription after IVT amplification; CEL-seq uses RNA adapter ligation and then uses the primer annealed to the ligated adapter for reverse transcription after IVT amplification.</p>
<p>CEL-seq used a homemade oligo sequence design and one of Illumina's kits. The protocol in the publication was not entirely clear, so I guessed it was the Illumina TruSeq Small RNA-seq kit based on the oligo names (the CEL-seq2 publication confirmed this). I still put CEL-seq here to give a historical view of how the methods evolved.</p></span>
<p>CEL-seq used a homemade oligo sequence design and one of Illumina's kits. The protocol in the publication was not entirely clear, so I guessed it was the Illumina TruSeq Small RNA-seq kit based on the oligo names (the CEL-seq2 publication confirmed this). I still put CEL-seq here to give a historical view of how the methods evolved.</p></info>

<br>
