Test data for the RNA-Seq workflow

This folder contains a test data set for the RNA-Seq workflow.

Test data generation

The test data set was simulated to represent a small set of genes in a human cell line RNA-Seq sample, using the following procedure (thanks to Rob Patro):

  1. The sample ERR188297 was downloaded from ENA (this is an experimental sample from GEUVADIS).

  2. The sample was quantified against the Gencode v38 human transcriptome.

  3. The results were loaded into R with tximport and aggregated to the gene level.

  4. Expressed genes were randomly pulled out until the sum of their estimated read counts exceeded 100,000 (resulting in 66 genes).

  5. All transcripts from these genes were selected to generate the test data transcriptome reference file (582 transcripts).

  6. The estimated transcript-level counts were then used to simulate the test data with polyester, using simulate_experiment_countmat.

  7. The reads were shuffled (while maintaining the pairing) using bbmap.

  8. Fake quality scores were added to the reads, using bbmap.
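
The gene and transcript selection in steps 4 and 5 can be illustrated with a minimal Python sketch. This is not the code that produced the test data (that work was done in R); it only shows the logic. The function names, the toy gene and transcript identifiers, the `seed` parameter, and the definition of "expressed" as an estimated count above zero are all assumptions made for illustration.

```python
import random


def pick_genes(gene_counts, threshold=100_000, seed=0):
    """Randomly draw expressed genes until their summed counts exceed threshold.

    gene_counts maps a gene ID to its estimated read count.
    Assumption: a gene is "expressed" if its estimated count is above zero.
    """
    rng = random.Random(seed)
    expressed = [gene for gene, count in gene_counts.items() if count > 0]
    rng.shuffle(expressed)  # random draw order

    picked, total = [], 0.0
    for gene in expressed:
        if total > threshold:
            break  # the sum has exceeded the threshold, stop drawing
        picked.append(gene)
        total += gene_counts[gene]
    return picked, total


def transcripts_for(picked_genes, tx2gene):
    """Return every transcript whose parent gene was picked (step 5)."""
    keep = set(picked_genes)
    return [tx for tx, gene in tx2gene.items() if gene in keep]


# Toy input with hypothetical identifiers and a small threshold.
toy_counts = {"geneA": 60.0, "geneB": 0.0, "geneC": 50.0, "geneD": 30.0}
toy_tx2gene = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB",
               "txC1": "geneC", "txD1": "geneD"}

genes, total = pick_genes(toy_counts, threshold=100)
print(genes, total, transcripts_for(genes, toy_tx2gene))
```

With the real data, the threshold of 100,000 reads gave 66 genes and 582 transcripts, as described above. A different random draw would generally give different numbers.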