
Directory Structure (RNA)

sprokopec edited this page May 15, 2024 · 2 revisions

Directory Structure

PROJECT
├── STAR
│   ├── star_bam_config.yaml
│   ├── date_PROJECTNAME_rnaseqc_output.tsv
│   ├── logs
│   ├── RNASeQC
│   ├── SMP-001
│   │   ├── SMP-001-T_sorted_markdup.bam
│   │   └── SMP-001-T
│   │       ├── Aligned.toTranscriptome.out.bam
│   │       ├── Aligned.sortedByCoord.out.bam
│   │       └── Chimeric.out.junction
│   └── SMP-002
│       ├── SMP-002-T1
│       └── SMP-002-T2
├── STAR-Fusion
│   ├── date_PROJECTNAME_star-fusion_for_cbioportal.tsv
│   ├── date_PROJECTNAME_star-fusion_output.tsv
│   ├── logs
│   ├── SMP-001
│   │   └── SMP-001-T
│   │       └── star-fusion.fusion_predictions.abridged.tsv
│   └── SMP-002
│       ├── SMP-002-T1
│       └── SMP-002-T2
├── GATK
│   ├── gatk_bam_config.yaml
│   ├── logs
│   ├── SMP-001
│   │   └── SMP-001-T_realigned_recalibrated.bam
│   └── SMP-002
│       ├── SMP-002-T1_realigned_recalibrated.bam
│       └── SMP-002-T2_realigned_recalibrated.bam
└── logs
    └── run_RNA_pipeline_timestamp
        ├── pughlab_rna_pipeline__run_star
        ├── pughlab_rna_pipeline__run_star_fusion
        ├── pughlab_rna_pipeline__run_rsem
        └── pughlab_rna_pipeline__run_gatk
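
The layout above can be turned into a small path helper for locating per-sample outputs. This is a minimal sketch, not part of the pipeline: the function name `expected_outputs` and its dictionary keys are hypothetical, and it assumes every sample follows the same naming pattern as the SMP-001 and SMP-002 entries shown in the tree.

```python
from pathlib import Path


def expected_outputs(project_dir, patient, sample):
    """Build the per-sample output paths implied by the directory tree.

    Hypothetical helper: it only mirrors the file names shown in the tree
    and does not check that the files exist.
    """
    root = Path(project_dir)
    return {
        # STAR writes its per-sample files into PROJECT/STAR/<patient>/<sample>/
        "star_dir": root / "STAR" / patient / sample,
        # The duplicate-marked BAM sits one level up, next to the sample directory
        "star_markdup_bam": root / "STAR" / patient / f"{sample}_sorted_markdup.bam",
        "star_fusion_tsv": (
            root / "STAR-Fusion" / patient / sample
            / "star-fusion.fusion_predictions.abridged.tsv"
        ),
        "gatk_bam": root / "GATK" / patient / f"{sample}_realigned_recalibrated.bam",
    }


paths = expected_outputs("PROJECT", "SMP-001", "SMP-001-T")
for name, path in paths.items():
    print(name, path.as_posix())
```

Calling `path.exists()` on each returned value is one way to check whether a sample has finished a given step.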

Final outputs

  • star.pl

    • will use collect_rnaseqc_output.R to collect RNASeQC metrics from all processed samples
    • output includes:
      • DATE_projectname_rnaseqc_output.tsv (QC metrics)
      • DATE_projectname_rnaseqc_Pearson_correlations.tsv (sample-sample correlations)
  • rsem.pl

    • will use collect_rsem_output.R to collect expression data from all processed samples
    • output includes gene/isoform x sample matrices:
      • DATE_projectname_gene_expression_TPM.tsv
      • DATE_projectname_mRNA_expression_TPM_for_cbioportal.tsv (RNA expression values in the format required by cBioPortal; NOT adjusted for copy number (CN) or ploidy!)
      • DATE_projectname_mRNA_TPM_zscores_for_cbioportal.tsv (RNA expression z-scores in the format required by cBioPortal; NOT adjusted for copy number (CN) or ploidy!)
      • DATE_projectname_rsem_expression_results.RData
  • star_fusion.pl

    • will use collect_star-fusion_output.R to collect fusions from all processed samples
    • output includes:
      • DATE_projectname_star-fusion_output_long.tsv (concatenated output)
      • DATE_projectname_star-fusion_output_wide.tsv (fusion x sample matrix)
      • DATE_projectname_star-fusion_for_cbioportal.tsv (SVs in the format required by cBioPortal)
  • arriba.pl

    • will use collect_arriba_output.R to collect fusions from all processed samples
    • output includes:
      • DATE_projectname_arriba_output_long.tsv (concatenated output)
      • DATE_projectname_arriba_output_wide.tsv (fusion x sample matrix)
      • DATE_projectname_arriba_for_cbioportal.tsv (SVs in the format required by cBioPortal)
      • DATE_projectname_arriba_viral_counts.tsv (species x sample matrix)
  • fusioncatcher.pl

    • will use collect_fusioncatcher_output.R to collect fusions from all processed samples
    • output includes:
      • DATE_projectname_fusioncatcher_output_long.tsv (concatenated output)
      • DATE_projectname_fusioncatcher_output_wide.tsv (fusion x sample matrix)
      • DATE_projectname_fusioncatcher_for_cbioportal.tsv (SVs in the format required by cBioPortal)
      • DATE_projectname_fusioncatcher_viral_counts.tsv (species x sample matrix)
  • haplotype_caller.pl

    • will use collect_snv_output.R to collect high-confidence SNV calls from all processed samples
    • output includes:
      • DATE_projectname_variant_by_patient.tsv (SNV [chr/pos/ref/alt/gene] x sample matrix)
      • DATE_projectname_gene_by_patient.tsv (gene x sample matrix)
      • DATE_projectname_mutations_for_cbioportal.tsv (SNV and INDEL calls in the format required by cBioPortal)
  • pughlab_pipeline_auto_report.pl

    • final Report.pdf
    • plots directory containing:
      • QC summary plots and concerns for manual review (qc_concerns.tex)
      • expression landscape plots; SNV summary plots; viral summary plots; fusion summary
      • detailed methods (methods.tex)
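
Several of the collection steps above produce both a "long" (concatenated) table and a "wide" (fusion x sample) matrix. The relationship between the two can be sketched as follows. This is an illustration only: the function `long_to_wide`, the `(sample, fusion)` record shape, the gene names, and the 0/1 presence encoding are all assumptions, and the actual collect_*.R scripts may use different columns or cell values.

```python
def long_to_wide(records):
    """Pivot (sample, fusion) pairs into a fusion x sample presence matrix.

    Returns the sorted sample names and a dict mapping each fusion to a
    list of 0/1 flags, one flag per sample, in the same order as the names.
    """
    records = list(records)
    samples = sorted({sample for sample, _ in records})
    column = {sample: i for i, sample in enumerate(samples)}

    wide = {}
    for sample, fusion in records:
        # Create an all-zero row the first time a fusion is seen
        row = wide.setdefault(fusion, [0] * len(samples))
        row[column[sample]] = 1
    return samples, wide


# Hypothetical calls; the gene names are placeholders, not real results
calls = [
    ("SMP-001-T", "GENE1--GENE2"),
    ("SMP-002-T1", "GENE1--GENE2"),
    ("SMP-002-T2", "GENE3--GENE4"),
]
samples, wide = long_to_wide(calls)
print("fusion", *samples, sep="\t")
for fusion, flags in wide.items():
    print(fusion, *flags, sep="\t")
```

A sample with no calls at all would not appear as a column in this sketch; passing the full sample list explicitly would be one way to handle that case.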