forked from bayraktar1/RPCA


LUMC/RPCA

 
 


RNA-seq Pipeline Comparison and Analyses

A Snakemake workflow for running TALON, FLAIR, and pipeline-nanopore-ref-isoforms. It then performs a comparative analysis of the results using tools such as GffCompare.

Flowchart

Dependencies

  1. Snakemake 7.3.1
    A full Snakemake installation is recommended.
  2. Singularity 3.7.0
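Before starting a long run, it can help to confirm that both tools are on the PATH and which versions they report. The following is a minimal sketch, not part of the workflow; the helper name `tool_version` is hypothetical, and it assumes both executables accept `--version`.

```python
import shutil
import subprocess


def tool_version(executable):
    """Return the version string printed by `<executable> --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print the version on stderr instead of stdout.
    return result.stdout.strip() or result.stderr.strip()


# The README lists Snakemake 7.3.1 and Singularity 3.7.0 as dependencies.
for tool in ("snakemake", "singularity"):
    version = tool_version(tool)
    print(f"{tool}: {version if version is not None else 'not found on PATH'}")
```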

Installation

Clone the repository to the desired location.

How to run

  1. Set the parameters in config/config.yaml
  2. Run: snakemake -p --use-singularity --singularity-prefix "resources" --singularity-args "--bind *" --use-conda -j ** all --configfile "config/config.yaml"

Note * : Replace this with your own data directory for the --bind option, so that the data is accessible from within the Singularity containers.
Note ** : Replace this with the number of available threads.
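To make the two placeholders concrete, here is a small sketch that assembles the command above with both values filled in. The helper name `build_snakemake_command` and the path `/path/to/data` are hypothetical; every flag is taken from the command shown in step 2.

```python
import shlex


def build_snakemake_command(bind_dir, threads, configfile="config/config.yaml"):
    """Assemble the README's snakemake invocation with both placeholders filled in."""
    return [
        "snakemake", "-p",
        "--use-singularity",
        "--singularity-prefix", "resources",
        # Placeholder *: directory made visible inside the containers.
        "--singularity-args", f"--bind {bind_dir}",
        "--use-conda",
        # Placeholder **: number of available threads.
        "-j", str(threads),
        "all",
        "--configfile", configfile,
    ]


# Hypothetical values: bind /path/to/data and use 10 threads.
cmd = build_snakemake_command("/path/to/data", 10)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` to launch the workflow from Python.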

Snakemake report

After the workflow has finished, you can run snakemake --report report.html to create a report containing the results.

Notes

The GTF files located in the 03_combined and 05_matched_transcripts directories carry a value labeled TPM. This is actually the raw number of counts, not a normalized TPM value. The attribute is repurposed to pass counts to GffCompare.
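When reading these files downstream, the TPM value should therefore be treated as a count. The sketch below shows one way to pull it out of a GTF line. It assumes TPM is stored as a quoted key-value attribute in the ninth column; the helper name and the example line are hypothetical and not taken from the workflow's output.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def raw_count_from_gtf_line(line):
    """Return the number stored under the TPM attribute (a raw count here), or None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None  # Not a complete GTF feature line.
    attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
    value = attributes.get("TPM")
    return float(value) if value is not None else None


# Hypothetical example line, for illustration only.
example = (
    'chr1\tTALON\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; TPM "42";'
)
print(raw_count_from_gtf_line(example))
```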

In testing, the workflow took about 18 hours on 10 threads with 100 GB of memory to process 6 human samples. With a much smaller RNA virus dataset, it took about 8 hours for 6 samples.

The main bottleneck is TranscriptClean, which requires many hours and a large amount of memory to correct all samples.

Troubleshooting

TranscriptClean

TranscriptClean requires each header in the reference genome FASTA file to consist of a single string. To run TranscriptClean, you must edit the headers accordingly.
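One way to edit the headers is sketched below. It assumes that "one string per header" means the header should contain only the sequence name, with any whitespace-separated description removed; that interpretation, the helper name, and the file names in the usage note are assumptions, not part of the workflow.

```python
def simplify_fasta_headers(in_path, out_path):
    """Copy a FASTA file, keeping only the first whitespace-delimited token of each header."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                dst.write(f">{name}\n")
            else:
                # Sequence lines are copied unchanged.
                dst.write(line)
```

For example, `simplify_fasta_headers("genome.fa", "genome.clean.fa")` would turn a header such as `>chr1 some description` into `>chr1`. The cleaned file would then be the one referenced in the config.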

Conda environment fails to build

There seems to be an issue with Snakemake 7.3.1 when building conda environments. If a timeout error occurs, you can try running the workflow with an older version of Snakemake, such as version 5.3.2.

License

MIT, see LICENSE
