Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Qualimap #74

Merged
merged 52 commits into from
Aug 21, 2024
Merged
Show file tree
Hide file tree
Changes from 50 commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
9a6730f
first version
dorien-er May 16, 2024
e9dbe7c
complete script for qualimap
dorien-er Jun 7, 2024
6e7d220
add escaping character before leading hashtag (#50)
Leila011 May 17, 2024
87c5cfc
Samtools collate (#49)
emmarousseau May 21, 2024
fdcd972
Update viash (#51)
rcannood May 22, 2024
189d0ab
Samtools view (#48)
emmarousseau May 23, 2024
ade3a3f
Samtools fastq (#52)
emmarousseau May 23, 2024
36762e1
format URL in the description (#55)
Leila011 Jun 3, 2024
529b755
Change name in _viash.yaml (#60)
tverbeiren Jun 19, 2024
a67f6c7
Update operational code (#63)
rcannood Jun 21, 2024
c6ae9ac
update biobox
rcannood Jun 21, 2024
1ece274
cutadapt (#7)
tverbeiren Jun 21, 2024
c81cf5d
update changelog
rcannood Jun 21, 2024
0774e8a
update readme
rcannood Jun 21, 2024
38f89ea
Update salmon quant arguments (#57)
sainirmayi Jun 21, 2024
fb9479b
FEAT: add bedtools getfasta. (#59)
DriesSchaumont Jun 21, 2024
5b26a4b
Add star genomegenerate component (#58)
sainirmayi Jun 24, 2024
e0e4648
fix package config (#65)
rcannood Jun 24, 2024
beac74d
Delete src/bgzip directory (#64)
rcannood Jun 24, 2024
6ea79f1
Output alignments to the transcriptome (#56)
sainirmayi Jun 24, 2024
58a3441
BUG: pear component failure is ignored (#70)
DriesSchaumont Jul 1, 2024
185a748
FEAT + BUG: cutadapt; allowing disabling demultiplexing and fix par_q…
DriesSchaumont Jul 1, 2024
014b569
FEAT: update busco to 5.7.1 (#72)
DriesSchaumont Jul 1, 2024
3211387
Samtools fasta (#53)
emmarousseau Jul 5, 2024
d94004a
Umi tools dedup (#54)
emmarousseau Jul 5, 2024
c6a105f
Fix typo (#79)
rcannood Jul 9, 2024
4f4080a
add vscode to gitignore
rcannood Jul 9, 2024
78fc122
update multiple separator (#81)
dorien-er Jul 11, 2024
199aa31
add test data
dorien-er Jul 11, 2024
1094a35
add tests
dorien-er Jul 11, 2024
ba8a352
update changelog
dorien-er Jul 11, 2024
6701917
Merge branch 'main' into qualimap
dorien-er Jul 11, 2024
6125e74
remove unrequired test data
dorien-er Jul 11, 2024
6a41f49
update descriptions
dorien-er Jul 11, 2024
b4ca197
update changelog
dorien-er Jul 11, 2024
e9f19b0
update help text
dorien-er Jul 11, 2024
58d7dac
Update src/qualimap/qualimap_rnaseq/script.sh
dorien-er Jul 29, 2024
2650674
update unit tests
dorien-er Jul 29, 2024
e6420cd
update unit tests
dorien-er Jul 29, 2024
28cd122
Merge branch 'main' into qualimap
dorien-er Jul 29, 2024
1afe063
addres pr changes request
dorien-er Aug 8, 2024
aed21d0
add version
dorien-er Aug 8, 2024
0ec292a
Merge branch 'main' into qualimap
rcannood Aug 13, 2024
e6a3470
remove whitespace multiqc
dorien-er Aug 20, 2024
44b40e8
Apply suggestions from code review
dorien-er Aug 20, 2024
9489b1b
address pr comments
dorien-er Aug 20, 2024
fed809e
Merge remote-tracking branch 'origin/main' into qualimap
dorien-er Aug 20, 2024
75e5f79
Update CHANGELOG.md
rcannood Aug 20, 2024
d476dee
fix doi
rcannood Aug 20, 2024
1bbebbb
Fix name
rcannood Aug 20, 2024
c407a88
update version and container image
dorien-er Aug 21, 2024
1b0257c
write software version to file
rcannood Aug 21, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@
- `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
- `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
- `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).

* `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74).

## MINOR CHANGES

Expand Down
115 changes: 115 additions & 0 deletions src/qualimap/qualimap_rnaseq/config.vsh.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
name: qualimap_rnaseq
namespace: qualimap
keywords: [RNA-seq, quality control, QC Report]
description: |
Qualimap RNA-seq QC reports quality control metrics and bias estimations
which are specific for whole transcriptome sequencing, including reads genomic
origin, junction analysis, transcript coverage and 5’-3’ bias computation.
links:
homepage: http://qualimap.conesalab.org/
documentation: http://qualimap.conesalab.org/doc_html/analysis.html#rna-seq-qc
issue_tracker: https://bitbucket.org/kokonech/qualimap/issues?status=new&status=open
repository: https://bitbucket.org/kokonech/qualimap/commits/branch/master
references:
doi: 10.1093/bioinformatics/btv566
license: GPL-2.0
authors:
- __merge__: /src/_authors/dorien_roosen.yaml
roles: [ author, maintainer ]
argument_groups:
- name: "Input"
arguments:
- name: "--bam"
type: file
required: true
example: alignment.bam
description: Path to the sequence alignment file in BAM format, produced by a splicing-aware aligner.
- name: "--gtf"
type: file
required: true
example: annotations.gtf
description: Path to genomic annotations in Ensembl GTF format.

- name: "Output"
arguments:
- name: "--qc_results"
direction: output
type: file
required: true
example: rnaseq_qc_results.txt
description: Text file containing the RNAseq QC results.
- name: "--counts"
type: file
required: false
direction: output
description: Output file for computed counts.
- name: "--report"
type: file
direction: output
required: false
example: report.html
description: Report output file. Supported formats are PDF or HTML.

- name: "Optional"
arguments:
- name: "--num_pr_bases"
type: integer
required: false
min: 1
description: Number of upstream/downstream nucleotide bases to compute 5'-3' bias (default = 100).
- name: "--num_tr_bias"
type: integer
required: false
min: 1
description: Number of top highly expressed transcripts to compute 5'-3' bias (default = 1000).
- name: "--algorithm"
type: string
required: false
choices: ["uniquely-mapped-reads", "proportional"]
description: Counting algorithm (uniquely-mapped-reads (default) or proportional).
- name: "--sequencing_protocol"
type: string
required: false
choices: ["non-strand-specific", "strand-specific-reverse", "strand-specific-forward"]
description: Sequencing library protocol (strand-specific-forward, strand-specific-reverse or non-strand-specific (default)).
- name: "--paired"
type: boolean_true
description: Setting this flag for paired-end experiments will result in counting fragments instead of reads.
- name: "--sorted"
type: boolean_true
description: Setting this flag indicates that the input file is already sorted by name. If flag is not set, additional sorting by name will be performed. Only requiredfor paired-end analysis.
- name: "--java_memory_size"
type: string
required: false
description: maximum Java heap memory size, default = 4G.

resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- path: test_data/

engines:
- type: docker
image: ubuntu:22.04
dorien-er marked this conversation as resolved.
Show resolved Hide resolved
setup:
- type: apt
packages: [ r-base, unzip, wget, openjdk-8-jdk, libxml2-dev, libcurl4-openssl-dev ]
- type: docker
run: |
wget https://bitbucket.org/kokonech/qualimap/downloads/qualimap_v2.3.zip && \
unzip qualimap_v2.3.zip && \
cp -a qualimap_v2.3/. usr/bin && \
unset DISPLAY && \
mkdir -p tmp && \
export _JAVA_OPTIONS=-Djava.io.tmpdir=./tmp && \
echo "qualimap: v2.3" > /var/software_versions.txt
dorien-er marked this conversation as resolved.
Show resolved Hide resolved

- type: r
bioc: [ NOISeqr ]
cran: [ optparse ]
runners:
- type: executable
- type: nextflow
52 changes: 52 additions & 0 deletions src/qualimap/qualimap_rnaseq/help.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
QualiMap v.2.3
dorien-er marked this conversation as resolved.
Show resolved Hide resolved
Built on 2023-05-19 16:57

usage: qualimap <tool> [options]

To launch GUI leave <tool> empty.

Available tools:

bamqc Evaluate NGS mapping to a reference genome
rnaseq Evaluate RNA-seq alignment data
counts Counts data analysis (further RNA-seq data evaluation)
multi-bamqc Compare QC reports from multiple NGS mappings
clustering Cluster epigenomic signals
comp-counts Compute feature counts

Special arguments:

--java-mem-size Use this argument to set Java memory heap size. Example:
qualimap bamqc -bam very_large_alignment.bam --java-mem-size=4G

usage: qualimap rnaseq [-a <arg>] -bam <arg> -gtf <arg> [-npb <arg>] [-ntb
<arg>] [-oc <arg>] [-outdir <arg>] [-outfile <arg>] [-outformat <arg>]
[-p <arg>] [-pe] [-s]
-a,--algorithm <arg> Counting algorithm:
uniquely-mapped-reads(default) or
proportional.
-bam <arg> Input mapping file in BAM format.
-gtf <arg> Annotations file in Ensembl GTF format.
-npb,--num-pr-bases <arg> Number of upstream/downstream nucleotide bases
to compute 5'-3' bias (default is 100).
-ntb,--num-tr-bias <arg> Number of top highly expressed transcripts to
compute 5'-3' bias (default is 1000).
-oc <arg> Output file for computed counts. If only name
of the file is provided, then the file will be
saved in the output folder.
-outdir <arg> Output folder for HTML report and raw data.
-outfile <arg> Output file for PDF report (default value is
report.pdf).
-outformat <arg> Format of the output report (PDF, HTML or both
PDF:HTML, default is HTML).
-p,--sequencing-protocol <arg> Sequencing library protocol:
strand-specific-forward,
strand-specific-reverse or non-strand-specific
(default)
-pe,--paired Setting this flag for paired-end experiments
will result in counting fragments instead of
reads
-s,--sorted This flag indicates that the input file is
already sorted by name. If not set, additional
sorting by name will be performed. Only
required for paired-end analysis.
50 changes: 50 additions & 0 deletions src/qualimap/qualimap_rnaseq/script.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
#!/bin/bash

set -eo pipefail

tmp_dir=$(mktemp -d -p "$meta_temp_dir" qualimap_XXXXXXXXX)

# Handle output parameters
if [ -n "$par_report" ]; then
outfile=$(basename "$par_report")
report_extension="${outfile##*.}"
fi

if [ -n "$par_counts" ]; then
counts=$(basename "$par_counts")
fi

# disable flags
[[ "$par_paired" == "false" ]] && unset par_paired
[[ "$par_sorted" == "false" ]] && unset par_sorted

# Run qualimap
qualimap rnaseq \
${meta_memory_mb:+--java-mem-size=${meta_memory_mb}M} \
${par_algorithm:+--algorithm $par_algorithm} \
${par_sequencing_protocol:+--sequencing-protocol $par_sequencing_protocol} \
-bam $par_bam \
-gtf $par_gtf \
-outdir "$tmp_dir" \
${par_num_pr_bases:+--num-pr-bases $par_num_pr_bases} \
${par_num_tr_bias:+--num-tr-bias $par_num_tr_bias} \
${par_report:+-outformat $report_extension} \
${par_paired:+--paired} \
${par_sorted:+--sorted} \
${par_report:+-outfile "$outfile"} \
${par_counts:+-oc "$counts"}

# Move output files
mv "$tmp_dir/rnaseq_qc_results.txt" "$par_qc_results"

if [ -n "$par_report" ] && [ $report_extension = "html" ]; then
mv "$tmp_dir/qualimapReport.html" "$par_report"
fi

if [ -n "$par_report" ] && [ $report_extension = "pdf" ]; then
mv "$tmp_dir/$outfile" "$par_report"
fi

if [ -n "$par_counts" ]; then
mv "$tmp_dir/$counts" "$par_counts"
fi
112 changes: 112 additions & 0 deletions src/qualimap/qualimap_rnaseq/test.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
set -e

#############################################
# helper functions
assert_file_exists() {
[ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
}
assert_file_doesnt_exist() {
[ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
}
assert_file_not_empty() {
[ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
}
assert_file_contains() {
grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
}
#############################################


test_dir="$meta_resources_dir/test_data"

mkdir "run_qualimap_rnaseq_html"
cd "run_qualimap_rnaseq_html"

echo "> Running qualimap with html output report"

"$meta_executable" \
--bam $test_dir/a.bam \
--gtf $test_dir/annotation.gtf \
--report report.html \
--counts counts.txt \
--qc_results output.txt

echo ">> Checking output"
assert_file_exists "report.html"
assert_file_exists "counts.txt"
assert_file_exists "output.txt"
assert_file_doesnt_exist "report.pdf"

echo ">> Checking if output is empty"
assert_file_not_empty "report.html"
assert_file_not_empty "counts.txt"
assert_file_not_empty "output.txt"

echo ">> Checking output contents"
assert_file_contains "output.txt" ">>>>>>> Input"
assert_file_contains "output.txt" ">>>>>>> Reads alignment"
assert_file_contains "output.txt" ">>>>>>> Reads genomic origin"
assert_file_contains "output.txt" ">>>>>>> Transcript coverage profile"
assert_file_contains "output.txt" ">>>>>>> Junction analysis"
assert_file_contains "output.txt" ">>>>>>> Transcript coverage profile"

assert_file_contains "counts.txt" "ENSG00000125841.12"

assert_file_contains "report.html" "<title>Qualimap report: RNA Seq QC</title>"
assert_file_contains "report.html" "<h3>Input</h3>"
assert_file_contains "report.html" "<h3>Reads alignment</h3>"
assert_file_contains "report.html" "<h3>Reads genomic origin</h3>"
assert_file_contains "report.html" "<h3>Transcript coverage profile</h3>"
assert_file_contains "report.html" "<h3>Junction analysis</h3>"


cd ..
rm -r run_qualimap_rnaseq_html

mkdir "run_qualimap_rnaseq_pdf"
cd "run_qualimap_rnaseq_pdf"

echo "> Running qualimap with pdf output report"

"$meta_executable" \
--bam $test_dir/a.bam \
--gtf $test_dir/annotation.gtf \
--report report.pdf \
--counts counts.txt \
--qc_results output.txt

echo ">> Checking output"
assert_file_exists "report.pdf"
assert_file_exists "counts.txt"
assert_file_exists "output.txt"
assert_file_doesnt_exist "report.html"

echo ">> Checking if output is empty"
assert_file_not_empty "report.pdf"
assert_file_not_empty "counts.txt"
assert_file_not_empty "output.txt"

cd ..
rm -r run_qualimap_rnaseq_pdf

mkdir "run_qualimap_rnaseq"
cd "run_qualimap_rnaseq"

echo "> Running qualimap without report and counts output"

"$meta_executable" \
--bam $test_dir/a.bam \
--gtf $test_dir/annotation.gtf \
--qc_results output.txt

echo ">> Checking output"
assert_file_doesnt_exist "report.pdf"
assert_file_doesnt_exist "report.html"
assert_file_doesnt_exist "counts.txt"
assert_file_exists "output.txt"

echo ">> Checking if output is empty"
assert_file_not_empty "output.txt"

cd ..
rm -r run_qualimap_rnaseq
Binary file added src/qualimap/qualimap_rnaseq/test_data/a.bam
Binary file not shown.
10 changes: 10 additions & 0 deletions src/qualimap/qualimap_rnaseq/test_data/annotation.gtf
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
chr20 HAVANA transcript 347024 354868 . + . gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA exon 347024 347142 . + . gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 1; exon_id "ENSE00001831391.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA exon 349249 349363 . + . gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 2; exon_id "ENSE00001491647.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA exon 349638 349832 . + . gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 3; exon_id "ENSE00003710328.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA CDS 349644 349832 . + 0 gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 3; exon_id "ENSE00003710328.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA start_codon 349644 349646 . + 0 gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 3; exon_id "ENSE00003710328.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA exon 353210 354868 . + . gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 4; exon_id "ENSE00001822456.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA CDS 353210 353632 . + 0 gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 4; exon_id "ENSE00001822456.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA stop_codon 353633 353635 . + 0 gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 4; exon_id "ENSE00001822456.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
chr20 HAVANA UTR 347024 347142 . + . gene_id "ENSG00000125841.12"; transcript_id "ENST00000382291.7"; gene_type "protein_coding"; gene_name "NRSN2"; transcript_type "protein_coding"; transcript_name "NRSN2-202"; exon_number 1; exon_id "ENSE00001831391.1"; level 2; protein_id "ENSP00000371728.3"; transcript_support_level "2"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS12996.1"; havana_gene "OTTHUMG00000031628.5"; havana_transcript "OTTHUMT00000077446.1";
10 changes: 10 additions & 0 deletions src/qualimap/qualimap_rnaseq/test_data/script.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# qualimap test data

# Test data was obtained from https://github.com/snakemake/snakemake-wrappers/raw/master/bio/qualimap/rnaseq/test

if [ ! -d /tmp/snakemake-wrappers ]; then
git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
fi

cp -r /tmp/snakemake-wrappers/bio/qualimap/rnaseq/test/mapped/a.bam src/qualimap/qualimap_rnaseq/test_data
cp -r /tmp/snakemake-wrappers/bio/qualimap/rnaseq/test/annotation.gtf src/qualimap/qualimap_rnaseq/test_data