Rseqc inferexperiment (#158)
* initial commit dedup

* Revert "initial commit dedup"

This reverts commit 38f586b.

* full component with two tests

* adjust arg names, container base image, test data size

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
emmarousseau and rcannood authored Oct 26, 2024
1 parent 2f8bf02 commit c3d87f5
Showing 8 changed files with 184 additions and 1 deletion.
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@
* `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).

* `rseqc`:
  - `rseqc/rseqc_inferexperiment`: Infer strandedness from sequencing reads (PR #158).
  - `rseqc/bam_stat`: Generate statistics from a bam file (PR #155).

* `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).
@@ -27,7 +28,6 @@

* `cutadapt`: Fix the non-functional `action` parameter (PR #161).


## MINOR CHANGES

* `agat_convert_bed2gff`: change type of argument `inflate_off` from `boolean_false` to `boolean_true` (PR #160).
76 changes: 76 additions & 0 deletions src/rseqc/rseqc_inferexperiment/config.vsh.yaml
@@ -0,0 +1,76 @@
name: "rseqc_inferexperiment"
namespace: "rseqc"
description: |
  Infer strandedness from sequencing reads
links:
  homepage: https://rseqc.sourceforge.net/
  documentation: https://rseqc.sourceforge.net/#infer-experiment-py
  issue_tracker: https://github.com/MonashBioinformaticsPlatform/RSeQC/issues
  repository: https://github.com/MonashBioinformaticsPlatform/RSeQC
references:
  doi: 10.1093/bioinformatics/bts356
license: GPL-3.0
authors:
  - __merge__: /src/_authors/emma_rousseau.yaml
    roles: [ author, maintainer ]

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input_file"
        alternatives: ["-i"]
        type: file
        required: true
        description: Input alignment file in BAM or SAM format.
      - name: "--refgene"
        alternatives: ["-r"]
        type: file
        required: true
        description: Reference gene model in BED format.

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        required: true
        description: Output file (txt) containing the strandedness report.
        example: $id.strandedness.txt

  - name: "Options"
    arguments:
      - name: "--sample_size"
        alternatives: ["-s"]
        type: integer
        description: |
          Number of reads sampled from SAM/BAM file. Default: 200000
        example: 200000
      - name: "--mapq"
        alternatives: ["-q"]
        type: integer
        description: |
          Minimum mapping quality (phred scaled) to determine uniquely mapped reads. Default: 30
        example: 30

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: test_data

engines:
  - type: docker
    image: python:3.10
    setup:
      - type: python
        packages: [ RSeQC ]
      - type: docker
        run: |
          echo "RSeQC - infer_experiment.py: $(infer_experiment.py --version | cut -d' ' -f2)" > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
21 changes: 21 additions & 0 deletions src/rseqc/rseqc_inferexperiment/help.txt
@@ -0,0 +1,21 @@
```
infer_experiment.py --help
```

Usage: infer_experiment.py [options]


Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file=INPUT_FILE
                        Input alignment file in SAM or BAM format
  -r REFGENE_BED, --refgene=REFGENE_BED
                        Reference gene model in bed fomat.
  -s SAMPLE_SIZE, --sample-size=SAMPLE_SIZE
                        Number of reads sampled from SAM/BAM file.
                        default=200000
  -q MAP_QUAL, --mapq=MAP_QUAL
                        Minimum mapping quality (phred scaled) for an
                        alignment to be considered as "uniquely mapped".
                        default=30
10 changes: 10 additions & 0 deletions src/rseqc/rseqc_inferexperiment/script.sh
@@ -0,0 +1,10 @@
#!/bin/bash

set -eo pipefail

infer_experiment.py \
  -i "$par_input_file" \
  -r "$par_refgene" \
  ${par_sample_size:+-s "${par_sample_size}"} \
  ${par_mapq:+-q "${par_mapq}"} \
  > "$par_output"
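
The `${par_sample_size:+...}` form used in `script.sh` passes a flag only when the corresponding parameter is set and non-empty. A minimal sketch of that expansion in isolation is shown here; `build_args` is a hypothetical helper for illustration and is not part of this commit.

```shell
#!/bin/bash

# Hypothetical helper (not part of the commit): collect optional flags
# using the same ${var:+word} expansion that script.sh relies on.
build_args() {
  local sample_size="$1"
  local mapq="$2"
  # ${var:+word} expands to word only if var is set and non-empty;
  # otherwise it expands to nothing, so no empty argument is passed.
  local args=(${sample_size:+-s "$sample_size"} ${mapq:+-q "$mapq"})
  printf '%s\n' "${args[*]}"
}

# Example calls:
#   build_args 150000 ""
#   build_args 200000 30
```

When both arguments are empty the function prints an empty line, which corresponds to the script leaving both options to the tool's own defaults.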
72 changes: 72 additions & 0 deletions src/rseqc/rseqc_inferexperiment/test.sh
@@ -0,0 +1,72 @@
#!/bin/bash

# define input and output for script
input_bam="$meta_resources_dir/test_data/sample.bam"
input_bed="$meta_resources_dir/test_data/test.bed12"
output="strandedness.txt"

echo ">>> Prepare test output data"

cat > "$meta_resources_dir/test_data/strandedness.txt" <<EOF
This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 1.0000
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0000
EOF

cat > "$meta_resources_dir/test_data/strandedness2.txt" <<EOF
Unknown Data type
EOF

################################################################################
# run executable and tests

echo ">>> Test 1: Test with default parameters"

"$meta_executable" \
--input_file "$input_bam" \
--refgene "$input_bed" \
--output "$output"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output can be found and has content"

[ ! -f "$output" ] && echo "$output is missing" && exit 1
[ ! -s "$output" ] && echo "$output is empty" && exit 1


echo ">> Checking whether output is correct"
diff "$output" "$meta_resources_dir/test_data/strandedness.txt" || { echo "Output is not correct"; exit 1; }

rm "$output"

################################################################################

echo ">>> Test 2: Test with non-default sample size and map quality"

"$meta_executable" \
--input_file "$input_bam" \
--refgene "$input_bed" \
--output "$output" \
--sample_size 150000 \
--mapq 90

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output can be found and has content"

[ ! -f "$output" ] && echo "$output is missing" && exit 1
[ ! -s "$output" ] && echo "$output is empty" && exit 1

echo ">> Checking whether output is correct"
diff "$output" "$meta_resources_dir/test_data/strandedness2.txt" || { echo "Output is not correct"; exit 1; }


echo "All tests passed"

exit 0
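
The expected report in Test 1 has a fixed layout, so a downstream step could read the two "explained by" fractions from it. The sketch below is one possible way to do that with `awk` for paired-end reports only. The `classify_strandedness` name, the forward/reverse labels, and the 0.8 cutoff are assumptions for illustration; none of them come from this commit or from RSeQC.

```shell
#!/bin/bash

# Hypothetical helper (not part of the commit): label a paired-end
# infer_experiment.py report as forward, reverse, unstranded, or unknown.
# The 0.8 cutoff is an assumed value chosen for illustration.
classify_strandedness() {
  local report="$1"
  awk -F': ' '
    # Fraction of reads where read 1 maps to the same strand as the gene.
    /1\+\+,1--,2\+-,2-\+/ { fwd = $2 }
    # Fraction of reads where read 1 maps to the opposite strand.
    /1\+-,1-\+,2\+\+,2--/ { rev = $2 }
    END {
      # Reports such as "Unknown Data type" contain neither line.
      if (fwd == "" || rev == "") { print "unknown"; exit }
      if (fwd + 0 >= 0.8) print "forward"
      else if (rev + 0 >= 0.8) print "reverse"
      else print "unstranded"
    }
  ' "$report"
}

# Example call:
#   classify_strandedness strandedness.txt
```

Single-end reports use different pattern strings, so this sketch reports them as `unknown` rather than guessing.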
Binary file not shown.
4 changes: 4 additions & 0 deletions src/rseqc/rseqc_inferexperiment/test_data/test.bed12
@@ -0,0 +1,4 @@
MT192765.1 1242 1264 nCoV-2019_5_LEFT 1 + 1242 1264 0 2 10,12, 0,10,
MT192765.1 1573 1595 nCoV-2019_6_LEFT 2 + 1573 1595 0 2 7,15, 0,7,
MT192765.1 1623 1651 nCoV-2019_5_RIGHT 1 - 1623 1651 0 2 14,14, 0,14,
MT192765.1 1942 1964 nCoV-2019_6_RIGHT 2 - 1942 1964 0 2 11,11 0,11,
Binary file not shown.
