Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rseq bamstat #155

Merged
merged 18 commits into from
Oct 26, 2024
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,9 @@

* `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).

* `rseqc`:
- `rseqc/bamstat`: Generate statistics from a bam file (PR #155).

## MINOR CHANGES

* Upgrade to Viash 0.9.0.
Expand Down
59 changes: 59 additions & 0 deletions src/rseqc/rseqc_bamstat/config.vsh.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
name: rseqc_bamstat
namespace: rseqc
keywords: [ rnaseq, genomics ]
description: Generate statistics from a bam file.
links:
homepage: https://rseqc.sourceforge.net/
documentation: https://rseqc.sourceforge.net/#bam-stat-py
issue_tracker: https://github.com/MonashBioinformaticsPlatform/RSeQC/issues
repository: https://github.com/MonashBioinformaticsPlatform/RSeQC
references:
doi: 10.1093/bioinformatics/bts356
license: GPL-3.0
authors:
- __merge__: /src/_authors/emma_rousseau.yaml
roles: [ author, maintainer ]

argument_groups:
- name: "Input"
arguments:
- name: "--input"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the input parameter works I would use the parameter defined in the help page as the input parameter name.

Suggested change
- name: "--input"
- name: "--input_file"

type: file
rcannood marked this conversation as resolved.
Show resolved Hide resolved
required: true
description: Input alignment file in BAM or SAM format.
- name: "--map_qual"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: "--map_qual"
- name: "--mapq"

type: integer
rcannood marked this conversation as resolved.
Show resolved Hide resolved
example: 30
description: |
Minimum mapping quality (phred scaled) to determine uniquely mapped reads. Default: '30'.

- name: "Output"
arguments:
- name: "--output"
type: file
direction: output
description: Output file (txt) with mapping quality statistics.

resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- type: file
path: test_data

engines:
- type: docker
image: ubuntu:22.04
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
image: ubuntu:22.04
image: python:3.10

setup:
- type: apt
packages: [ python3-pip ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- type: apt
packages: [ python3-pip ]

- type: python
packages: [ RSeQC ]
- type: docker
run: |
echo "RSeQC bam_stat.py: $(bam_stat.py --version | cut -d' ' -f2-)" > /var/software_versions.txt
runners:
- type: executable
- type: nextflow
18 changes: 18 additions & 0 deletions src/rseqc/rseqc_bamstat/help.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
```
bam_stat.py -h
```

Usage: bam_stat.py [options]

Summarizing mapping statistics of a BAM or SAM file.



Options:
--version show program's version number and exit
-h, --help show this help message and exit
-i INPUT_FILE, --input-file=INPUT_FILE
Alignment file in BAM or SAM format.
-q MAP_QUAL, --mapq=MAP_QUAL
Minimum mapping quality (phred scaled) to determine
"uniquely mapped" reads. default=30
9 changes: 9 additions & 0 deletions src/rseqc/rseqc_bamstat/script.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/bin/bash


set -eo pipefail

bam_stat.py \
--input "${par_input}" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this works I would use the name used in the help message

Suggested change
--input "${par_input}" \
--input_file "${par_input_file}" \

${par_map_qual:+--mapq "${par_map_qual}"} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
${par_map_qual:+--mapq "${par_map_qual}"} \
${par_mapq:+--mapq "${par_mapq}"} \

> $par_output
49 changes: 49 additions & 0 deletions src/rseqc/rseqc_bamstat/test.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
#!/bin/bash

# define input and output for script

input_bam="test.paired_end.sorted.bam"
output_summary="mapping_quality.txt"

# run executable and tests
echo "> Running $meta_functionality_name."

"$meta_executable" \
--input "$meta_resources_dir/test_data/$input_bam" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--input "$meta_resources_dir/test_data/$input_bam" \
--input_file "$meta_resources_dir/test_data/$input_bam" \

--output "$output_summary"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output is present"
[ ! -f "$output_summary" ] && echo "$output_summary file missing" && exit 1
[ ! -s "$output_summary" ] && echo "$output_summary file is empty" && exit 1

echo ">> Checking whether output is correct"
diff "$meta_resources_dir/test_data/ref_output.txt" "$meta_resources_dir/$output_summary" || { echo "Output is not correct"; exit 1; }

#############################################################################

echo ">>> Test 2: Test with non-default mapping quality threshold"

output_summary="mapping_quality_mapq_30.txt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output_summary="mapping_quality_mapq_30.txt"
output_summary="mapping_quality_mapq_50.txt"


# run executable and tests
echo "> Running $meta_functionality_name."

"$meta_executable" \
--input "$meta_resources_dir/test_data/$input_bam" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--input "$meta_resources_dir/test_data/$input_bam" \
--input_file "$meta_resources_dir/test_data/$input_bam" \

--output "$output_summary" \
--map_qual 50
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--map_qual 50
--mapq 50


exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output is present"
[ ! -f "$output_summary" ] && echo "$output_summary file missing" && exit 1
[ ! -s "$output_summary" ] && echo "$output_summary file is empty" && exit 1

echo ">> Checking whether output is correct"
diff "$meta_resources_dir/test_data/ref_output_mapq.txt" "$meta_resources_dir/$output_summary" || { echo "Output is not correct"; exit 1; }

exit 0
22 changes: 22 additions & 0 deletions src/rseqc/rseqc_bamstat/test_data/ref_output.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@

#==================================================
#All numbers are READ count
#==================================================

Total records: 200

QC failed: 0
Optical/PCR duplicate: 0
Non primary hits 0
Unmapped reads: 3
mapq < mapq_cut (non-unique): 1

mapq >= mapq_cut (unique): 196
Read-1: 99
Read-2: 97
Reads map to '+': 98
Reads map to '-': 98
Non-splice reads: 196
Splice reads: 0
Reads mapped in proper pairs: 192
Proper-paired reads map to different chrom:0
22 changes: 22 additions & 0 deletions src/rseqc/rseqc_bamstat/test_data/ref_output_mapq.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@

#==================================================
#All numbers are READ count
#==================================================

Total records: 200

QC failed: 0
Optical/PCR duplicate: 0
Non primary hits 0
Unmapped reads: 3
mapq < mapq_cut (non-unique): 20

mapq >= mapq_cut (unique): 177
Read-1: 88
Read-2: 89
Reads map to '+': 96
Reads map to '-': 81
Non-splice reads: 177
Splice reads: 0
Reads mapped in proper pairs: 175
Proper-paired reads map to different chrom:0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can subsample to half to get the file to about 10kb

Binary file not shown.