forked from viash-hub/biobox
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Initial version of samtools sort, no tests
* Add tests, final touches
* Update changelog
* Update src/samtools/samtools_sort/config.vsh.yaml

  Remove "must_exist: false" since that is the default value

  Co-authored-by: Robrecht Cannoodt <[email protected]>
* Clean up test script, update changelog
* Minor changes, paths, config and script

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
1 parent 8935a78, commit d3b2053
Showing 13 changed files with 338 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
name: samtools_sort | ||
namespace: samtools | ||
description: Sort SAM/BAM/CRAM file. | ||
keywords: [sort, bam, sam, cram] | ||
links: | ||
homepage: https://www.htslib.org/ | ||
documentation: https://www.htslib.org/doc/samtools-sort.html | ||
repository: https://github.com/samtools/samtools | ||
references: | ||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008] | ||
license: MIT/Expat | ||
|
||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- name: --input | ||
type: file | ||
description: SAM/BAM/CRAM input file. | ||
required: true | ||
must_exist: true | ||
- name: Outputs | ||
arguments: | ||
- name: --output | ||
type: file | ||
description: | | ||
Write final output to file. | ||
required: true | ||
direction: output | ||
example: out.bam | ||
- name: --output_fmt | ||
alternatives: -O | ||
type: string | ||
description: | | ||
Specify output format (SAM, BAM, CRAM). | ||
example: BAM | ||
- name: --output_fmt_option | ||
type: string | ||
description: | | ||
Specify a single output file format option in the form | ||
of OPTION or OPTION=VALUE. | ||
- name: --reference | ||
type: file | ||
description: | | ||
Reference sequence FASTA FILE. | ||
example: ref.fa | ||
- name: --write_index | ||
type: boolean_true | ||
description: | | ||
Automatically index the output files. | ||
- name: --prefix | ||
alternatives: -T | ||
type: string | ||
description: | | ||
Write temporary files to PREFIX.nnnn.bam. | ||
- name: --no_PG | ||
type: boolean_true | ||
description: | | ||
Do not add a PG line. | ||
- name: --template_coordinate | ||
type: boolean_true | ||
description: | | ||
Sort by template-coordinate. | ||
- name: --input_fmt_option | ||
type: string | ||
description: | | ||
Specify a single input file format option in the form | ||
of OPTION or OPTION=VALUE. | ||
- name: Options | ||
arguments: | ||
- name: --compression | ||
alternatives: -l | ||
type: integer | ||
description: | | ||
Set compression level, from 0 (uncompressed) to 9 (best). | ||
default: 0 | ||
- name: --uncompressed | ||
alternatives: -u | ||
type: boolean_true | ||
description: | | ||
Output uncompressed data (equivalent to --compression 0). | ||
- name: --minimiser | ||
alternatives: -M | ||
type: boolean_true | ||
description: | | ||
Use minimiser for clustering unaligned/unplaced reads. | ||
- name: --not_reverse | ||
alternatives: -R | ||
type: boolean_true | ||
description: | | ||
Do not use reverse strand (only compatible with --minimiser). | ||
- name: --kmer_size | ||
alternatives: -K | ||
type: integer | ||
description: | | ||
Kmer size to use for minimiser. | ||
example: 20 | ||
- name: --order | ||
alternatives: -I | ||
type: file | ||
description: | | ||
Order minimisers by their position in FILE FASTA. | ||
example: ref.fa | ||
- name: --window | ||
alternatives: -w | ||
type: integer | ||
description: | | ||
Window size for minimiser indexing via --order ref.fa. | ||
example: 100 | ||
- name: --homopolymers | ||
alternatives: -H | ||
type: boolean_true | ||
description: | | ||
Squash homopolymers when computing minimiser. | ||
- name: --natural_sort | ||
alternatives: -n | ||
type: boolean_true | ||
description: | | ||
Sort by read name (natural): cannot be used with samtools index. | ||
- name: --ascii_sort | ||
alternatives: -N | ||
type: boolean_true | ||
description: | | ||
Sort by read name (ASCII): cannot be used with samtools index. | ||
- name: --tag | ||
alternatives: -t | ||
type: string | ||
description: | | ||
Sort by value of TAG. Uses position as secondary index | ||
(or read name if --natural_sort is set). | ||
resources: | ||
- type: bash_script | ||
path: script.sh | ||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
- type: file | ||
path: test_data | ||
engines: | ||
- type: docker | ||
image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 | ||
setup: | ||
- type: docker | ||
run: | | ||
samtools --version 2>&1 | grep -E '^(samtools|Using htslib)' | \ | ||
sed 's#Using ##;s# \([0-9\.]*\)$#: \1#' > /var/software_versions.txt | ||
runners: | ||
- type: executable | ||
- type: nextflow |
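The setup step in the config above pipes `samtools --version` through `grep` and `sed` to record tool versions. As a minimal sketch of what that pipeline produces, the block below feeds it a sample version banner through a hypothetical `fake_samtools_version` function, so it runs without the container. The banner text is an illustrative assumption, not output captured from the image.

```shell
#!/bin/bash
# Hypothetical stand-in for `samtools --version`; the text below is an
# illustrative sample, not captured from the container.
fake_samtools_version() {
  cat <<'EOF'
samtools 1.19.2
Using htslib 1.19.1
Copyright (C) 2024 Genome Research Ltd.
EOF
}

# Same grep/sed pipeline as in the setup step: keep the two version lines,
# drop the "Using " prefix, and turn "name version" into "name: version".
versions=$(fake_samtools_version 2>&1 \
  | grep -E '^(samtools|Using htslib)' \
  | sed 's#Using ##;s# \([0-9\.]*\)$#: \1#')

echo "$versions"
```

With this sample banner the result is two lines, `samtools: 1.19.2` and `htslib: 1.19.1`; the copyright line is dropped by `grep`.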
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
``` | ||
samtools sort | ||
``` | ||
|
||
Usage: samtools sort [options...] [in.bam] | ||
Options: | ||
-l INT Set compression level, from 0 (uncompressed) to 9 (best) | ||
-u Output uncompressed data (equivalent to -l 0) | ||
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M] | ||
-M Use minimiser for clustering unaligned/unplaced reads | ||
-R Do not use reverse strand (only compatible with -M) | ||
-K INT Kmer size to use for minimiser [20] | ||
-I FILE Order minimisers by their position in FILE FASTA | ||
-w INT Window size for minimiser indexing via -I ref.fa [100] | ||
-H Squash homopolymers when computing minimiser | ||
-n Sort by read name (natural): cannot be used with samtools index | ||
-N Sort by read name (ASCII): cannot be used with samtools index | ||
-t TAG Sort by value of TAG. Uses position as secondary index (or read name if -n is set) | ||
-o FILE Write final output to FILE rather than standard output | ||
-T PREFIX Write temporary files to PREFIX.nnnn.bam | ||
--no-PG | ||
Do not add a PG line | ||
--template-coordinate | ||
Sort by template-coordinate | ||
--input-fmt-option OPT[=VAL] | ||
Specify a single input file format option in the form | ||
of OPTION or OPTION=VALUE | ||
-O, --output-fmt FORMAT[,OPT[=VAL]]... | ||
Specify output format (SAM, BAM, CRAM) | ||
--output-fmt-option OPT[=VAL] | ||
Specify a single output file format option in the form | ||
of OPTION or OPTION=VALUE | ||
--reference FILE | ||
Reference sequence FASTA FILE [null] | ||
-@, --threads INT | ||
Number of additional threads to use [0] | ||
--write-index | ||
Automatically index the output files [off] | ||
--verbosity INT | ||
Set level of verbosity |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
#!/bin/bash | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
set -e | ||
|
||
[[ "$par_uncompressed" == "false" ]] && unset par_uncompressed | ||
[[ "$par_minimiser" == "false" ]] && unset par_minimiser | ||
[[ "$par_not_reverse" == "false" ]] && unset par_not_reverse | ||
[[ "$par_homopolymers" == "false" ]] && unset par_homopolymers | ||
[[ "$par_natural_sort" == "false" ]] && unset par_natural_sort | ||
[[ "$par_ascii_sort" == "false" ]] && unset par_ascii_sort | ||
[[ "$par_template_coordinate" == "false" ]] && unset par_template_coordinate | ||
[[ "$par_write_index" == "false" ]] && unset par_write_index | ||
[[ "$par_no_PG" == "false" ]] && unset par_no_PG | ||
|
||
|
||
samtools sort \ | ||
${par_compression:+-l "$par_compression"} \ | ||
${par_uncompressed:+-u} \ | ||
${par_minimiser:+-M} \ | ||
${par_not_reverse:+-R} \ | ||
${par_kmer_size:+-K "$par_kmer_size"} \ | ||
${par_order:+-I "$par_order"} \ | ||
${par_window:+-w "$par_window"} \ | ||
${par_homopolymers:+-H} \ | ||
${par_natural_sort:+-n} \ | ||
${par_ascii_sort:+-N} \ | ||
${par_tag:+-t "$par_tag"} \ | ||
${par_input_fmt_option:+--input-fmt-option "$par_input_fmt_option"} \ | ||
${par_template_coordinate:+--template-coordinate} \ | ||
${par_write_index:+--write-index} \ | ||
${par_prefix:+-T "$par_prefix"} \ | ||
${par_no_PG:+--no-PG} \ | ||
${par_output_fmt:+-O "$par_output_fmt"} \ | ||
${par_output_fmt_option:+--output-fmt-option "$par_output_fmt_option"} \ | ||
${par_reference:+--reference "$par_reference"} \ | ||
-o "$par_output" \ | ||
"$par_input" | ||
|
||
# save text files containing the output of samtools view for later comparison | ||
samtools view "$par_output" -o "$par_output".txt |
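The script above relies on the `${var:+word}` parameter expansion: an argument is only emitted when the variable is set and non-empty, which is why booleans equal to `false` are unset first. The following sketch shows that behaviour in isolation. `show_args` is a hypothetical helper introduced here so the expansion can be inspected without samtools; the parameter values are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical helper: prints each argument it receives on its own line.
show_args() {
  printf '%s\n' "$@"
}

par_compression=5      # value set      -> expands to: -l 5
par_minimiser=false    # boolean flag left at its "false" value
unset par_tag          # never provided -> expands to nothing

# ${var:+...} treats any non-empty string (including "false") as set,
# so a "false" boolean has to be unset before building the command line.
[[ "$par_minimiser" == "false" ]] && unset par_minimiser

args=$(show_args \
  ${par_compression:+-l "$par_compression"} \
  ${par_minimiser:+-M} \
  ${par_tag:+-t "$par_tag"})

echo "$args"
```

Only `-l` and `5` are printed here; the `-M` and `-t` arguments disappear because their variables are unset.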
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
#!/bin/bash | ||
|
||
test_dir="${meta_resources_dir}/test_data" | ||
out_dir="${meta_resources_dir}/test_data/text" | ||
|
||
# Files are compared using the "samtools view" output. | ||
############################################################################################ | ||
|
||
echo ">>> Test 1: Sorting a BAM file" | ||
|
||
"$meta_executable" \ | ||
--input "$test_dir/a.bam" \ | ||
--output "$test_dir/a.sorted.bam" | ||
|
||
echo ">>> Check if output file exists" | ||
[ ! -f "$test_dir/a.sorted.bam" ] \ | ||
&& echo "Output file a.sorted.bam does not exist" && exit 1 | ||
|
||
echo ">>> Check if output is empty" | ||
[ ! -s "$test_dir/a.sorted.bam" ] \ | ||
&& echo "Output file a.sorted.bam is empty" && exit 1 | ||
|
||
echo ">>> Check if output matches expected output" | ||
diff -a "$test_dir/a.sorted.bam.txt" "$out_dir/a_ref.sorted.txt" \ | ||
|| { echo "Output file a.sorted.bam does not match expected output"; exit 1; } | ||
|
||
rm "$test_dir/a.sorted.bam" "$test_dir/a.sorted.bam.txt" | ||
|
||
############################################################################################ | ||
|
||
echo ">>> Test 2: Sorting a BAM file according to ascii order" | ||
|
||
"$meta_executable" \ | ||
--input "$test_dir/a.bam" \ | ||
--ascii_sort \ | ||
--output "$test_dir/ascii.sorted.bam" | ||
|
||
echo ">>> Check if output file exists" | ||
[ ! -f "$test_dir/ascii.sorted.bam" ] \ | ||
&& echo "Output file ascii.sorted.bam does not exist" && exit 1 | ||
|
||
echo ">>> Check if output is empty" | ||
[ ! -s "$test_dir/ascii.sorted.bam" ] \ | ||
&& echo "Output file ascii.sorted.bam is empty" && exit 1 | ||
|
||
echo ">>> Check if output matches expected output" | ||
diff -a "$test_dir/ascii.sorted.bam.txt" "$out_dir/ascii_ref.sorted.txt" \ | ||
|| { echo "Output file ascii.sorted.bam does not match expected output"; exit 1; } | ||
|
||
rm "$test_dir/ascii.sorted.bam" "$test_dir/ascii.sorted.bam.txt" | ||
|
||
############################################################################################ | ||
|
||
echo ">>> Test 3: Sorting a BAM file with compression" | ||
|
||
"$meta_executable" \ | ||
--input "$test_dir/a.bam" \ | ||
--compression 5 \ | ||
--output "$test_dir/compressed.sorted.bam" | ||
|
||
echo ">>> Check if output file exists" | ||
[ ! -f "$test_dir/compressed.sorted.bam" ] \ | ||
&& echo "Output file compressed.sorted.bam does not exist" && exit 1 | ||
|
||
echo ">>> Check if output is empty" | ||
[ ! -s "$test_dir/compressed.sorted.bam" ] \ | ||
&& echo "Output file compressed.sorted.bam is empty" && exit 1 | ||
|
||
echo ">>> Check if output matches expected output" | ||
diff -a "$test_dir/compressed.sorted.bam.txt" "$out_dir/compressed_ref.sorted.txt" \ | ||
|| { echo "Output file compressed.sorted.bam does not match expected output"; exit 1; } | ||
|
||
rm "$test_dir/compressed.sorted.bam" "$test_dir/compressed.sorted.bam.txt" | ||
|
||
############################################################################################ | ||
|
||
|
||
echo "All tests succeeded!" | ||
exit 0 |
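Each test above repeats the same three checks (file exists, file is non-empty, contents match the reference). As a sketch only, these could be factored into one function. `check_output` is a hypothetical name that is not part of this commit, and for brevity it runs all three checks on a single file, whereas the test script checks the BAM file and diffs its `.txt` companion. It uses `{ ...; return 1; }` groups rather than a `( ... )` subshell, because an `exit` or `return` inside a subshell cannot stop the calling script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): runs the three checks and
# returns non-zero on the first one that fails.
check_output() {
  local actual="$1" expected="$2"
  [ -f "$actual" ] || { echo "Output file $actual does not exist"; return 1; }
  [ -s "$actual" ] || { echo "Output file $actual is empty"; return 1; }
  diff -a "$actual" "$expected" > /dev/null \
    || { echo "Output file $actual does not match expected output"; return 1; }
}

# Demo with throwaway text files instead of real samtools output.
tmp=$(mktemp -d)
printf 'a1\nb1\n' > "$tmp/expected.txt"
printf 'a1\nb1\n' > "$tmp/good.txt"
printf 'b1\na1\n' > "$tmp/bad.txt"

check_output "$tmp/good.txt" "$tmp/expected.txt" && echo "good.txt: ok"
check_output "$tmp/bad.txt" "$tmp/expected.txt" || echo "bad.txt: rejected"
```

In a real test script the caller would follow a failed check with `exit 1`, for example `check_output "$file" "$ref" || exit 1`.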
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added (BIN, +312 Bytes): src/samtools/samtools_sort/test_data/output/compressed_ref.sorted.bam
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
#!/bin/bash | ||
|
||
# download test data from snakemake wrapper | ||
if [ ! -d /tmp/sort_source ]; then | ||
git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers.git /tmp/sort_source | ||
fi | ||
|
||
cp -r /tmp/sort_source/bio/samtools/sort/test/mapped/* src/samtools/samtools_sort/test_data |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
a1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
b1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
c1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
a1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** | ||
b1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** | ||
c1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** |
6 changes: 6 additions & 0 deletions
src/samtools/samtools_sort/test_data/text/ascii_ref.sorted.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
a1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
a1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** | ||
b1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
b1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** | ||
c1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
c1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** |
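The two reference listings above contain the same six records and differ only in order: the first is ordered by position, the second by read name. The sketch below reproduces both orders with plain `sort` on a shuffled copy of the records. It is an emulation for this tiny dataset only: the shuffled input order, the function names, and the tie-breaking keys are assumptions of the sketch and do not describe how samtools compares records internally.

```shell
#!/bin/bash
# The six records from the listings above, deliberately shuffled
# (the actual record order inside a.bam is not shown in this diff).
records='c1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
a1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
b1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
c1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
a1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
b1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********'

# Position order: field 4 numerically, read name as an assumed tie-breaker.
by_position() { LC_ALL=C sort -k4,4n -k1,1; }

# Name order: read name (field 1), then flag (field 2) so 99 precedes 147.
by_name() { LC_ALL=C sort -k1,1 -k2,2n; }

echo "--- by position ---"
printf '%s\n' "$records" | by_position
echo "--- by name ---"
printf '%s\n' "$records" | by_name
```

The first listing printed matches the position-ordered reference and the second matches the name-ordered reference.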
6 changes: 6 additions & 0 deletions
src/samtools/samtools_sort/test_data/text/compressed_ref.sorted.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
a1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
b1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
c1 99 xx 1 1 10M = 11 20 AAAAAAAAAA ********** | ||
a1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** | ||
b1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** | ||
c1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT ********** |