forked from viash-hub/biobox
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* initial commit dedup * Revert "initial commit dedup" This reverts commit 38f586b. * Fasta component * change script resource to samtools_fastq script, with dummy argument to specify the command * add dummy argument to samtools_fastq to share the script with samtools_fasta * fix path to script in config * Update src/samtools/samtools_fastq/script.sh Co-authored-by: Robrecht Cannoodt <[email protected]> * Change default fields to examples * Two more default fields changed to examples * Minor formatting changes * Markdown formatting changes in configs --------- Co-authored-by: Robrecht Cannoodt <[email protected]>
- Loading branch information
1 parent
ec69d9a
commit 7544a75
Showing
13 changed files
with
450 additions
and
23 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,191 @@ | ||
name: samtools_fasta | ||
namespace: samtools | ||
description: Converts a SAM, BAM or CRAM to FASTA format. | ||
keywords: [fasta, bam, sam, cram] | ||
links: | ||
homepage: https://www.htslib.org/ | ||
documentation: https://www.htslib.org/doc/samtools-fasta.html | ||
repository: https://github.com/samtools/samtools | ||
references: | ||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008] | ||
license: MIT/Expat | ||
|
||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- name: --input | ||
type: file | ||
description: input SAM/BAM/CRAM file | ||
required: true | ||
- name: Outputs | ||
arguments: | ||
- name: --output | ||
type: file | ||
description: output FASTA file | ||
required: true | ||
direction: output | ||
- name: Options | ||
arguments: | ||
- name: --no_suffix | ||
alternatives: -n | ||
type: boolean_true | ||
description: | | ||
By default, either '/1' or '/2' is added to the end of read names where the corresponding | ||
READ1 or READ2 FLAG bit is set. Using -n causes read names to be left as they are. | ||
- name: --suffix | ||
alternatives: -N | ||
type: boolean_true | ||
description: | | ||
Always add either '/1' or '/2' to the end of read names even when put into different files. | ||
- name: --use_oq | ||
alternatives: -O | ||
type: boolean_true | ||
description: | | ||
Use quality values from OQ tags in preference to standard quality string if available. | ||
- name: --singleton | ||
alternatives: -s | ||
type: file | ||
description: write singleton reads to FILE. | ||
- name: --copy_tags | ||
alternatives: -t | ||
type: boolean_true | ||
description: | | ||
Copy RG, BC and QT tags to the FASTA header line, if they exist. | ||
- name: --copy_tags_list | ||
alternatives: -T | ||
type: string | ||
description: | | ||
Specify a comma-separated list of tags to copy to the FASTA header line, if they exist. | ||
TAGLIST can be blank or `*` to indicate all tags should be copied to the output. If using `*`, | ||
be careful to quote it to avoid unwanted shell expansion. | ||
- name: --read1 | ||
alternatives: -1 | ||
type: file | ||
description: | | ||
Write reads with the READ1 FLAG set (and READ2 not set) to FILE instead of outputting them. | ||
If the -s option is used, only paired reads will be written to this file. | ||
direction: output | ||
- name: --read2 | ||
alternatives: -2 | ||
type: file | ||
description: | | ||
Write reads with the READ2 FLAG set (and READ1 not set) to FILE instead of outputting them. | ||
If the -s option is used, only paired reads will be written to this file. | ||
direction: output | ||
- name: --output_reads | ||
alternatives: -o | ||
type: file | ||
description: | | ||
Write reads with either READ1 FLAG or READ2 flag set to FILE instead of outputting them to stdout. | ||
This is equivalent to -1 FILE -2 FILE. | ||
direction: output | ||
- name: --output_reads_both | ||
alternatives: -0 | ||
type: file | ||
description: | | ||
Write reads where the READ1 and READ2 FLAG bits set are either both set or both unset to FILE | ||
instead of outputting them. | ||
direction: output | ||
- name: --filter_flags | ||
alternatives: -f | ||
type: integer | ||
description: | | ||
Only output alignments with all bits set in INT present in the FLAG field. INT can be specified | ||
in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' | ||
(i.e. /^0[0-7]+/). Default: `0`. | ||
example: 0 | ||
- name: --excl_flags | ||
alternatives: -F | ||
type: string | ||
description: | | ||
Do not output alignments with any bits set in INT present in the FLAG field. INT can be specified | ||
in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' | ||
(i.e. /^0[0-7]+/). This defaults to 0x900 representing filtering of secondary and | ||
supplementary alignments. Default: `0x900`. | ||
example: "0x900" | ||
- name: --incl_flags | ||
alternatives: --rf | ||
type: string | ||
description: | | ||
Only output alignments with any bits set in INT present in the FLAG field. INT can be specified | ||
in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with '0' | ||
(i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of | ||
flag names. Default: `0`. | ||
example: 0 | ||
- name: --excl_flags_all | ||
alternatives: -G | ||
type: integer | ||
description: | | ||
Only EXCLUDE reads with all of the bits set in INT present in the FLAG field. INT can be specified | ||
in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' (i.e. /^0[0-7]+/). | ||
Default: `0`. | ||
example: 0 | ||
- name: --aux_tag | ||
alternatives: -d | ||
type: string | ||
description: | | ||
Only output alignments containing an auxiliary tag matching both TAG and VAL. If VAL is omitted | ||
then any value is accepted. The tag types supported are i, f, Z, A and H. "B" arrays are not | ||
supported. This is comparable to the method used in samtools view --tag. The option may be specified | ||
multiple times and is equivalent to using the --aux_tag_file option. | ||
- name: --aux_tag_file | ||
alternatives: -D | ||
type: string | ||
description: | | ||
Only output alignments containing an auxiliary tag matching TAG and having a value listed in FILE. | ||
The format of the file is one line per value. This is equivalent to specifying --aux_tag multiple times. | ||
- name: --casava | ||
alternatives: -i | ||
type: boolean_true | ||
description: add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG) | ||
- name: --compression | ||
alternatives: -c | ||
type: integer | ||
description: set compression level when writing gz or bgzf fasta files. | ||
example: 0 | ||
- name: --index1 | ||
alternatives: --i1 | ||
type: file | ||
description: write first index reads to FILE. | ||
- name: --index2 | ||
alternatives: --i2 | ||
type: file | ||
description: write second index reads to FILE. | ||
- name: --barcode_tag | ||
type: string | ||
description: | | ||
Auxiliary tag to find index reads in. Default: `BC`. | ||
example: "BC" | ||
- name: --quality_tag | ||
type: string | ||
description: | | ||
Auxiliary tag to find index quality in. Default: `QT`. | ||
example: "QT" | ||
- name: --index_format | ||
type: string | ||
description: | | ||
string to describe how to parse the barcode and quality tags. For example: | ||
* `i14i8`: the first 14 characters are index 1, the next 8 characters are index 2. | ||
* `n8i14`: ignore the first 8 characters, and use the next 14 characters for index 1. | ||
If the tag contains a separator, then the numeric part can be replaced with`*` to mean | ||
'read until the separator or end of tag', for example: `n*i*`. | ||
resources: | ||
- type: bash_script | ||
path: ../samtools_fastq/script.sh | ||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
- type: file | ||
path: test_data | ||
engines: | ||
- type: docker | ||
image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 | ||
setup: | ||
- type: docker | ||
run: | | ||
samtools --version 2>&1 | grep -E '^(samtools|Using htslib)' | \ | ||
sed 's#Using ##;s# \([0-9\.]*\)$#: \1#' > /var/software_versions.txt | ||
runners: | ||
- type: executable | ||
- type: nextflow |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
``` | ||
samtools fastq | ||
``` | ||
|
||
Usage: samtools fastq [options...] <in.bam> | ||
|
||
Description: | ||
Converts a SAM, BAM or CRAM to FASTQ format. | ||
|
||
Options: | ||
-0 FILE write reads designated READ_OTHER to FILE | ||
-1 FILE write reads designated READ1 to FILE | ||
-2 FILE write reads designated READ2 to FILE | ||
-o FILE write reads designated READ1 or READ2 to FILE | ||
note: if a singleton file is specified with -s, only | ||
paired reads will be written to the -1 and -2 files. | ||
-d, --tag TAG[:VAL] | ||
only include reads containing TAG, optionally with value VAL | ||
-f, --require-flags INT | ||
only include reads with all of the FLAGs in INT present [0] | ||
-F, --excl[ude]-flags INT | ||
only include reads with none of the FLAGs in INT present [0x900] | ||
--rf, --incl[ude]-flags INT | ||
only include reads with any of the FLAGs in INT present [0] | ||
-G INT only EXCLUDE reads with all of the FLAGs in INT present [0] | ||
-n don't append /1 and /2 to the read name | ||
-N always append /1 and /2 to the read name | ||
-O output quality in the OQ tag if present | ||
-s FILE write singleton reads designated READ1 or READ2 to FILE | ||
-t copy RG, BC and QT tags to the FASTQ header line | ||
-T TAGLIST copy arbitrary tags to the FASTQ header line, '*' for all | ||
-v INT default quality score if not given in file [1] | ||
-i add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG) | ||
-c INT compression level [0..9] to use when writing bgzf files [1] | ||
--i1 FILE write first index reads to FILE | ||
--i2 FILE write second index reads to FILE | ||
--barcode-tag TAG | ||
Barcode tag [BC] | ||
--quality-tag TAG | ||
Quality tag [QT] | ||
--index-format STR | ||
How to parse barcode and quality tags | ||
|
||
--input-fmt-option OPT[=VAL] | ||
Specify a single input file format option in the form | ||
of OPTION or OPTION=VALUE | ||
--reference FILE | ||
Reference sequence FASTA FILE [null] | ||
-@, --threads INT | ||
Number of additional threads to use [0] | ||
--verbosity INT | ||
Set level of verbosity | ||
|
||
The files will be automatically compressed if the file names have a .gz | ||
or .bgzf extension. The input to this program must be collated by name. | ||
Run 'samtools collate' or 'samtools sort -n' to achieve this. | ||
|
||
Reads are designated READ1 if FLAG READ1 is set and READ2 is not set. | ||
Reads are designated READ2 if FLAG READ1 is not set and READ2 is set. | ||
Otherwise reads are designated READ_OTHER (both flags set or both flags unset). | ||
Run 'samtools flags' for more information on flag codes and meanings. | ||
|
||
The index-format string describes how to parse the barcode and quality tags. | ||
It is made up of 'i' or 'n' followed by a length or '*'. For example: | ||
i14i8 The first 14 characters are index 1, the next 8 are index 2 | ||
n8i14 Ignore the first 8 characters, and use the next 14 for index 1 | ||
|
||
If the tag contains a separator, then the numeric part can be replaced with | ||
'*' to mean 'read until the separator or end of tag', for example: | ||
i*i* Break the tag at the separator into index 1 and index 2 | ||
n*i* Ignore the left part of the tag until the separator, | ||
then use the second part of the tag as index 1 | ||
|
||
Examples: | ||
To get just the paired reads in separate files, use: | ||
samtools fastq -1 pair1.fq -2 pair2.fq -0 /dev/null -s /dev/null -n in.bam | ||
|
||
To get all non-supplementary/secondary reads in a single file, redirect | ||
the output: | ||
samtools fastq in.bam > all_reads.fq |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
#!/bin/bash | ||
|
||
test_dir="${meta_resources_dir}/test_data" | ||
out_dir="${meta_resources_dir}/out_data" | ||
|
||
############################################################################################ | ||
|
||
echo ">>> Test 1: Convert all reads from a bam file to fasta format" | ||
"$meta_executable" \ | ||
--input "$test_dir/a.bam" \ | ||
--output "$out_dir/a.fa" | ||
|
||
echo ">>> Check if output file exists" | ||
[ ! -f "$out_dir/a.fa" ] && echo "Output file a.fa does not exist" && exit 1 | ||
|
||
echo ">>> Check if output is empty" | ||
[ ! -s "$out_dir/a.fa" ] && echo "Output file a.fa is empty" && exit 1 | ||
|
||
echo ">>> Check if output matches expected output" | ||
diff "$out_dir/a.fa" "$test_dir/a.fa" || | ||
(echo "Output file a.fa does not match expected output" && exit 1) | ||
|
||
rm "$out_dir/a.fa" | ||
|
||
############################################################################################ | ||
|
||
echo ">>> Test 2: Convert all reads from a sam file to fasta format" | ||
"$meta_executable" \ | ||
--input "$test_dir/a.sam" \ | ||
--output "$out_dir/a.fa" | ||
|
||
echo ">>> Check if output file exists" | ||
[ ! -f "$out_dir/a.fa" ] && echo "Output file a.fa does not exist" && exit 1 | ||
|
||
echo ">>> Check if output is empty" | ||
[ ! -s "$out_dir/a.fa" ] && echo "Output file a.fa is empty" && exit 1 | ||
|
||
echo ">>> Check if output matches expected output" | ||
diff "$out_dir/a.fa" "$test_dir/a.fa" || | ||
(echo "Output file a.fa does not match expected output" && exit 1) | ||
|
||
rm "$out_dir/a.fa" | ||
|
||
############################################################################################ | ||
|
||
echo ">>> Test 3: Output reads from bam file to separate files" | ||
|
||
"$meta_executable" \ | ||
--input "$test_dir/a.bam" \ | ||
--read1 "$out_dir/a.1.fa" \ | ||
--read2 "$out_dir/a.2.fa" \ | ||
--output "$out_dir/a.fa" | ||
|
||
echo ">>> Check if output files exist" | ||
[ ! -f "$out_dir/a.1.fa" ] && echo "Output file a.1.fa does not exist" && exit 1 | ||
[ ! -f "$out_dir/a.2.fa" ] && echo "Output file a.2.fa does not exist" && exit 1 | ||
[ ! -f "$out_dir/a.fa" ] && echo "Output file a.fa does not exist" && exit 1 | ||
|
||
echo ">>> Check if output files are empty" | ||
[ ! -s "$out_dir/a.1.fa" ] && echo "Output file a.1.fa is empty" && exit 1 | ||
[ ! -s "$out_dir/a.2.fa" ] && echo "Output file a.2.fa is empty" && exit 1 | ||
# output should be empty since input has no singleton reads | ||
|
||
echo ">>> Check if output files match expected output" | ||
diff "$out_dir/a.1.fa" "$test_dir/a.1.fa" || | ||
(echo "Output file a.1.fa does not match expected output" && exit 1) | ||
diff "$out_dir/a.2.fa" "$test_dir/a.2.fa" || | ||
(echo "Output file a.2.fa does not match expected output" && exit 1) | ||
|
||
rm "$out_dir/a.1.fa" "$out_dir/a.2.fa" "$out_dir/a.fa" | ||
|
||
############################################################################################ | ||
|
||
echo ">>> Test 4: Output only forward reads from bam file to fasta format" | ||
|
||
"$meta_executable" \ | ||
--input "$test_dir/a.sam" \ | ||
--excl_flags "0x80" \ | ||
--output "$out_dir/half.fa" | ||
|
||
echo ">>> Check if output file exists" | ||
[ ! -f "$out_dir/half.fa" ] && echo "Output file half.fa does not exist" && exit 1 | ||
|
||
echo ">>> Check if output is empty" | ||
[ ! -s "$out_dir/half.fa" ] && echo "Output file half.fa is empty" && exit 1 | ||
|
||
echo ">>> Check if output matches expected output" | ||
diff "$out_dir/half.fa" "$test_dir/half.fa" || | ||
(echo "Output file half.fa does not match expected output" && exit 1) | ||
|
||
rm "$out_dir/half.fa" | ||
|
||
############################################################################################ | ||
|
||
echo "All tests succeeded!" | ||
exit 0 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
>a1 | ||
AAAAAAAAAA | ||
>b1 | ||
AAAAAAAAAA | ||
>c1 | ||
AAAAAAAAAA |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
>a1 | ||
AAAAAAAAAA | ||
>b1 | ||
AAAAAAAAAA | ||
>c1 | ||
AAAAAAAAAA |
Binary file not shown.
Oops, something went wrong.