Initial commit, config, script, help and test_data
emmarousseau committed May 9, 2024
1 parent 897cd89 commit cd118b7
Showing 6 changed files with 355 additions and 0 deletions.
184 changes: 184 additions & 0 deletions src/samtools/samtools_fastq/config.vsh.yaml
@@ -0,0 +1,184 @@
name: samtools_fastq
namespace: samtools
description: Convert a SAM/BAM/CRAM file to FASTQ.
keywords: [fastq, bam, sam, cram]
links:
  homepage: https://www.htslib.org/
  documentation: https://www.htslib.org/doc/samtools-fastq.html
  repository: https://github.com/samtools/samtools
references:
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
license: MIT/Expat

argument_groups:
  - name: Inputs
    arguments:
      - name: --input
        type: file
        description: input SAM/BAM/CRAM file
        required: true
  - name: Outputs
    arguments:
      - name: --output
        type: file
        direction: output
        description: output FASTQ file
        required: true
  - name: Options
    arguments:
      - name: --no_suffix
        alternatives: -n
        type: boolean_true
        description: |
          By default, either '/1' or '/2' is added to the end of read names where the corresponding
          READ1 or READ2 FLAG bit is set. Using -n causes read names to be left as they are.
      - name: --suffix
        alternatives: -N
        type: boolean_true
        description: |
          Always add either '/1' or '/2' to the end of read names even when put into different files.
      - name: --use_oq
        alternatives: -O
        type: boolean_true
        description: |
          Use quality values from OQ tags in preference to standard quality string if available.
      - name: --singleton
        alternatives: -s
        type: file
        direction: output
        description: write singleton reads to FILE.
      - name: --copy_tags
        alternatives: -t
        type: boolean_true
        description: |
          Copy RG, BC and QT tags to the FASTQ header line, if they exist.
      - name: --copy_tags_list
        alternatives: -T
        type: string
        description: |
          Specify a comma-separated list of tags to copy to the FASTQ header line, if they exist.
          TAGLIST can be blank or * to indicate all tags should be copied to the output. If using *,
          be careful to quote it to avoid unwanted shell expansion.
      - name: --read1
        alternatives: "-1"
        type: file
        direction: output
        description: |
          Write reads with the READ1 FLAG set (and READ2 not set) to FILE instead of outputting them.
          If the -s option is used, only paired reads will be written to this file.
      - name: --read2
        alternatives: "-2"
        type: file
        direction: output
        description: |
          Write reads with the READ2 FLAG set (and READ1 not set) to FILE instead of outputting them.
          If the -s option is used, only paired reads will be written to this file.
      - name: --output_reads
        alternatives: -o
        type: file
        direction: output
        description: |
          Write reads with either the READ1 FLAG or the READ2 FLAG set to FILE instead of outputting them to stdout.
          This is equivalent to -1 FILE -2 FILE.
      - name: --output_reads_both
        alternatives: -0
        type: file
        direction: output
        description: |
          Write reads where the READ1 and READ2 FLAG bits are either both set or both unset to FILE
          instead of outputting them.
      - name: --filter_flags
        alternatives: -f
        type: integer
        description: |
          Only output alignments with all bits set in INT present in the FLAG field. INT can be specified
          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0'
          (i.e. /^0[0-7]+/).
        default: 0
      - name: --excl_flags
        alternatives: "-F"
        type: integer
        description: |
          Do not output alignments with any bits set in INT present in the FLAG field. INT can be specified
          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0'
          (i.e. /^0[0-7]+/). This defaults to 0x900 representing filtering of secondary and
          supplementary alignments.
        default: 0x900
      - name: --incl_flags
        alternatives: "--rf"
        type: integer
        description: |
          Only output alignments with any bits set in INT present in the FLAG field. INT can be specified
          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0'
          (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of
          flag names.
        default: 0
      - name: --excl_flags_all
        alternatives: -G
        type: integer
        description: |
          Only EXCLUDE reads with all of the bits set in INT present in the FLAG field. INT can be specified
          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0'
          (i.e. /^0[0-7]+/).
        default: 0
      - name: --aux_tag
        alternatives: -d
        type: string
        description: |
          Only output alignments containing an auxiliary tag matching both TAG and VAL. If VAL is omitted
          then any value is accepted. The tag types supported are i, f, Z, A and H. "B" arrays are not
          supported. This is comparable to the method used in samtools view --tag. The option may be specified
          multiple times and is equivalent to using the --aux_tag_file option.
      - name: --aux_tag_file
        alternatives: -D
        type: string
        description: |
          Only output alignments containing an auxiliary tag matching TAG and having a value listed in FILE.
          The format of the file is one line per value. This is equivalent to specifying --aux_tag multiple times.
      - name: --casava
        alternatives: -i
        type: boolean_true
        description: add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
      - name: --compression
        alternatives: -c
        type: integer
        description: set compression level (0 to 9) when writing gz or bgzf fastq files.
        default: 0
      - name: --index1
        alternatives: --i1
        type: file
        direction: output
        description: write first index reads to FILE.
      - name: --index2
        alternatives: --i2
        type: file
        direction: output
        description: write second index reads to FILE.
      - name: --barcode_tag
        type: string
        description: Auxiliary tag to find index reads in.
        default: BC
      - name: --quality_tag
        type: string
        description: Auxiliary tag to find index quality in.
        default: QT
      - name: --index_format
        type: string
        description: |
          String describing how to parse the barcode and quality tags. For example:
          [i14i8]: the first 14 characters are index 1, the next 8 characters are index 2.
          [n8i14]: ignore the first 8 characters, and use the next 14 characters for index 1.
          If the tag contains a separator, then the numeric part can be replaced with '*' to mean
          'read until the separator or end of tag', for example: [n*i*].
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1
    setup:
      - type: docker
        run: |
          samtools --version 2>&1 | grep -E '^(samtools|Using htslib)' | \
          sed 's#Using ##;s# \([0-9\.]*\)$#: \1#' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
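The Docker setup step in the config pipes `samtools --version` through `grep` and `sed` to record tool versions. The sketch below runs the same two filters on a hard-coded sample so the transformation can be seen without a samtools install. The function names and the sample text are assumptions made for illustration; the sample was not captured from the container image.

```shell
#!/bin/bash
# Reformat `samtools --version`-style text (read from stdin) into
# "name: version" lines, using the same grep and sed expressions as the config.
format_versions() {
  grep -E '^(samtools|Using htslib)' | \
    sed 's#Using ##;s# \([0-9\.]*\)$#: \1#'
}

# Illustrative stand-in for the real command output (assumed, not captured).
sample_version_text() {
  printf '%s\n' \
    'samtools 1.19.2' \
    'Using htslib 1.19.1' \
    'Copyright (C) 2024 Genome Research Ltd.'
}

# grep keeps the first two lines; sed drops "Using " and turns the last
# space-separated version number into ": <version>".
sample_version_text | format_versions
```

With this sample, the two surviving lines become `samtools: 1.19.2` and `htslib: 1.19.1`, which is the shape written to `/var/software_versions.txt`.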
80 changes: 80 additions & 0 deletions src/samtools/samtools_fastq/help.txt
@@ -0,0 +1,80 @@
```
samtools fastq
```

Usage: samtools fastq [options...] <in.bam>

Description:
Converts a SAM, BAM or CRAM to FASTQ format.

Options:
-0 FILE write reads designated READ_OTHER to FILE
-1 FILE write reads designated READ1 to FILE
-2 FILE write reads designated READ2 to FILE
-o FILE write reads designated READ1 or READ2 to FILE
note: if a singleton file is specified with -s, only
paired reads will be written to the -1 and -2 files.
-d, --tag TAG[:VAL]
only include reads containing TAG, optionally with value VAL
-f, --require-flags INT
only include reads with all of the FLAGs in INT present [0]
-F, --excl[ude]-flags INT
only include reads with none of the FLAGs in INT present [0x900]
--rf, --incl[ude]-flags INT
only include reads with any of the FLAGs in INT present [0]
-G INT only EXCLUDE reads with all of the FLAGs in INT present [0]
-n don't append /1 and /2 to the read name
-N always append /1 and /2 to the read name
-O output quality in the OQ tag if present
-s FILE write singleton reads designated READ1 or READ2 to FILE
-t copy RG, BC and QT tags to the FASTQ header line
-T TAGLIST copy arbitrary tags to the FASTQ header line, '*' for all
-v INT default quality score if not given in file [1]
-i add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
-c INT compression level [0..9] to use when writing bgzf files [1]
--i1 FILE write first index reads to FILE
--i2 FILE write second index reads to FILE
--barcode-tag TAG
Barcode tag [BC]
--quality-tag TAG
Quality tag [QT]
--index-format STR
How to parse barcode and quality tags

--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
--verbosity INT
Set level of verbosity

The files will be automatically compressed if the file names have a .gz
or .bgzf extension. The input to this program must be collated by name.
Run 'samtools collate' or 'samtools sort -n' to achieve this.

Reads are designated READ1 if FLAG READ1 is set and READ2 is not set.
Reads are designated READ2 if FLAG READ1 is not set and READ2 is set.
Otherwise reads are designated READ_OTHER (both flags set or both flags unset).
Run 'samtools flags' for more information on flag codes and meanings.

The index-format string describes how to parse the barcode and quality tags.
It is made up of 'i' or 'n' followed by a length or '*'. For example:
i14i8 The first 14 characters are index 1, the next 8 are index 2
n8i14 Ignore the first 8 characters, and use the next 14 for index 1

If the tag contains a separator, then the numeric part can be replaced with
'*' to mean 'read until the separator or end of tag', for example:
i*i* Break the tag at the separator into index 1 and index 2
n*i* Ignore the left part of the tag until the separator,
then use the second part of the tag as index 1

Examples:
To get just the paired reads in separate files, use:
samtools fastq -1 pair1.fq -2 pair2.fq -0 /dev/null -s /dev/null -n in.bam

To get all non-supplementary/secondary reads in a single file, redirect
the output:
samtools fastq in.bam > all_reads.fq
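The index-format rules described in help.txt can be illustrated with a small Bash sketch. It only covers the fixed-length forms (`i14i8`, `n8i14`) and deliberately rejects the `*` form. `split_index` is a hypothetical helper written for this example. It is not part of samtools and makes no claim about how samtools parses the string internally.

```shell
#!/bin/bash
# split_index FORMAT TAG
# Walk a fixed-length index-format string and print every index ("i")
# segment of TAG on its own line; "n" segments are skipped.
# Returns 1 if part of FORMAT cannot be parsed (for example the '*' form).
split_index() {
  local fmt="$1" tag="$2" pos=0 kind len
  local re='^([in])([0-9]+)(.*)$'
  while [[ "$fmt" =~ $re ]]; do
    kind="${BASH_REMATCH[1]}"
    len=$((10#${BASH_REMATCH[2]}))   # force base 10 in case of leading zeros
    fmt="${BASH_REMATCH[3]}"
    if [[ "$kind" == "i" ]]; then
      echo "${tag:pos:len}"
    fi
    pos=$((pos + len))
  done
  if [[ -n "$fmt" ]]; then
    echo "unsupported index-format remainder: $fmt" >&2
    return 1
  fi
}

# Shortened formats keep the example readable:
split_index i4i2 ACGTTG   # first 4 characters are index 1, next 2 are index 2
split_index n2i4 GGACGT   # ignore 2 characters, next 4 are index 1
```

The first call prints `ACGT` and `TG`; the second prints `ACGT`.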
40 changes: 40 additions & 0 deletions src/samtools/samtools_fastq/script.sh
@@ -0,0 +1,40 @@
#!/bin/bash

## VIASH START
## VIASH END

set -e

[[ "$par_no_suffix" == "false" ]] && unset par_no_suffix
[[ "$par_suffix" == "false" ]] && unset par_suffix
[[ "$par_use_oq" == "false" ]] && unset par_use_oq
[[ "$par_copy_tags" == "false" ]] && unset par_copy_tags
[[ "$par_casava" == "false" ]] && unset par_casava

samtools fastq \
  ${par_no_suffix:+-n} \
  ${par_suffix:+-N} \
  ${par_use_oq:+-O} \
  ${par_singleton:+-s "$par_singleton"} \
  ${par_copy_tags:+-t} \
  ${par_copy_tags_list:+-T "$par_copy_tags_list"} \
  ${par_read1:+-1 "$par_read1"} \
  ${par_read2:+-2 "$par_read2"} \
  ${par_output_reads:+-o "$par_output_reads"} \
  ${par_output_reads_both:+-0 "$par_output_reads_both"} \
  ${par_filter_flags:+-f "$par_filter_flags"} \
  ${par_excl_flags:+-F "$par_excl_flags"} \
  ${par_incl_flags:+--rf "$par_incl_flags"} \
  ${par_excl_flags_all:+-G "$par_excl_flags_all"} \
  ${par_aux_tag:+-d "$par_aux_tag"} \
  ${par_aux_tag_file:+-D "$par_aux_tag_file"} \
  ${par_casava:+-i} \
  ${par_compression:+-c "$par_compression"} \
  ${par_index1:+--i1 "$par_index1"} \
  ${par_index2:+--i2 "$par_index2"} \
  ${par_barcode_tag:+--barcode-tag "$par_barcode_tag"} \
  ${par_quality_tag:+--quality-tag "$par_quality_tag"} \
  ${par_index_format:+--index-format "$par_index_format"} \
  "$par_input" \
  > "$par_output"
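
script.sh relies on two Bash behaviours: boolean parameters set to "false" are unset first, and `${var:+...}` then expands to nothing for unset or empty variables while keeping a quoted value as a single word. The sketch below isolates that pattern. `show_args` and `build_args` are hypothetical helpers for this illustration and take only two of the script's parameters.

```shell
#!/bin/bash
# show_args prints each argument it receives on its own line, in brackets,
# so that word boundaries are visible.
show_args() {
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# build_args BOOL PATH
# Mimics the script.sh pattern for one boolean flag and one file option.
build_args() {
  local par_no_suffix="$1" par_singleton="$2"
  # A "false" boolean is unset so that the :+ expansion below yields nothing.
  [[ "$par_no_suffix" == "false" ]] && unset par_no_suffix
  show_args \
    ${par_no_suffix:+-n} \
    ${par_singleton:+-s "$par_singleton"}
}

build_args true "my reads.fq"   # three arguments: -n, -s, and the path
build_args false ""             # no arguments at all
```

The first call prints `[-n]`, `[-s]` and `[my reads.fq]`, showing that the path containing a space stays one argument. The second call prints nothing.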

36 changes: 36 additions & 0 deletions src/samtools/samtools_fastq/test.sh
@@ -0,0 +1,36 @@
#!/bin/bash

test_dir="${meta_resources_dir}/test_data"
out_dir="${meta_resources_dir}/tmp"

############################################################################################

## example 1: samtools fastq -0 /dev/null in_name.bam > all_reads.fq
## example 2: samtools fastq -0 /dev/null -s single.fq -N in_name.bam > paired.fq
## example 3: samtools fastq with fasta output??
## example 4: samtools fastq with compressed input?
## example 5: samtools fastq with no suffix?


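echo ">>> Test 1: Converting a SAM file to FASTQ"

mkdir -p "$out_dir"

"$meta_executable" \
  --input "$test_dir/a.sam" \
  --output "$out_dir/a.fq"

echo ">>> Check if output file exists"
[ ! -f "$out_dir/a.fq" ] \
  && echo "Output file a.fq does not exist" && exit 1

echo ">>> Check if output is empty"
[ ! -s "$out_dir/a.fq" ] \
  && echo "Output file a.fq is empty" && exit 1

echo ">>> Check if output is made of whole FASTQ records"
# No reference FASTQ ships with this commit, so only the 4-line record structure is checked.
[ $(( $(wc -l < "$out_dir/a.fq") % 4 )) -ne 0 ] \
  && echo "Output file a.fq is not a whole number of 4-line records" && exit 1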

############################################################################################

############################################################################################


echo "All tests succeeded!"
exit 0
7 changes: 7 additions & 0 deletions src/samtools/samtools_fastq/test_data/a.sam
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
@SQ SN:xx LN:20
a1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
b1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
c1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
a1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
b1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
c1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
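The two FLAG values in this test file, 99 and 147, can be decoded with Bash arithmetic to see how the READ1/READ2 designation rule from help.txt and the default `-F 0x900` exclusion would apply to them. This is a minimal sketch. `designate` and `is_excluded` are hypothetical helper names, and the bit values 0x40 (READ1) and 0x80 (READ2) come from the SAM specification, not from this commit.

```shell
#!/bin/bash
# designate FLAG
# Print READ1, READ2 or READ_OTHER for a SAM FLAG value, following the rule
# stated in help.txt. 0x40 is the READ1 bit and 0x80 is the READ2 bit.
designate() {
  local flag=$(( $1 ))
  local r1=$(( (flag & 0x40) != 0 ))
  local r2=$(( (flag & 0x80) != 0 ))
  if (( r1 && !r2 )); then
    echo READ1
  elif (( !r1 && r2 )); then
    echo READ2
  else
    echo READ_OTHER
  fi
}

# is_excluded FLAG [MASK]
# Succeeds when FLAG has any bit of MASK set. MASK defaults to 0x900
# (secondary and supplementary alignments), the -F default in the config.
is_excluded() {
  local flag=$(( $1 )) mask=$(( ${2:-0x900} ))
  (( (flag & mask) != 0 ))
}

for flag in 99 147; do
  if is_excluded "$flag"; then
    echo "$flag: excluded by default"
  else
    echo "$flag: $(designate "$flag")"
  fi
done
```

For the test data this prints `99: READ1` and `147: READ2`, so none of the six records is dropped by the default exclusion mask.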
8 changes: 8 additions & 0 deletions src/samtools/samtools_fastq/test_data/script.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#!/bin/bash

# download test data from snakemake wrapper
if [ ! -d /tmp/fastq_source ]; then
  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers.git /tmp/fastq_source
fi

cp -r /tmp/fastq_source/bio/samtools/fastx/test/*.sam src/samtools/samtools_fastq/test_data/
