-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Config file and help.txt file Created: - config.vsh.yaml - help.txt * Added script.sh File added: - script.sh * Created test.sh updates: - changes to config.vsh.yaml - created test.sh - created some test files Problems: - there is some error in config file that is preventing me from running the component and testing * Update on test.sh * update * Bug fixes - config required: false bug * Update test * Update CHANGELOG.md * Improvement on test.sh * Added more test I tried out different option of the command with different fasta and fastq files and different list, but the output does not seem to change. * Update on tests - got unstuck - I need to create a docker image with the lastest version of seqtk - * Bug fixed - removed some test files - fixed bug with the help of Toni - added correct software_versions.txt to config Still Needs: - add one more test to strand aware - fix tab test * Update CHANGELOG.md * Fixed Tabular test bug * Strand Aware Test - implementation of strand aware test - change of format for reg.bed file * Input validation for list file - input validation for name_list parameter * Sugested Changes - removed test_data dir - removed input validation * Added author info * Update CHANGELOG.md * Update theodoro_gasperin.yaml * add newline * add newline * Update src/seqtk/seqtk_subseq/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Version Fix * Update on config * Helper bed.sh * Deleted _helpers * don't forget exit when a test fails --------- Co-authored-by: Robrecht Cannoodt <[email protected]>
- Loading branch information
Showing
6 changed files
with
305 additions
and
19 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
name: Theodoro Gasperin Terra Camargo | ||
info: | ||
links: | ||
email: [email protected] | ||
github: tgaspe | ||
linkedin: theodoro-gasperin-terra-camargo | ||
organizations: | ||
- name: Data Intuitive | ||
href: https://www.data-intuitive.com | ||
role: Bioinformatician |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
name: seqtk_subseq | ||
namespace: seqtk | ||
description: | | ||
Extract subsequences from FASTA/Q files. Takes as input a FASTA/Q file and a name.lst (sequence ids file) or a reg.bed (genomic regions file). | ||
keywords: [subseq, FASTA, FASTQ] | ||
links: | ||
repository: https://github.com/lh3/seqtk/tree/v1.4 | ||
license: MIT | ||
authors: | ||
- __merge__: /src/_authors/theodoro_gasperin.yaml | ||
roles: [ author, maintainer ] | ||
|
||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- name: "--input" | ||
type: file | ||
direction: input | ||
description: The input FASTA/Q file. | ||
required: true | ||
example: input.fa | ||
|
||
- name: "--name_list" | ||
type: file | ||
direction: input | ||
description: | | ||
List of sequence names (name.lst) or genomic regions (reg.bed) to extract. | ||
required: true | ||
example: list.lst | ||
|
||
- name: Outputs | ||
arguments: | ||
- name: "--output" | ||
alternatives: -o | ||
type: file | ||
direction: output | ||
description: The output FASTA/Q file. | ||
required: true | ||
default: output.fa | ||
|
||
- name: Options | ||
arguments: | ||
- name: "--tab" | ||
alternatives: -t | ||
type: boolean_true | ||
description: TAB delimited output. | ||
|
||
- name: "--strand_aware" | ||
alternatives: -s | ||
type: boolean_true | ||
description: Strand aware. | ||
|
||
- name: "--sequence_line_length" | ||
alternatives: -l | ||
type: integer | ||
description: | | ||
Sequence line length of input fasta file. Default: 0. | ||
example: 0 | ||
|
||
|
||
resources: | ||
- type: bash_script | ||
path: script.sh | ||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
|
||
engines: | ||
- type: docker | ||
image: quay.io/biocontainers/seqtk:1.4--he4a0461_2 | ||
setup: | ||
- type: docker | ||
run: | | ||
echo $(echo $(seqtk 2>&1) | sed -n 's/.*\(Version: [^ ]*\).*/\1/p') > /var/software_versions.txt | ||
runners: | ||
- type: executable | ||
- type: nextflow |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
```bash | ||
seqtk subseq | ||
``` | ||
Usage: seqtk subseq [options] <in.fa> <in.bed>|<name.list> | ||
Options: | ||
-t TAB delimited output | ||
-s strand aware | ||
-l INT sequence line length [0] | ||
Note: Use 'samtools faidx' if only a few regions are intended. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
#!/bin/bash | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
[[ "$par_tab" == "false" ]] && unset par_tab | ||
[[ "$par_strand_aware" == "false" ]] && unset par_strand_aware | ||
|
||
seqtk subseq \ | ||
${par_tab:+-t} \ | ||
${par_strand_aware:+-s} \ | ||
${par_sequence_line_length:+-l "$par_sequence_line_length"} \ | ||
"$par_input" \ | ||
"$par_name_list" \ | ||
> "$par_output" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
#!/bin/bash | ||
|
||
# exit on error | ||
set -e | ||
|
||
## VIASH START | ||
meta_executable="target/executable/seqtk/seqtk_subseq" | ||
meta_resources_dir="src/seqtk" | ||
## VIASH END | ||
|
||
# Create directories for tests | ||
echo "Creating Test Data..." | ||
mkdir test_data | ||
|
||
# Create and populate input.fasta | ||
cat > "test_data/input.fasta" <<EOL | ||
>KU562861.1 | ||
GGAGCAGGAGAGTGTTCGAGTTCAGAGATGTCCATGGCGCCGTACGAGAAGGTGATGGATGACCTGGCCA | ||
AGGGGCAGCAGTTCGCGACGCAGCTGCAGGGCCTCCTCCGGGACTCCCCCAAGGCCGGCCACATCATGGA | ||
>GU056837.1 | ||
CTAATTTTATTTTTTTATAATAATTATTGGAGGAACTAAAACATTAATGAAATAATAATTATCATAATTA | ||
TTAATTACATATTTATTAGGTATAATATTTAAGGAAAAATATATTTTATGTTAATTGTAATAATTAGAAC | ||
>CP097510.1 | ||
CGATTTAGATCGGTGTAGTCAACACACATCCTCCACTTCCATTAGGCTTCTTGACGAGGACTACATTGAC | ||
AGCCACCGAGGGAACCGACCTCCTCAATGAAGTCAGACGCCAAGAGCCTATCAACTTCCTTCTGCACAGC | ||
>JAMFTS010000002.1 | ||
CCTAAACCCTAAACCCTAAACCCCCTACAAACCTTACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA | ||
ACCCGAAACCCTATACCCTAAACCCTAAACCCTAAACCCTAAACCCTAACCCAAACCTAATCCCTAAACC | ||
>MH150936.1 | ||
TAGAAGCTAATGAAAACTTTTCCTTTACTAAAAACCGTCAAACACGGTAAGAAACGCTTTTAATCATTTC | ||
AAAAGCAATCCCAATAGTGGTTACATCCAAACAAAACCCATTTCTTATATTTTCTCAAAAACAGTGAGAG | ||
EOL | ||
|
||
# Update id.list with new entries | ||
cat > "test_data/id.list" <<EOL | ||
KU562861.1 | ||
MH150936.1 | ||
EOL | ||
|
||
# Create and populate reg.bed | ||
cat > "test_data/reg.bed" <<EOL | ||
KU562861.1$(echo -e "\t")10$(echo -e "\t")20$(echo -e "\t")region$(echo -e "\t")0$(echo -e "\t")+$(echo -e "\n") | ||
MH150936.1$(echo -e "\t")10$(echo -e "\t")20$(echo -e "\t")region$(echo -e "\t")0$(echo -e "\t")- | ||
EOL | ||
|
||
######################################################################################### | ||
# Run basic test | ||
mkdir test1 | ||
cd test1 | ||
|
||
echo "> Run seqtk_subseq on FASTA/Q file" | ||
"$meta_executable" \ | ||
--input "../test_data/input.fasta" \ | ||
--name_list "../test_data/id.list" \ | ||
--output "sub_sample.fq" | ||
|
||
expected_output_basic=">KU562861.1 | ||
GGAGCAGGAGAGTGTTCGAGTTCAGAGATGTCCATGGCGCCGTACGAGAAGGTGATGGATGACCTGGCCAAGGGGCAGCAGTTCGCGACGCAGCTGCAGGGCCTCCTCCGGGACTCCCCCAAGGCCGGCCACATCATGGA | ||
>MH150936.1 | ||
TAGAAGCTAATGAAAACTTTTCCTTTACTAAAAACCGTCAAACACGGTAAGAAACGCTTTTAATCATTTCAAAAGCAATCCCAATAGTGGTTACATCCAAACAAAACCCATTTCTTATATTTTCTCAAAAACAGTGAGAG" | ||
output_basic=$(cat sub_sample.fq) | ||
|
||
if [ "$output_basic" != "$expected_output_basic" ]; then | ||
echo "Test failed" | ||
echo "Expected:" | ||
echo "$expected_output_basic" | ||
echo "Got:" | ||
echo "$output_basic" | ||
exit 1 | ||
fi | ||
|
||
######################################################################################### | ||
# Run reg.bed as name list input test | ||
cd .. | ||
mkdir test2 | ||
cd test2 | ||
|
||
echo "> Run seqtk_subseq on FASTA/Q file with BED file as name list" | ||
"$meta_executable" \ | ||
--input "../test_data/input.fasta" \ | ||
--name_list "../test_data/reg.bed" \ | ||
--output "sub_sample.fq" | ||
|
||
expected_output_basic=">KU562861.1:11-20 | ||
AGTGTTCGAG | ||
>MH150936.1:11-20 | ||
TGAAAACTTT" | ||
output_basic=$(cat sub_sample.fq) | ||
|
||
if [ "$output_basic" != "$expected_output_basic" ]; then | ||
echo "Test failed" | ||
echo "Expected:" | ||
echo "$expected_output_basic" | ||
echo "Got:" | ||
echo "$output_basic" | ||
exit 1 | ||
fi | ||
|
||
######################################################################################### | ||
# Run tab option output test | ||
cd .. | ||
mkdir test3 | ||
cd test3 | ||
|
||
echo "> Run seqtk_subseq with TAB option" | ||
"$meta_executable" \ | ||
--tab \ | ||
--input "../test_data/input.fasta" \ | ||
--name_list "../test_data/reg.bed" \ | ||
--output "sub_sample.fq" | ||
|
||
expected_output_tabular=$'KU562861.1\t11\tAGTGTTCGAG\nMH150936.1\t11\tTGAAAACTTT' | ||
output_tabular=$(cat sub_sample.fq) | ||
|
||
if [ "$output_tabular" != "$expected_output_tabular" ]; then | ||
echo "Test failed" | ||
echo "Expected:" | ||
echo "$expected_output_tabular" | ||
echo "Got:" | ||
echo "$output_tabular" | ||
exit 1 | ||
fi | ||
|
||
######################################################################################### | ||
# Run line option output test | ||
cd .. | ||
mkdir test4 | ||
cd test4 | ||
|
||
echo "> Run seqtk_subseq with line length option" | ||
"$meta_executable" \ | ||
--sequence_line_length 5 \ | ||
--input "../test_data/input.fasta" \ | ||
--name_list "../test_data/reg.bed" \ | ||
--output "sub_sample.fq" | ||
|
||
expected_output_wrapped=">KU562861.1:11-20 | ||
AGTGT | ||
TCGAG | ||
>MH150936.1:11-20 | ||
TGAAA | ||
ACTTT" | ||
output_wrapped=$(cat sub_sample.fq) | ||
|
||
if [ "$output_wrapped" != "$expected_output_wrapped" ]; then | ||
echo "Test failed" | ||
echo "Expected:" | ||
echo "$expected_output_wrapped" | ||
echo "Got:" | ||
echo "$output_wrapped" | ||
exit 1 | ||
fi | ||
|
||
######################################################################################### | ||
# Run Strand Aware option output test | ||
cd .. | ||
mkdir test5 | ||
cd test5 | ||
|
||
echo "> Run seqtk_subseq with strand aware option" | ||
"$meta_executable" \ | ||
--strand_aware \ | ||
--input "../test_data/input.fasta" \ | ||
--name_list "../test_data/reg.bed" \ | ||
--output "sub_sample.fq" | ||
|
||
expected_output_wrapped=">KU562861.1:11-20 | ||
AGTGTTCGAG | ||
>MH150936.1:11-20 | ||
AAAGTTTTCA" | ||
output_wrapped=$(cat sub_sample.fq) | ||
|
||
if [ "$output_wrapped" != "$expected_output_wrapped" ]; then | ||
echo "Test failed" | ||
echo "Expected:" | ||
echo "$expected_output_wrapped" | ||
echo "Got:" | ||
echo "$output_wrapped" | ||
exit 1 | ||
fi | ||
|
||
echo "All tests succeeded!" |