Commit

Merge main into add-agat_sp_ensembl_output_style
rcannood committed Aug 13, 2024
2 parents 49cccab + 9fc07f6 commit b1055e4
Showing 25 changed files with 669 additions and 253 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -29,12 +29,18 @@
- `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).


* `agat/agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).

## MINOR CHANGES

* `busco` components: update BUSCO to `5.7.1` (PR #72).

* Update CI to reusable workflow in `viash-io/viash-actions` (PR #86).

* Update several components in order to avoid duplicate code when using `unset` on boolean arguments (PR #133).

* Bump viash to `0.9.0-RC7` (PR #134).

## DOCUMENTATION

* Extend the contributing guidelines (PR #82):
25 changes: 25 additions & 0 deletions CONTRIBUTING.md
@@ -320,6 +320,31 @@ Notes:

* If your tool allows for multiple inputs using a separator other than `;` (which is the default Viash multiple separator), you can substitute these values with a command like: `par_disable_filters=$(echo $par_disable_filters | tr ';' ',')`.

* If you have a lot of boolean variables that you would like to unset when the value is `false`, you can avoid duplicate code by using the following syntax:

```bash
unset_if_false=(
    par_argument_1
    par_argument_2
    par_argument_3
    par_argument_4
)

for par in "${unset_if_false[@]}"; do
    test_val="${!par}"
    [[ "$test_val" == "false" ]] && unset "$par"
done
```

This code is equivalent to:

```bash
[[ "$par_argument_1" == "false" ]] && unset par_argument_1
[[ "$par_argument_2" == "false" ]] && unset par_argument_2
[[ "$par_argument_3" == "false" ]] && unset par_argument_3
[[ "$par_argument_4" == "false" ]] && unset par_argument_4
```
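
The effect of the loop can be checked in isolation. The following is a minimal sketch, assuming three hypothetical boolean parameters (`par_verbose`, `par_force`, `par_dry_run`) that are not arguments of any real component; it runs the loop and then reports which variables are still set.

```bash
#!/bin/bash

# Hypothetical boolean parameters, as Viash would set them.
par_verbose="true"
par_force="false"
par_dry_run="false"

unset_if_false=(
    par_verbose
    par_force
    par_dry_run
)

for par in "${unset_if_false[@]}"; do
    # ${!par} is indirect expansion: the value of the variable named by $par.
    test_val="${!par}"
    [[ "$test_val" == "false" ]] && unset "$par"
done

# Collect the variables that survived. ${!par+x} expands to "x" only if
# the variable named by $par is still set.
still_set=()
for par in "${unset_if_false[@]}"; do
    if [[ -n "${!par+x}" ]]; then
        still_set+=("$par")
    fi
done

echo "still set: ${still_set[*]}"
```

With the values above, only `par_verbose` remains set, so an expansion such as `${par_force:+--force}` would produce nothing.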


### Step 12: Create test script

2 changes: 1 addition & 1 deletion _viash.yaml
@@ -7,7 +7,7 @@ links:
issue_tracker: https://github.com/viash-hub/biobox/issues
repository: https://github.com/viash-hub/biobox

-viash_version: 0.9.0-RC6
+viash_version: 0.9.0-RC7

config_mods: |
.requirements.commands := ['ps']
84 changes: 84 additions & 0 deletions src/agat/agat_convert_embl2gff/config.vsh.yaml
@@ -0,0 +1,84 @@
name: agat_convert_embl2gff
namespace: agat
description: |
  The script takes an EMBL file as input, and will translate it in gff format.
keywords: [gene annotations, GFF conversion]
links:
  homepage: https://github.com/NBISweden/AGAT
  documentation: https://agat.readthedocs.io/en/latest/tools/agat_convert_embl2gff.html
  issue_tracker: https://github.com/NBISweden/AGAT/issues
  repository: https://github.com/NBISweden/AGAT
references:
  doi: 10.5281/zenodo.3552717
license: GPL-3.0
authors:
  - __merge__: /src/_authors/leila_paquay.yaml
    roles: [ author, maintainer ]

argument_groups:
  - name: Inputs
    arguments:
      - name: --embl
        description: Input EMBL file that will be read.
        type: file
        required: true
        direction: input
        example: input.embl
  - name: Outputs
    arguments:
      - name: --output
        alternatives: [-o, --out, --outfile, --gff]
        description: Output GFF file. If no output file is specified, the output will be written to STDOUT.
        type: file
        direction: output
        required: false
        example: output.gff
  - name: Arguments
    arguments:
      - name: --emblmygff3
        description: |
          Means that the EMBL flat file comes from the EMBLmyGFF3 software. This is an EMBL format dedicated for submission and contains particularity to deal with. This parameter is needed to get a proper sequence id in the GFF3 from an embl made with EMBLmyGFF3.
        type: boolean_true
      - name: --primary_tag
        alternatives: [--pt, -t]
        description: |
          List of "primary tag". Useful to discard or keep specific features. Multiple tags must be comma-separated.
        type: string
        multiple: true
        required: false
        example: [tag1, tag2]
      - name: --discard
        alternatives: [-d]
        description: |
          Means that primary tags provided by the option "primary_tag" will be discarded.
        type: boolean_true
      - name: --keep
        alternatives: [-k]
        description: |
          Means that only primary tags provided by the option "primary_tag" will be kept.
        type: boolean_true
      - name: --config
        alternatives: [-c]
        description: |
          Input agat config file. By default AGAT takes as input agat_config.yaml file from the working directory if any, otherwise it takes the original agat_config.yaml shipped with AGAT. To get the agat_config.yaml locally type: "agat config --expose". The --config option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
        type: file
        required: false
        example: custom_agat_config.yaml
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
    setup:
      - type: docker
        run: |
          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
78 changes: 78 additions & 0 deletions src/agat/agat_convert_embl2gff/help.txt
@@ -0,0 +1,78 @@
```sh
agat_convert_embl2gff.pl --help
```

------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_converter_embl2gff.pl

Description:
The script takes an EMBL file as input, and will translate it in gff
format.

Usage:
agat_converter_embl2gff.pl --embl infile.embl [ -o outfile ]

Options:
--embl Input EMBL file that will be read

--emblmygff3
Bolean - Means that the EMBL flat file comes from the EMBLmyGFF3
software. This is an EMBL format dedicated for submission and
contains particularity to deal with. This parameter is needed to
get a proper sequence id in the GFF3 from an embl made with
EMBLmyGFF3.

--primary_tag, --pt, -t
List of "primary tag". Useful to discard or keep specific
features. Multiple tags must be coma-separated.

-d Bolean - Means that primary tags provided by the option
"primary_tag" will be discarded.

-k Bolean - Means that only primary tags provided by the option
"primary_tag" will be kept.

-o, --output, --out, --outfile or --gff
Output GFF file. If no output file is specified, the output will
be written to STDOUT.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

-h or --help
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
23 changes: 23 additions & 0 deletions src/agat/agat_convert_embl2gff/script.sh
@@ -0,0 +1,23 @@
#!/bin/bash

## VIASH START
## VIASH END


# unset flags
[[ "$par_emblmygff3" == "false" ]] && unset par_emblmygff3
[[ "$par_discard" == "false" ]] && unset par_discard
[[ "$par_keep" == "false" ]] && unset par_keep

# replace ';' with ','
par_primary_tag=$(echo "$par_primary_tag" | tr ';' ',')

# run agat_convert_embl2gff
agat_convert_embl2gff.pl \
  --embl "$par_embl" \
  -o "$par_output" \
  ${par_emblmygff3:+--emblmygff3} \
  ${par_primary_tag:+--primary_tag "${par_primary_tag}"} \
  ${par_discard:+-d} \
  ${par_keep:+-k} \
  ${par_config:+--config "${par_config}"}
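
To see which arguments the script above would actually pass on, the same expansions can be collected in a Bash array and printed. This is only a sketch of that idea, not how the component is written: the `par_*` values are hypothetical, `cmd_args` is an illustrative name, and `echo` stands in for the real `agat_convert_embl2gff.pl` call so that it runs without AGAT installed.

```bash
#!/bin/bash

# Hypothetical parameter values; Viash would normally provide these.
par_embl="input.embl"
par_output="output.gff"
par_emblmygff3="true"
par_primary_tag="gene;CDS"
par_discard="false"

# Unset the false flag so that ${var:+...} drops it.
[[ "$par_discard" == "false" ]] && unset par_discard

# Viash joins multiple values with ';', whereas the tool expects ','.
par_primary_tag=$(echo "$par_primary_tag" | tr ';' ',')

# ${var:+word} expands to word only when var is set and non-empty.
cmd_args=(
    --embl "$par_embl"
    -o "$par_output"
    ${par_emblmygff3:+--emblmygff3}
    ${par_primary_tag:+--primary_tag "$par_primary_tag"}
    ${par_discard:+-d}
)

# echo is a stand-in for running agat_convert_embl2gff.pl.
echo agat_convert_embl2gff.pl "${cmd_args[@]}"
```

Here `-d` is absent from the printed command because `par_discard` was unset, and the two tags arrive as the single value `gene,CDS`.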
28 changes: 28 additions & 0 deletions src/agat/agat_convert_embl2gff/test.sh
@@ -0,0 +1,28 @@
#!/bin/bash

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"
out_dir="${meta_resources_dir}/out_data"

echo "> Run $meta_name with test data and --emblmygff3"
"$meta_executable" \
  --embl "$test_dir/agat_convert_embl2gff_1.embl" \
  --output "$out_dir/output.gff" \
  --emblmygff3

echo ">> Checking output"
[ ! -f "$out_dir/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$out_dir/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
if ! diff "$out_dir/output.gff" "$test_dir/agat_convert_embl2gff_1.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

echo "> Test successful"
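
The three checks in this test (the file exists, it is non-empty, it matches the expected output) could be factored into small functions if they are needed in several tests. A minimal sketch, assuming helper names (`assert_file_exists`, `assert_file_not_empty`, `assert_identical`) that are hypothetical and not provided by Viash or by this repository; it runs on throwaway files rather than on real AGAT output.

```bash
#!/bin/bash

# Hypothetical helpers; each returns non-zero and prints a message on failure.
assert_file_exists() {
    [ -f "$1" ] || { echo "File '$1' does not exist" >&2; return 1; }
}

assert_file_not_empty() {
    [ -s "$1" ] || { echo "File '$1' is empty" >&2; return 1; }
}

assert_identical() {
    diff "$1" "$2" > /dev/null || { echo "Files '$1' and '$2' differ" >&2; return 1; }
}

# Demonstration on temporary files instead of real component output.
tmp_dir=$(mktemp -d)
printf '##gff-version 3\n' > "$tmp_dir/output.gff"
printf '##gff-version 3\n' > "$tmp_dir/expected.gff"

assert_file_exists "$tmp_dir/output.gff" || exit 1
assert_file_not_empty "$tmp_dir/output.gff" || exit 1
assert_identical "$tmp_dir/output.gff" "$tmp_dir/expected.gff" || exit 1
echo "all checks passed"

rm -rf "$tmp_dir"
```

Returning a status instead of calling `exit` inside the helpers leaves the decision to stop with the calling test script.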
@@ -0,0 +1,51 @@
ID patatrac; SV 1; circular; genomic DNA; XXX; PRO; 317941 BP.
XX
AC XXX;
XX
AC * _ERS324955|SC|contig000001
XX
PR Project:PRJEBNNNN;
XX
DE XXX
XX
RN [1]
RP 1-2149
RA XXX;
RT ;
RL Submitted {(DD-MMM-YYYY)} to the INSDC.
XX
FH Key Location/Qualifiers
FH
FT source 1..588788
FT /organism={"scientific organism name"}
FT /mol_type={"in vivo molecule type of sequence"}
XX
SQ Sequence 588788 BP; 101836 A; 193561 C; 192752 G; 100639 T; 0 other;
tgcgtactcg aagagacgcg cccagattat ataagggcgt cgtctcgagg ccgacggcgc 60
gccggcgagt acgcgtgatc cacaacccga agcgaccgtc gggagaccga gggtcgtcga 120
gggtggatac gttcctgcct tcgtgccggg aaacggccga agggaacgtg gcgacctgcg 180
//
ID fdssf; SV 1; circular; genomic DNA; XXX; PRO; 317941 BP.
XX
AC XXX;
XX
AC * _ERS344554
XX
PR Project:PRJEBNNNN;
XX
DE XXX
XX
RN [1]
RP 1-2149
RA XXX;
RT ;
RL Submitted {(DD-MMM-YYYY)} to the INSDC.
XX
FH Key Location/Qualifiers
FH
FT source 1..588788
FT /organism={"scientific organism name"}
FT /mol_type={"in vivo molecule type of sequence"}
XX
SQ Sequence 588788 BP; 101836 A; 193561 C; 192752 G; 100639 T; 0 other;
TTTTTTTTTT aagagacgcg cccagattat ataagggcgt cgtctcgagg ccgacggcgc 60
@@ -0,0 +1,10 @@
##gff-version 3
ERS324955|SC|contig000001 EMBL/GenBank/SwissProt source 1 588788 . + 1 mol_type={"in vivo molecule type of sequence"};organism={"scientific organism name"}
ERS344554 EMBL/GenBank/SwissProt source 1 588788 . + 1 mol_type={"in vivo molecule type of sequence"};organism={"scientific organism name"}
##FASTA
>ERS324955|SC|contig000001 XXX
TGCGTACTCGAAGAGACGCGCCCAGATTATATAAGGGCGTCGTCTCGAGGCCGACGGCGCGCCGGCGAGTACGCGTGATC
CACAACCCGAAGCGACCGTCGGGAGACCGAGGGTCGTCGAGGGTGGATACGTTCCTGCCTTCGTGCCGGGAAACGGCCGA
AGGGAACGTGGCGACCTGCG
>ERS344554 XXX
TTTTTTTTTTAAGAGACGCGCCCAGATTATATAAGGGCGTCGTCTCGAGGCCGACGGCGC
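
Comparing the EMBL input with this expected GFF suggests where the sequence IDs in column 1 come from when `--emblmygff3` is used: each record's `AC * _<id>` line carries the ID. The sketch below reproduces only that observed mapping with `sed`, on an inline and abbreviated copy of the test records. It illustrates the relationship visible in the test data and is not AGAT's implementation; the variable names are hypothetical.

```bash
#!/bin/bash

# Abbreviated records modelled on the EMBL test data.
embl_records='ID   patatrac; SV 1; circular; genomic DNA; XXX; PRO; 317941 BP.
AC   XXX;
AC * _ERS324955|SC|contig000001
//
ID   fdssf; SV 1; circular; genomic DNA; XXX; PRO; 317941 BP.
AC   XXX;
AC * _ERS344554
//'

# Keep only the "AC * _<id>" lines and strip the prefix, leaving the ID.
seq_ids=$(printf '%s\n' "$embl_records" \
    | sed -n -E 's/^AC[[:space:]]+\*[[:space:]]+_//p')

echo "$seq_ids"
```

The two IDs printed correspond to the first column of the two feature lines and to the FASTA headers in the expected output.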
10 changes: 10 additions & 0 deletions src/agat/agat_convert_embl2gff/test_data/script.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# clone repo
if [ ! -d /tmp/agat_source ]; then
  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
fi

# copy test data
cp -r /tmp/agat_source/t/scripts_output/in/agat_convert_embl2gff_1.embl src/agat/agat_convert_embl2gff/test_data/agat_convert_embl2gff_1.embl
cp -r /tmp/agat_source/t/scripts_output/out/agat_convert_embl2gff_1.gff src/agat/agat_convert_embl2gff/test_data/agat_convert_embl2gff_1.gff