Add agat convert genscan2gff (#100)
* add config

* add help

* add test data and expected output and the script to obtain them

* add running script

* add test script

* update changelog

* cleanup

* fix tests

* format description

* remove unused argument --inflate-off

* update --config description

* add requirements

* create temporary directory and clean up on exit

* add GENSCAN in keywords

* add set -e to test

* fix create temporary directory

* add set -eo pipefail to test

* add set -eo pipefail to script

* fix create temporary directory

* update --config description

* cleanup changelog

* cleanup changelog

* Update deprecated variable

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
Leila011 and rcannood authored Sep 16, 2024
1 parent c3b40a1 commit 38f635a
Showing 8 changed files with 413 additions and 0 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# biobox x.x.x

## NEW FUNCTIONALITY

* `agat`:
- `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).

## MINOR CHANGES

* Upgrade to Viash 0.9.0.
95 changes: 95 additions & 0 deletions src/agat/agat_convert_genscan2gff/config.vsh.yaml
@@ -0,0 +1,95 @@
name: agat_convert_genscan2gff
namespace: agat
description: |
  The script takes a GENSCAN file as input and translates it into GFF
  format. The GENSCAN format is described [here](http://genome.crg.es/courses/Bioinformatics2003_genefinding/results/genscan.html).

  **Known problem**

  You must have submitted only the DNA sequence, without any header!
  GENSCAN expects only DNA sequences and does not crash or warn if a header
  is submitted along with the sequence. E.g., if you have a header ">seq",
  s-e-q are seen as the first three nucleotides of the sequence, and all
  predicted locations are shifted accordingly. (Checked only on the
  [online version](http://argonaute.mit.edu/GENSCAN.html); it is unknown
  whether the same problem exists elsewhere.)
keywords: [gene annotations, GFF conversion, GENSCAN]
links:
  homepage: https://github.com/NBISweden/AGAT
  documentation: https://agat.readthedocs.io/en/latest/tools/agat_convert_genscan2gff.html
  issue_tracker: https://github.com/NBISweden/AGAT/issues
  repository: https://github.com/NBISweden/AGAT
references:
  doi: 10.5281/zenodo.3552717
license: GPL-3.0
requirements:
  - commands: [agat]
authors:
  - __merge__: /src/_authors/leila_paquay.yaml
    roles: [ author, maintainer ]

argument_groups:
  - name: Inputs
    arguments:
      - name: --genscan
        alternatives: [-g]
        description: Input GENSCAN file to convert.
        type: file
        required: true
        direction: input
  - name: Outputs
    arguments:
      - name: --output
        alternatives: [-o, --out, --outfile, --gff]
        description: Output GFF file.
        type: file
        direction: output
        required: true
        example: output.gff
  - name: Arguments
    arguments:
      - name: --source
        description: |
          The source informs about the tool used to produce the data and is stored in the 2nd field of a GFF file. Example: Stringtie, Maker, Augustus, etc. [default: data]
        type: string
        required: false
        example: Stringtie
      - name: --primary_tag
        description: |
          The primary_tag corresponds to the data type and is stored in the 3rd field of a GFF file. Example: gene, mRNA, CDS, etc. [default: gene]
        type: string
        required: false
        example: gene
      - name: --inflate_type
        description: |
          Feature type (3rd column in GFF) created when the inflate parameter is activated [default: exon].
        type: string
        required: false
        example: exon
      - name: --verbose
        description: Add verbosity.
        type: boolean_true
      - name: --config
        alternatives: [-c]
        description: |
          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
        type: file
        required: false
        example: custom_agat_config.yaml
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
    setup:
      - type: docker
        run: |
          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
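Review note: the Docker setup step in the config pipes the version banner through `sed` to produce a YAML-style line. The sketch below shows that rewrite in isolation. The sample banner `AGAT v1.4.0` is an assumption about what `agat --version` prints, `format_version` is a hypothetical helper, and `[[:space:]]` is used as a portable spelling of the GNU-specific `\s` in the config.

```shell
#!/bin/bash
set -e

# Hypothetical helper: rewrite a banner such as "AGAT v1.4.0" into
# the form: agat: "v1.4.0"
format_version() {
  sed 's/AGAT[[:space:]]\(.*\)/agat: "\1"/'
}

# Assumed banner text, used here only for illustration.
sample_banner="AGAT v1.4.0"

formatted=$(printf '%s\n' "$sample_banner" | format_version)
echo "$formatted"
```

If the real banner has a different shape, the expression simply leaves the line unchanged, so checking the generated `/var/software_versions.txt` once is worthwhile.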
94 changes: 94 additions & 0 deletions src/agat/agat_convert_genscan2gff/help.txt
@@ -0,0 +1,94 @@
```sh
agat_convert_genscan2gff.pl --help
```
------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------

Name:
agat_convert_genscan2gff.pl

Description:
The script takes a genscan file as input, and will translate it in gff
format. The genscan format is described here:
http://genome.crg.es/courses/Bioinformatics2003_genefinding/results/gens
can.html /!\ vvv Known problem vvv /!\ You must have submited only DNA
sequence, wihtout any header!! Indeed the tool expects only DNA
sequences and does not crash/warn if an header is submited along the
sequence. e.g If you have an header ">seq" s-e-q are seen as the 3 first
nucleotides of the sequence. Then all prediction location are shifted
accordingly. (checked only on the online version
http://argonaute.mit.edu/GENSCAN.html. I don't know if there is the same
pronlem elsewhere.) /!\ ^^^ Known problem ^^^^ /!\

Usage:
agat_convert_genscan2gff.pl --genscan infile.bed [ -o outfile ]
agat_convert_genscan2gff.pl -h

Options:
--genscan or -g
Input genscan bed file that will be convert.

--source
The source informs about the tool used to produce the data and
is stored in 2nd field of a gff file. Example:
Stringtie,Maker,Augustus,etc. [default: data]

--primary_tag
The primary_tag corresponf to the data type and is stored in 3rd
field of a gff file. Example: gene,mRNA,CDS,etc. [default: gene]

--inflate_off
By default we inflate the block fields (blockCount, blockSizes,
blockStarts) to create subfeatures of the main feature
(primary_tag). Type of subfeature created based on the
inflate_type parameter. If you don't want this inflating
behaviour you can deactivate it by using the option
--inflate_off.

--inflate_type
Feature type (3rd column in gff) created when inflate parameter
activated [default: exon].

--verbose
add verbosity

-o , --output , --out , --outfile or --gff
Output GFF file. If no output file is specified, the output will
be written to STDOUT.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

-h or --help
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
21 changes: 21 additions & 0 deletions src/agat/agat_convert_genscan2gff/script.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

# unset flags
[[ "$par_inflate_off" == "false" ]] && unset par_inflate_off
[[ "$par_verbose" == "false" ]] && unset par_verbose

# run agat_convert_genscan2gff
agat_convert_genscan2gff.pl \
  --genscan "$par_genscan" \
  --output "$par_output" \
  ${par_source:+--source "${par_source}"} \
  ${par_primary_tag:+--primary_tag "${par_primary_tag}"} \
  ${par_inflate_off:+--inflate_off} \
  ${par_inflate_type:+--inflate_type "${par_inflate_type}"} \
  ${par_verbose:+--verbose} \
  ${par_config:+--config "${par_config}"}
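Review note: the `${par_x:+--flag "${par_x}"}` expansions in the script above emit a flag only when the variable is set and non-empty, which is why "false" booleans are unset first. A minimal sketch of that behaviour, with made-up values standing in for what Viash would provide:

```shell
#!/bin/bash
set -e

# Illustrative values only; in the component these come from Viash.
par_source="my tool"      # set, and contains a space
par_primary_tag=""        # empty, so its flag should be dropped
par_verbose="false"

# A "false" boolean is unset so that the :+ expansion yields nothing.
[ "$par_verbose" = "false" ] && unset par_verbose

# Load the expansions into the positional parameters to inspect them.
set -- \
  ${par_source:+--source "${par_source}"} \
  ${par_primary_tag:+--primary_tag "${par_primary_tag}"} \
  ${par_verbose:+--verbose}

arg_count=$#
second_arg=$2
printf 'count=%s second=%s\n' "$arg_count" "$second_arg"
```

Only `--source` and its value survive, and the inner quotes keep `my tool` as a single argument even though the outer expansion is unquoted.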
35 changes: 35 additions & 0 deletions src/agat/agat_convert_genscan2gff/test.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"

# create temporary directory and clean up on exit
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
}
trap clean_up EXIT

echo "> Run $meta_name with test data"
"$meta_executable" \
  --genscan "$test_dir/test.genscan" \
  --output "$TMPDIR/output.gff"

echo ">> Checking output"
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
# run diff inside the condition; under set -e a bare failing diff would exit before the message is printed
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_convert_genscan2gff_1.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

echo "> Test successful"
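Review note: with `set -e` active, a command that fails on its own line ends the script immediately, so a later `$?` check never runs. Placing the comparison in an `if` condition keeps control in the script. A minimal sketch of that pattern, combined with the same temp-dir cleanup idea as the test; `compare_files` and the file names are hypothetical:

```shell
#!/bin/bash
set -e

# Scratch directory for the demonstration, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'a\nb\n' > "$workdir/expected.txt"
printf 'a\nc\n' > "$workdir/actual.txt"

# Hypothetical helper: report whether two files match. Because diff runs
# as an if-condition, a mismatch does not trigger set -e.
compare_files() {
  if diff "$1" "$2" > /dev/null; then
    echo "match"
  else
    echo "mismatch"
  fi
}

result_same=$(compare_files "$workdir/expected.txt" "$workdir/expected.txt")
result_diff=$(compare_files "$workdir/expected.txt" "$workdir/actual.txt")
echo "$result_same $result_diff"
```

The script reaches its last line even though the second comparison finds a difference, which is the behaviour a test needs in order to print its own failure message.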
25 changes: 25 additions & 0 deletions src/agat/agat_convert_genscan2gff/test_data/agat_convert_genscan2gff_1.gff
@@ -0,0 +1,25 @@
##gff-version 3
unknown genscan gene 2223 4605 75.25 + . ID=gene_1
unknown genscan mRNA 2223 4605 75.25 + . ID=mrna_1;Parent=gene_1
unknown genscan exon 2223 3020 75.25 + . ID=exon_1;Parent=mrna_1
unknown genscan exon 4249 4605 13.03 + . ID=exon_2;Parent=mrna_1
unknown genscan CDS 2223 3020 75.25 + 0 ID=cds_1;Parent=mrna_1
unknown genscan CDS 4249 4605 13.03 + 0 ID=cds_2;Parent=mrna_1
unknown genscan gene 6829 8789 20.06 - . ID=gene_2
unknown genscan mRNA 6829 8789 20.06 - . ID=mrna_2;Parent=gene_2
unknown genscan exon 6829 7297 20.06 - . ID=exon_3;Parent=mrna_2
unknown genscan exon 7730 7888 12.78 - . ID=exon_4;Parent=mrna_2
unknown genscan exon 8029 8185 7.45 - . ID=exon_5;Parent=mrna_2
unknown genscan exon 8278 8546 17.45 - . ID=exon_6;Parent=mrna_2
unknown genscan exon 8647 8789 18.65 - . ID=exon_7;Parent=mrna_2
unknown genscan CDS 6829 7297 20.06 - 1 ID=cds_3;Parent=mrna_2
unknown genscan CDS 7730 7888 12.78 - 1 ID=cds_4;Parent=mrna_2
unknown genscan CDS 8029 8185 7.45 - 2 ID=cds_5;Parent=mrna_2
unknown genscan CDS 8278 8546 17.45 - 1 ID=cds_6;Parent=mrna_2
unknown genscan CDS 8647 8789 18.65 - 0 ID=cds_7;Parent=mrna_2
unknown genscan gene 10209 11924 16.18 + . ID=gene_3
unknown genscan mRNA 10209 11924 16.18 + . ID=mrna_3;Parent=gene_3
unknown genscan exon 10209 11313 16.18 + . ID=exon_8;Parent=mrna_3
unknown genscan exon 11850 11924 3.27 + . ID=exon_9;Parent=mrna_3
unknown genscan CDS 10209 11313 16.18 + 0 ID=cds_8;Parent=mrna_3
unknown genscan CDS 11850 11924 3.27 + 2 ID=cds_9;Parent=mrna_3
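Review note: a quick way to sanity-check a GFF3 file like the expected output above is to count records per feature type (column 3). The sketch below is an assumption-laden illustration: `count_feature_types` is a hypothetical helper, not part of AGAT, and the sample records are the first gene from the expected output, written with the tab separators that GFF3 requires.

```shell
#!/bin/bash
set -e

# Hypothetical helper: read GFF3 on stdin and print "<type> <count>" for
# each feature type, sorted by type. Lines that do not have nine
# tab-separated fields are ignored.
count_feature_types() {
  awk -F '\t' '
    /^#/    { next }          # skip directives such as ##gff-version
    NF == 9 { count[$3]++ }   # column 3 holds the feature type
    END     { for (type in count) print type, count[type] }
  ' | LC_ALL=C sort
}

summary=$(
  {
    printf '##gff-version 3\n'
    printf 'unknown\tgenscan\tgene\t2223\t4605\t75.25\t+\t.\tID=gene_1\n'
    printf 'unknown\tgenscan\tmRNA\t2223\t4605\t75.25\t+\t.\tID=mrna_1;Parent=gene_1\n'
    printf 'unknown\tgenscan\texon\t2223\t3020\t75.25\t+\t.\tID=exon_1;Parent=mrna_1\n'
    printf 'unknown\tgenscan\texon\t4249\t4605\t13.03\t+\t.\tID=exon_2;Parent=mrna_1\n'
    printf 'unknown\tgenscan\tCDS\t2223\t3020\t75.25\t+\t0\tID=cds_1;Parent=mrna_1\n'
    printf 'unknown\tgenscan\tCDS\t4249\t4605\t13.03\t+\t0\tID=cds_2;Parent=mrna_1\n'
  } | count_feature_types
)
echo "$summary"
```

For this sample the summary lists two CDS, two exon, one gene and one mRNA record; run against the full expected file, the same helper gives a compact overview that is easier to eyeball than a raw `diff`.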
11 changes: 11 additions & 0 deletions src/agat/agat_convert_genscan2gff/test_data/script.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#!/bin/bash

# clone repo
if [ ! -d /tmp/agat_source ]; then
  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
fi

# copy test data
cp -r /tmp/agat_source/t/scripts_output/in/test.genscan src/agat/agat_convert_genscan2gff/test_data/test.genscan
cp -r /tmp/agat_source/t/scripts_output/out/agat_convert_genscan2gff_1.gff src/agat/agat_convert_genscan2gff/test_data/agat_convert_genscan2gff_1.gff
