Add agat sp add start and stop #122

Open
wants to merge 19 commits into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -25,6 +25,7 @@
- `agat/agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).
- `agat/agat_convert_sp_gff2tsv`: convert gtf/gff file into tabulated file (PR #102).
- `agat/agat_convert_sp_gxf2gxf`: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).
- `agat/agat_sp_add_start_and_stop`: adds start and stop codons when a CDS feature exists (PR #122).

* `bedtools`:
- `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
92 changes: 92 additions & 0 deletions src/agat/agat_sp_add_start_and_stop/config.vsh.yaml
@@ -0,0 +1,92 @@
name: agat_sp_add_start_and_stop
namespace: agat
description: |
The script adds start and stop codons when a CDS feature exists. The
script looks at the nucleotide sequence and checks the presence of start
and stop codons. The script works even if the start or stop codon is
split over several CDS features.
keywords: [gene annotations, CDS, GFF]
links:
homepage: https://github.com/NBISweden/AGAT
documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_add_start_and_stop.html
issue_tracker: https://github.com/NBISweden/AGAT/issues
repository: https://github.com/NBISweden/AGAT
references:
doi: 10.5281/zenodo.3552717
license: GPL-3.0
requirements:
commands: [agat]
authors:
- __merge__: /src/_authors/leila_paquay.yaml
roles: [ author, maintainer ]
argument_groups:
- name: Inputs
arguments:
- name: --gff
alternatives: [-i, -g]
description: Input GTF/GFF file.
type: file
required: true
direction: input
example: input.gff
- name: --fasta
alternatives: [--fa, -f]
description: Input FASTA file. Needed to check that CDS sequences start with a start codon and end with a stop codon.
type: file
required: true
direction: input
example: input.fasta
- name: Outputs
arguments:
- name: --output
alternatives: [--out, -o]
description: Updated output GFF file.
type: file
direction: output
required: true
example: output.gff
- name: Arguments
arguments:
- name: --codon
alternatives: [--ct, --table]
Suggested change
alternatives: [--ct, --table]
alternatives: [--ct, --table]
min: 0
max: 33

description: Codon table to use. [default 1]
type: integer
required: false
example: 1
- name: --extend
alternatives: [-e]
description: When no start/stop codon is found, try to extend the CDS to meet the next start/stop codon in the sequence.
type: boolean_true
- name: --ni
alternatives: [--na]
description: Disable IUPAC ambiguity codes (no IUPAC / no ambiguous). By default IUPAC codes are used, which means NNN is treated as a start and/or stop codon.
type: boolean_true
- name: --verbose
alternatives: [-v]
description: Verbose output for debugging purposes.
type: boolean_true
- name: --config
alternatives: [-c]
description: |
AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
type: file
required: false
example: custom_agat_config.yaml
resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- type: file
path: test_data
engines:
- type: docker
image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
setup:
- type: docker
run: |
agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
- type: executable
- type: nextflow
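
The `setup` step in this config pipes the tool's version string through `sed` to produce a `name: "version"` line. A minimal sketch of that transformation, assuming `agat --version` prints a line of the form `AGAT v1.4.0` (the exact output format is an assumption and is not confirmed by this PR). The function `fake_agat_version` is a hypothetical stand-in for the real command, and `[[:space:]]` is used here as a portable equivalent of the GNU-specific `\s` in the original expression:

```bash
#!/bin/bash

# Hypothetical stand-in for `agat --version`; the real output format is assumed.
fake_agat_version() {
  echo "AGAT v1.4.0"
}

# Turn "AGAT <version>" into 'agat: "<version>"'.
# [[:space:]] is the POSIX spelling of GNU sed's \s.
format_version() {
  sed 's/AGAT[[:space:]]\(.*\)/agat: "\1"/'
}

version_line=$(fake_agat_version | format_version)
echo "$version_line"
```

If the real version output has a different shape, the pattern would not match and `sed` would pass the line through unchanged, so it may be worth checking the contents of `/var/software_versions.txt` after building the image.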
82 changes: 82 additions & 0 deletions src/agat/agat_sp_add_start_and_stop/help.txt
@@ -0,0 +1,82 @@
```sh
agat_sp_add_start_and_stop.pl --help
```

------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_sp_add_start_and_stop.pl.pl

Description:
The script adds start and stop codons when a CDS feature exists. The
script looks at the nucleotide sequence and checks the presence of start
and stop codons. The script works even if the start or stop codon are
split over several CDS features.

Usage:
agat_sp_add_start_and_stop.pl.pl --gff infile.gff --fasta genome.fa --out outfile.gff
agat_sp_add_start_and_stop.pl.pl --help

Options:
--gff, -i or -g
Input GTF/GFF file.

--fasta, --fa or -f
Input fasta file. Needed to check that CDS sequences start by
start codon and stop by stop codon.

--ct, --codon or --table
Codon table to use. [default 1]

--out, --output or -o
Output gff file updated

-e or --extend
Boolean - When no start/stop codon found, try to extend the CDS
to meet the next start/stop codon in the sequence.

--ni or --na
Boolean - no iupac / no ambiguous, avoid usage of IUPAC. By
default IUPAC is used that means, NNN is seen as start and/or
stop codon.

-v Verbose for debugging purpose.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

--help or -h
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
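
The description in the help text says the script inspects the nucleotide sequence for start and stop codons. The following is a rough illustration of that idea only, not AGAT's implementation: it assumes the standard genetic code (codon table 1) with `ATG` as the start codon and `TAA`, `TAG`, `TGA` as stop codons, ignores IUPAC ambiguity codes and split codons, and uses the hypothetical helper names `has_start_codon` and `has_stop_codon`.

```bash
#!/bin/bash

# Succeeds if the CDS sequence begins with ATG (case-insensitive).
has_start_codon() {
  local seq="${1^^}"            # upper-case the sequence (bash 4+)
  [[ "${seq:0:3}" == "ATG" ]]
}

# Succeeds if the CDS sequence ends with TAA, TAG or TGA.
has_stop_codon() {
  local seq="${1^^}"
  local last="${seq: -3}"       # last three bases
  [[ "$last" == "TAA" || "$last" == "TAG" || "$last" == "TGA" ]]
}

cds="atggcctaa"
if has_start_codon "$cds"; then echo "start codon present"; fi
if has_stop_codon "$cds"; then echo "stop codon present"; fi
```

The real tool additionally handles other codon tables, codons split across CDS features, and optional extension of the CDS, none of which this sketch attempts.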
22 changes: 22 additions & 0 deletions src/agat/agat_sp_add_start_and_stop/script.sh
@@ -0,0 +1,22 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

# unset flags
[[ "$par_ni" == "false" ]] && unset par_ni
[[ "$par_verbose" == "false" ]] && unset par_verbose
[[ "$par_extend" == "false" ]] && unset par_extend

# run agat_sp_add_start_and_stop.pl
agat_sp_add_start_and_stop.pl \
--gff "$par_gff" \
--fasta "$par_fasta" \
--output "$par_output" \
${par_codon:+--codon "${par_codon}"} \
${par_extend:+--extend} \
${par_ni:+--ni} \
${par_verbose:+-v} \
${par_config:+--config "${par_config}"}
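
The wrapper relies on the `${var:+word}` expansion, which yields `word` only when `var` is set and non-empty. A minimal sketch of the idiom in isolation, assuming (as the script itself does) that unset `boolean_true` flags arrive as the string `false`. The function name `build_optional_args` is hypothetical and exists only for this illustration:

```bash
#!/bin/bash

# Build the optional part of a command line from viash-style variables.
# Prints one argument per line so the result is easy to inspect.
build_optional_args() {
  local par_codon="$1" par_extend="$2" par_ni="$3"

  # Flags that were not passed hold "false"; unsetting them lets
  # ${var:+...} drop the corresponding option entirely.
  [[ "$par_extend" == "false" ]] && unset par_extend
  [[ "$par_ni" == "false" ]] && unset par_ni

  local args=(
    ${par_codon:+--codon "$par_codon"}
    ${par_extend:+--extend}
    ${par_ni:+--ni}
  )

  if (( ${#args[@]} > 0 )); then
    printf '%s\n' "${args[@]}"
  fi
}

# Codon table 1, --extend requested, --ni not requested.
build_optional_args 1 true false
```

The expansion only works when the variable name matches the argument's `name` in the config, since alternatives such as `--ct` do not get a variable of their own.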
63 changes: 63 additions & 0 deletions src/agat/agat_sp_add_start_and_stop/test.sh
@@ -0,0 +1,63 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"

# create temporary directory and clean up on exit
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
}
trap clean_up EXIT

echo "> Run $meta_name with test data and ni flag"
"$meta_executable" \
--gff "$test_dir/agat_sp_add_start_and_stop.gff" \
--fasta "$test_dir/1.fa" \
--output "$TMPDIR/output.gff" \
--ni

echo ">> Checking output"
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_sp_add_start_and_stop_1.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

rm -f "$TMPDIR/output.gff"

echo "> Run $meta_name with test data, ni flag and extend flag"
"$meta_executable" \
--gff "$test_dir/agat_sp_add_start_and_stop.gff" \
--fasta "$test_dir/1.fa" \
--output "$TMPDIR/output.gff" \
--ni \
--extend


echo ">> Checking output"
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_sp_add_start_and_stop_2.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

rm -f "$TMPDIR/output.gff"

echo "> Test successful"
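
Because the test runs under `set -e`, a failing `diff` on a line of its own would end the script before any later `$?` check could print a message. Testing the command directly in an `if` avoids that. A minimal sketch with throwaway files, where `compare_outputs` and the file names are hypothetical and used only for this illustration:

```bash
#!/bin/bash

# Compare two files and report a mismatch without relying on `$?`.
compare_outputs() {
  local actual="$1" expected="$2"
  if ! diff "$actual" "$expected" > /dev/null; then
    echo "Output file $actual does not match expected output"
    return 1
  fi
  echo "Output matches"
}

# Create small stand-in files in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'chr1\t.\tCDS\n' > "$workdir/a.gff"
printf 'chr1\t.\tCDS\n' > "$workdir/b.gff"
printf 'chr1\t.\tstop_codon\n' > "$workdir/c.gff"

compare_outputs "$workdir/a.gff" "$workdir/b.gff"
compare_outputs "$workdir/a.gff" "$workdir/c.gff" || echo "mismatch detected"
```

A command used as an `if` condition is exempt from `set -e`, so the error message is printed before the script exits with a non-zero status.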