-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add agat sp merge annotations (#106)
* add help * add config * add test data and expected output + srcipt to fetch them * add run script and handle multiple inputs * add test * update changelog * fix typo * add second test * Update src/agat/agat_sp_merge_annotations/config.vsh.yaml Co-authored-by: Dries Schaumont <[email protected]> * Update src/agat/agat_sp_merge_annotations/config.vsh.yaml Co-authored-by: Dries Schaumont <[email protected]> * Update src/agat/agat_sp_merge_annotations/config.vsh.yaml Co-authored-by: Dries Schaumont <[email protected]> * Update src/agat/agat_sp_merge_annotations/config.vsh.yaml Co-authored-by: Dries Schaumont <[email protected]> * update --config description * remove unset IFS * add temporary directory and cleanup on exit * update clean up on exit function * add set -eo pipefail to test and script * fix create temporary directory * cleanup changelog * cleanup changelog * Minor formatting changes --------- Co-authored-by: Robrecht Cannoodt <[email protected]> Co-authored-by: Dries Schaumont <[email protected]> Co-authored-by: Emma Rousseau <[email protected]>
- Loading branch information
1 parent
ebbc0d4
commit 11118fb
Showing
12 changed files
with
268 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
name: agat_sp_merge_annotations | ||
namespace: agat | ||
description: | | ||
Merge different gff annotation files into one. It uses the AGAT parser that takes care of | ||
duplicated names and fixes other oddities met in those files. | ||
keywords: [gene annotations, merge, gff] | ||
links: | ||
homepage: https://github.com/NBISweden/AGAT | ||
documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_merge_annotations.html | ||
issue_tracker: https://github.com/NBISweden/AGAT/issues | ||
repository: https://github.com/NBISweden/AGAT | ||
references: | ||
doi: 10.5281/zenodo.3552717 | ||
license: GPL-3.0 | ||
requirements: | ||
commands: [agat] | ||
authors: | ||
- __merge__: /src/_authors/leila_paquay.yaml | ||
roles: [ author, maintainer ] | ||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- name: --gff | ||
alternatives: [-f] | ||
description: | | ||
Input GTF/GFF file(s). | ||
type: file | ||
multiple: true | ||
required: true | ||
example: input1.gff;input2.gff | ||
- name: Outputs | ||
arguments: | ||
- name: --output | ||
alternatives: [-o, --out] | ||
description: Output gff3 file where the gene incriminated will be writen. | ||
type: file | ||
direction: output | ||
required: true | ||
example: output.gff | ||
- name: Arguments | ||
arguments: | ||
- name: --config | ||
alternatives: [-c] | ||
description: | | ||
AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. | ||
The `--config` option gives you the possibility to use your own AGAT config file (located | ||
elsewhere or named differently). | ||
type: file | ||
example: custom_agat_config.yaml | ||
resources: | ||
- type: bash_script | ||
path: script.sh | ||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
- type: file | ||
path: test_data | ||
engines: | ||
- type: docker | ||
image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0 | ||
setup: | ||
- type: docker | ||
run: | | ||
agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt | ||
runners: | ||
- type: executable | ||
- type: nextflow |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
```sh | ||
agat_sp_merge_annotations.pl --help | ||
``` | ||
|
||
------------------------------------------------------------------------------ | ||
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 | | ||
| https://github.com/NBISweden/AGAT | | ||
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se | | ||
------------------------------------------------------------------------------ | ||
|
||
|
||
Name: | ||
agat_sp_merge_annotations.pl | ||
|
||
Description: | ||
This script merge different gff annotation files in one. It uses the | ||
AGAT parser that takes care of duplicated names and fixes other oddities | ||
met in those files. | ||
|
||
Usage: | ||
agat_sp_merge_annotations.pl --gff infile1 --gff infile2 --out outFile | ||
agat_sp_merge_annotations.pl --help | ||
|
||
Options: | ||
--gff or -f | ||
Input GTF/GFF file(s). You can specify as much file you want | ||
like so: -f file1 -f file2 -f file3 | ||
|
||
--out, --output or -o | ||
Output gff3 file where the gene incriminated will be write. | ||
|
||
-c or --config | ||
String - Input agat config file. By default AGAT takes as input | ||
agat_config.yaml file from the working directory if any, | ||
otherwise it takes the orignal agat_config.yaml shipped with | ||
AGAT. To get the agat_config.yaml locally type: "agat config | ||
--expose". The --config option gives you the possibility to use | ||
your own AGAT config file (located elsewhere or named | ||
differently). | ||
|
||
--help or -h | ||
Display this helpful text. | ||
|
||
Feedback: | ||
Did you find a bug?: | ||
Do not hesitate to report bugs to help us keep track of the bugs and | ||
their resolution. Please use the GitHub issue tracking system available | ||
at this address: | ||
|
||
https://github.com/NBISweden/AGAT/issues | ||
|
||
Ensure that the bug was not already reported by searching under Issues. | ||
If you're unable to find an (open) issue addressing the problem, open a new one. | ||
Try as much as possible to include in the issue when relevant: | ||
- a clear description, | ||
- as much relevant information as possible, | ||
- the command used, | ||
- a data sample, | ||
- an explanation of the expected behaviour that is not occurring. | ||
|
||
Do you want to contribute?: | ||
You are very welcome, visit this address for the Contributing | ||
guidelines: | ||
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
#!/bin/bash | ||
|
||
set -eo pipefail | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
# Convert a list of file names to multiple -gff arguments | ||
input_files="" | ||
IFS=";" read -ra file_names <<< "$par_gff" | ||
for file in "${file_names[@]}"; do | ||
input_files+="--gff $file " | ||
done | ||
|
||
# run agat_sp_merge_annotations | ||
agat_sp_merge_annotations.pl \ | ||
$input_files \ | ||
-o "$par_output" \ | ||
${par_config:+--config "${par_config}"} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
#!/bin/bash | ||
|
||
set -eo pipefail | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
test_dir="${meta_resources_dir}/test_data" | ||
|
||
# create temporary directory and clean up on exit | ||
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX") | ||
function clean_up { | ||
[[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR" | ||
} | ||
trap clean_up EXIT | ||
|
||
echo "> Run $meta_name with test data 1" | ||
"$meta_executable" \ | ||
--gff "$test_dir/file1.gff;$test_dir/file2.gff" \ | ||
--output "$TMPDIR/output.gff" | ||
|
||
echo ">> Checking output" | ||
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1 | ||
|
||
echo ">> Check if output is empty" | ||
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1 | ||
|
||
echo ">> Check if output matches expected output" | ||
diff "$TMPDIR/output.gff" "$test_dir/agat_sp_merge_annotations_1.gff" | ||
if [ $? -ne 0 ]; then | ||
echo "Output file output.gff does not match expected output" | ||
exit 1 | ||
fi | ||
|
||
echo ">> cleanup" | ||
rm -rf "$TMPDIR/output.gff" | ||
|
||
echo "> Run $meta_name with test data 2" | ||
"$meta_executable" \ | ||
--gff "$test_dir/fileA.gff;$test_dir/fileB.gff" \ | ||
--output "$TMPDIR/output.gff" | ||
|
||
echo ">> Checking output" | ||
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1 | ||
|
||
echo ">> Check if output is empty" | ||
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1 | ||
|
||
echo ">> Check if output matches expected output" | ||
diff "$TMPDIR/output.gff" "$test_dir/agat_sp_merge_annotations_2.gff" | ||
if [ $? -ne 0 ]; then | ||
echo "Output file output.gff does not match expected output" | ||
exit 1 | ||
fi | ||
|
||
echo "> Test successful" |
13 changes: 13 additions & 0 deletions
13
src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_1.gff
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
##gff-version 3 | ||
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;ontology=G0222 | ||
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;ontology=G0222;merged_ID=IDmodified-mrna-1;merged_Ontology=G0333;merged_Parent=IDmodified-gene-1 | ||
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3 | ||
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3 |
3 changes: 3 additions & 0 deletions
3
src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_2.gff
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
##gff-version 3 | ||
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A | ||
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;merged_ID=B.t1;merged_Parent=B |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222; | ||
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0222; | ||
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3; | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222; | ||
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0333; | ||
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3; | ||
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A; | ||
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=B; | ||
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=B.t1;Parent=B; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
#!/bin/bash | ||
|
||
# clone repo | ||
if [ ! -d /tmp/agat_source ]; then | ||
git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source | ||
fi | ||
|
||
# copy test data | ||
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file1.gff src/agat/agat_sp_merge_annotations/test_data | ||
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file2.gff src/agat/agat_sp_merge_annotations/test_data | ||
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_1.gff src/agat/agat_sp_merge_annotations/test_data | ||
|
||
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileA.gff src/agat/agat_sp_merge_annotations/test_data | ||
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileB.gff src/agat/agat_sp_merge_annotations/test_data | ||
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_2.gff src/agat/agat_sp_merge_annotations/test_data |