Add agat sp merge annotations (#106)
* add help

* add config

* add test data and expected output + script to fetch them

* add run script and handle multiple inputs

* add test

* update changelog

* fix typo

* add second test

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <[email protected]>

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <[email protected]>

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <[email protected]>

* Update src/agat/agat_sp_merge_annotations/config.vsh.yaml

Co-authored-by: Dries Schaumont <[email protected]>

* update --config description

* remove unset IFS

* add temporary directory and cleanup on exit

* update clean up on exit function

* add set -eo pipefail to test and script

* fix create temporary directory

* cleanup changelog

* cleanup changelog

* Minor formatting changes

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
Co-authored-by: Dries Schaumont <[email protected]>
Co-authored-by: Emma Rousseau <[email protected]>
4 people authored Oct 26, 2024
1 parent ebbc0d4 commit 11118fb
Showing 12 changed files with 268 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@
- `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).
- `agat/agat_sp_add_introns`: add intron features to gtf/gff file without intron features (PR #104).
- `agat/agat_sp_filter_feature_from_kill_list`: remove features in a GFF file based on a kill list (PR #105).
- `agat/agat_sp_merge_annotations`: merge different gff annotation files into one (PR #106).
- `agat/agat_sp_statistics`: provides exhaustive statistics of a gtf/gff file (PR #107).

* `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
67 changes: 67 additions & 0 deletions src/agat/agat_sp_merge_annotations/config.vsh.yaml
@@ -0,0 +1,67 @@
name: agat_sp_merge_annotations
namespace: agat
description: |
  Merge different gff annotation files into one. It uses the AGAT parser that takes care of
  duplicated names and fixes other oddities met in those files.
keywords: [gene annotations, merge, gff]
links:
  homepage: https://github.com/NBISweden/AGAT
  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_merge_annotations.html
  issue_tracker: https://github.com/NBISweden/AGAT/issues
  repository: https://github.com/NBISweden/AGAT
references:
  doi: 10.5281/zenodo.3552717
license: GPL-3.0
requirements:
  commands: [agat]
authors:
  - __merge__: /src/_authors/leila_paquay.yaml
    roles: [ author, maintainer ]
argument_groups:
  - name: Inputs
    arguments:
      - name: --gff
        alternatives: [-f]
        description: |
          Input GTF/GFF file(s).
        type: file
        multiple: true
        required: true
        example: input1.gff;input2.gff
  - name: Outputs
    arguments:
      - name: --output
        alternatives: [-o, --out]
        description: Output GFF3 file to which the merged annotation will be written.
        type: file
        direction: output
        required: true
        example: output.gff
  - name: Arguments
    arguments:
      - name: --config
        alternatives: [-c]
        description: |
          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT.
          The `--config` option gives you the possibility to use your own AGAT config file (located
          elsewhere or named differently).
        type: file
        example: custom_agat_config.yaml
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
    setup:
      - type: docker
        run: |
          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
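
The Docker setup step in the config above pipes `agat --version` through `sed` to write a `name: "version"` entry. A minimal sketch of what that substitution does is shown here. The sample input string is an assumption (the exact text printed by `agat --version` is not shown in this commit), and `\s` relies on GNU sed.

```shell
#!/bin/bash
set -eo pipefail

# Assumed sample of what `agat --version` prints; the real output may differ
sample_version_output="AGAT v1.4.0"

# Same substitution as in the Docker setup step (\s is a GNU sed extension)
version_line=$(printf '%s\n' "$sample_version_output" | sed 's/AGAT\s\(.*\)/agat: "\1"/')

# Show the line that would end up in /var/software_versions.txt
echo "$version_line"
```

If the real version string does not start with `AGAT` followed by whitespace, the pattern will not match and the line is passed through unchanged.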
64 changes: 64 additions & 0 deletions src/agat/agat_sp_merge_annotations/help.txt
@@ -0,0 +1,64 @@
```sh
agat_sp_merge_annotations.pl --help
```

------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_sp_merge_annotations.pl

Description:
This script merge different gff annotation files in one. It uses the
AGAT parser that takes care of duplicated names and fixes other oddities
met in those files.

Usage:
agat_sp_merge_annotations.pl --gff infile1 --gff infile2 --out outFile
agat_sp_merge_annotations.pl --help

Options:
--gff or -f
Input GTF/GFF file(s). You can specify as much file you want
like so: -f file1 -f file2 -f file3

--out, --output or -o
Output gff3 file where the gene incriminated will be write.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

--help or -h
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
19 changes: 19 additions & 0 deletions src/agat/agat_sp_merge_annotations/script.sh
@@ -0,0 +1,19 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

# Convert a semicolon-separated list of file names to multiple --gff arguments.
# An array keeps file names containing spaces intact.
input_files=()
IFS=";" read -ra file_names <<< "$par_gff"
for file in "${file_names[@]}"; do
  input_files+=(--gff "$file")
done

# run agat_sp_merge_annotations
agat_sp_merge_annotations.pl \
  "${input_files[@]}" \
  -o "$par_output" \
  ${par_config:+--config "${par_config}"}
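
The splitting step can be tried in isolation. The sketch below assumes a hypothetical `par_gff` value in the semicolon-separated form used in the config example, and collects the arguments in an array (named `gff_args` here purely for illustration) so that a file name containing a space stays a single argument.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical value in the semicolon-separated form used for `multiple: true` arguments
par_gff="file 1.gff;file2.gff"

# Split on ';' and build one "--gff <file>" pair per input file
gff_args=()
IFS=";" read -ra file_names <<< "$par_gff"
for file in "${file_names[@]}"; do
  gff_args+=(--gff "$file")
done

# Print one argument per line to show how the command line would be split
printf '%s\n' "${gff_args[@]}"
```

Because `IFS=";"` is given as a prefix to `read`, it applies to that command only, so the shell's default field separator does not need to be restored afterwards.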
56 changes: 56 additions & 0 deletions src/agat/agat_sp_merge_annotations/test.sh
@@ -0,0 +1,56 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"

# create temporary directory and clean up on exit
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
}
trap clean_up EXIT

echo "> Run $meta_name with test data 1"
"$meta_executable" \
--gff "$test_dir/file1.gff;$test_dir/file2.gff" \
--output "$TMPDIR/output.gff"

echo ">> Checking output"
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
# With `set -e`, a bare failing `diff` would abort the script before any `$?` check runs
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_sp_merge_annotations_1.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

echo ">> cleanup"
rm -rf "$TMPDIR/output.gff"

echo "> Run $meta_name with test data 2"
"$meta_executable" \
--gff "$test_dir/fileA.gff;$test_dir/fileB.gff" \
--output "$TMPDIR/output.gff"

echo ">> Checking output"
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
# With `set -e`, a bare failing `diff` would abort the script before any `$?` check runs
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_sp_merge_annotations_2.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

echo "> Test successful"
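
Both test cases run the same three checks (file exists, file is non-empty, file matches the expected output). One possible way to factor them out is sketched below. `check_output` is a hypothetical helper name, not something defined in this commit, and the demonstration uses throwaway files rather than the real test data.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical helper: verify that file $1 exists, is non-empty, and matches file $2
check_output() {
  local actual="$1" expected="$2"
  [ -f "$actual" ] || { echo "Output file $actual does not exist"; return 1; }
  [ -s "$actual" ] || { echo "Output file $actual is empty"; return 1; }
  diff "$actual" "$expected" > /dev/null || {
    echo "Output file $actual does not match expected output"
    return 1
  }
}

# Self-contained demonstration on throwaway files
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf '##gff-version 3\n' > "$demo_dir/output.gff"
printf '##gff-version 3\n' > "$demo_dir/expected.gff"

check_output "$demo_dir/output.gff" "$demo_dir/expected.gff" && echo "Output matches expected output"
```

In the test script, each case would then reduce to one call such as `check_output "$TMPDIR/output.gff" "$test_dir/agat_sp_merge_annotations_1.gff"`, with `set -e` stopping the run on the first failure.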
13 changes: 13 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_1.gff
@@ -0,0 +1,13 @@
##gff-version 3
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;ontology=G0222
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;ontology=G0222;merged_ID=IDmodified-mrna-1;merged_Ontology=G0333;merged_Parent=IDmodified-gene-1
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3
3 changes: 3 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/agat_sp_merge_annotations_2.gff
@@ -0,0 +1,3 @@
##gff-version 3
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;merged_ID=B.t1;merged_Parent=B
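
In both expected outputs, the attributes of the second input's overlapping record appear on the kept mRNA under `merged_`-prefixed keys. A small sketch for listing those attributes is given here. It reuses the mRNA line from the expected output above and assumes the attribute column is the last whitespace-separated field, which holds for this test data.

```shell
#!/bin/bash
set -eo pipefail

# mRNA line copied from the expected output of the second test case
mrna_line='chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;merged_ID=B.t1;merged_Parent=B'

# Take the attribute column (last field), split it on ';', keep only merged_* keys
merged_attrs=$(printf '%s\n' "$mrna_line" | awk '{print $NF}' | tr ';' '\n' | grep '^merged_')

echo "$merged_attrs"
```

Here the listing shows `merged_ID=B.t1` and `merged_Parent=B`, that is, the identifiers from `fileB.gff` carried over onto the record from `fileA.gff`.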
14 changes: 14 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/file1.gff
@@ -0,0 +1,14 @@
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;


12 changes: 12 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/file2.gff
@@ -0,0 +1,12 @@
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0333;
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;
2 changes: 2 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/fileA.gff
@@ -0,0 +1,2 @@
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A;
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;
2 changes: 2 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/fileB.gff
@@ -0,0 +1,2 @@
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=B;
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=B.t1;Parent=B;
15 changes: 15 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/script.sh
@@ -0,0 +1,15 @@
#!/bin/bash

# clone repo
if [ ! -d /tmp/agat_source ]; then
git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
fi

# copy test data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file1.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file2.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_1.gff src/agat/agat_sp_merge_annotations/test_data

cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileA.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileB.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_2.gff src/agat/agat_sp_merge_annotations/test_data
