Add agat sp merge annotations #106

Merged: 24 commits, Oct 26, 2024
Commits
625241e
add help
Leila011 Jul 31, 2024
43dc3eb
add config
Leila011 Jul 31, 2024
5375f59
add test data and expected output + srcipt to fetch them
Leila011 Jul 31, 2024
11d6afa
add run script and handle multiple inputs
Leila011 Jul 31, 2024
7761dd3
add test
Leila011 Jul 31, 2024
5e3b25b
update changelog
Leila011 Jul 31, 2024
ea35b1d
fix typo
Leila011 Jul 31, 2024
1993277
add second test
Leila011 Aug 8, 2024
2fd0bea
Merge main into add-agat_sp_merge_annotations
rcannood Aug 13, 2024
ca292ad
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
3eaf196
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
d4aa71c
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
cfda348
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
29501fb
update --config description
Leila011 Aug 19, 2024
ead79bb
remove unset IFS
Leila011 Aug 19, 2024
188c69e
add temporary directory and cleanup on exit
Leila011 Aug 19, 2024
face09b
update clean up on exit function
Leila011 Aug 19, 2024
b36eb36
add set -eo pipefail to test and script
Leila011 Aug 19, 2024
cffbf33
fix create temporary directory
Leila011 Aug 19, 2024
277765f
cleanup changelog
Leila011 Aug 19, 2024
d7b6a20
cleanup changelog
Leila011 Aug 19, 2024
90db9f1
Minor formatting changes
emmarousseau Oct 26, 2024
de37709
Merge branch 'main' into add-agat_sp_merge_annotations
emmarousseau Oct 26, 2024
e268a10
Merge branch 'main' into add-agat_sp_merge_annotations
rcannood Oct 26, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@
- `agat/agat_convert_genscan2gff`: convert a genscan file into a GFF file (PR #100).
- `agat/agat_sp_add_introns`: add intron features to gtf/gff file without intron features (PR #104).
- `agat/agat_sp_filter_feature_from_kill_list`: remove features in a GFF file based on a kill list (PR #105).
- `agat/agat_sp_merge_annotations`: merge different gff annotation files into one (PR #106).
- `agat/agat_sp_statistics`: provides exhaustive statistics of a gtf/gff file (PR #107).

* `bd_rhapsody/bd_rhapsody_sequence_analysis`: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).
67 changes: 67 additions & 0 deletions src/agat/agat_sp_merge_annotations/config.vsh.yaml
@@ -0,0 +1,67 @@
name: agat_sp_merge_annotations
namespace: agat
description: |
  Merge different gff annotation files into one. It uses the AGAT parser that takes care of
  duplicated names and fixes other oddities met in those files.
keywords: [gene annotations, merge, gff]
links:
  homepage: https://github.com/NBISweden/AGAT
  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_merge_annotations.html
  issue_tracker: https://github.com/NBISweden/AGAT/issues
  repository: https://github.com/NBISweden/AGAT
references:
  doi: 10.5281/zenodo.3552717
license: GPL-3.0
requirements:
  commands: [agat]
authors:
  - __merge__: /src/_authors/leila_paquay.yaml
    roles: [ author, maintainer ]
argument_groups:
  - name: Inputs
    arguments:
      - name: --gff
        alternatives: [-f]
        description: |
          Input GTF/GFF file(s).
        type: file
        multiple: true
        required: true
        example: input1.gff;input2.gff
  - name: Outputs
    arguments:
      - name: --output
        alternatives: [-o, --out]
        description: Output GFF3 file to which the merged annotation will be written.
        type: file
        direction: output
        required: true
        example: output.gff
  - name: Arguments
    arguments:
      - name: --config
        alternatives: [-c]
        description: |
          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT.
          The `--config` option gives you the possibility to use your own AGAT config file (located
          elsewhere or named differently).
        type: file
        example: custom_agat_config.yaml
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
    setup:
      - type: docker
        run: |
          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
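
The docker `setup` step in the config records the tool version by piping `agat --version` through `sed`. The following is a minimal sketch of what that substitution does. It assumes, hypothetically, that the version command prints a line of the form `AGAT v1.4.0`; the sample string is an illustration and is not taken from the PR. The `\s` class relies on GNU sed.

```bash
#!/bin/bash
set -eo pipefail

# Hypothetical sample of what `agat --version` might print (assumption).
sample_version_line="AGAT v1.4.0"

# Same substitution as in the docker setup step: capture everything after
# "AGAT<whitespace>" and emit it as a YAML-style key/value pair.
version_entry=$(echo "$sample_version_line" | sed 's/AGAT\s\(.*\)/agat: "\1"/')

echo "$version_entry"
```

If the real output of `agat --version` has a different shape, the expression would leave the line unchanged, so the pattern may need adjusting.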
64 changes: 64 additions & 0 deletions src/agat/agat_sp_merge_annotations/help.txt
@@ -0,0 +1,64 @@
```sh
agat_sp_merge_annotations.pl --help
```

------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_sp_merge_annotations.pl

Description:
This script merge different gff annotation files in one. It uses the
AGAT parser that takes care of duplicated names and fixes other oddities
met in those files.

Usage:
agat_sp_merge_annotations.pl --gff infile1 --gff infile2 --out outFile
agat_sp_merge_annotations.pl --help

Options:
--gff or -f
Input GTF/GFF file(s). You can specify as much file you want
like so: -f file1 -f file2 -f file3

--out, --output or -o
Output gff3 file where the gene incriminated will be write.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

--help or -h
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
19 changes: 19 additions & 0 deletions src/agat/agat_sp_merge_annotations/script.sh
@@ -0,0 +1,19 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

# Convert the semicolon-separated list of file names into repeated --gff arguments.
# An array keeps file names that contain spaces intact.
input_files=()
IFS=";" read -ra file_names <<< "$par_gff"
for file in "${file_names[@]}"; do
  input_files+=(--gff "$file")
done

# run agat_sp_merge_annotations
agat_sp_merge_annotations.pl \
  "${input_files[@]}" \
  -o "$par_output" \
  ${par_config:+--config "${par_config}"}
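
The conversion from a semicolon-separated value to repeated `--gff` flags can be tried in isolation. This is a minimal sketch, not the component's code: `build_gff_args` and `gff_args` are hypothetical names introduced for illustration, and the file names are made up. It collects the arguments in a bash array so that a name containing a space stays a single argument.

```bash
#!/bin/bash
set -eo pipefail

# Hypothetical helper (not part of the PR): fill the global array `gff_args`
# with one "--gff <file>" pair per entry of a semicolon-separated list.
build_gff_args() {
  local joined="$1"
  local name
  local -a names
  gff_args=()
  IFS=";" read -ra names <<< "$joined"
  for name in "${names[@]}"; do
    gff_args+=(--gff "$name")
  done
}

# Example value; the file names are made up and one contains a space.
build_gff_args "input one.gff;input2.gff"

# Print each resulting argument on its own line.
printf '%s\n' "${gff_args[@]}"
```

Expanding the array as `"${gff_args[@]}"` on the command line passes each element as exactly one argument, which a space-joined string expanded without quotes cannot guarantee.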
56 changes: 56 additions & 0 deletions src/agat/agat_sp_merge_annotations/test.sh
@@ -0,0 +1,56 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"

# create temporary directory and clean up on exit
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
}
trap clean_up EXIT

echo "> Run $meta_name with test data 1"
"$meta_executable" \
--gff "$test_dir/file1.gff;$test_dir/file2.gff" \
--output "$TMPDIR/output.gff"

echo ">> Checking output"
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1

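echo ">> Check if output matches expected output"
# Test diff directly: under `set -e` a failing bare diff would abort the
# script before the error message could be printed.
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_sp_merge_annotations_1.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi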

echo ">> cleanup"
rm -rf "$TMPDIR/output.gff"

echo "> Run $meta_name with test data 2"
"$meta_executable" \
--gff "$test_dir/fileA.gff;$test_dir/fileB.gff" \
--output "$TMPDIR/output.gff"

echo ">> Checking output"
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1

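echo ">> Check if output matches expected output"
# Test diff directly: under `set -e` a failing bare diff would abort the
# script before the error message could be printed.
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_sp_merge_annotations_2.gff"; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi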

echo "> Test successful"
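
Both test cases repeat the same three checks (file exists, file is non-empty, file matches the expected output). One possible way to factor them out is sketched below. `check_output` is a hypothetical helper that is not part of the PR, and the demonstration uses throwaway files rather than the real test data.

```bash
#!/bin/bash
set -eo pipefail

# Hypothetical helper (not part of the PR): verify that a file exists,
# is non-empty, and is identical to the expected file.
# Returns 0 on success and 1 on the first failed check.
check_output() {
  local actual="$1" expected="$2"
  if [ ! -f "$actual" ]; then
    echo "Output file $actual does not exist"; return 1
  fi
  if [ ! -s "$actual" ]; then
    echo "Output file $actual is empty"; return 1
  fi
  if ! diff "$actual" "$expected"; then
    echo "Output file $actual does not match expected output"; return 1
  fi
}

# Self-contained demonstration with throwaway files.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf '##gff-version 3\n' > "$demo_dir/expected.gff"
printf '##gff-version 3\n' > "$demo_dir/same.gff"
printf 'something else\n'  > "$demo_dir/different.gff"

check_output "$demo_dir/same.gff" "$demo_dir/expected.gff" && echo "match"
check_output "$demo_dir/different.gff" "$demo_dir/expected.gff" > /dev/null || echo "mismatch detected"
```

Because `diff` is evaluated inside an `if`, a mismatch does not trip `set -e` before the message is printed.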
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
##gff-version 3
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;ontology=G0222
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;ontology=G0222;merged_ID=IDmodified-mrna-1;merged_Ontology=G0333;merged_Parent=IDmodified-gene-1
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
##gff-version 3
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;merged_ID=B.t1;merged_Parent=B
14 changes: 14 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/file1.gff
@@ -0,0 +1,14 @@
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;


12 changes: 12 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/file2.gff
@@ -0,0 +1,12 @@
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0333;
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;
2 changes: 2 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/fileA.gff
@@ -0,0 +1,2 @@
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A;
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;
2 changes: 2 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/fileB.gff
@@ -0,0 +1,2 @@
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=B;
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=B.t1;Parent=B;
15 changes: 15 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/script.sh
@@ -0,0 +1,15 @@
#!/bin/bash

# clone repo
if [ ! -d /tmp/agat_source ]; then
git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
fi

# copy test data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file1.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file2.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_1.gff src/agat/agat_sp_merge_annotations/test_data

cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileA.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileB.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_2.gff src/agat/agat_sp_merge_annotations/test_data