Add agat sp merge annotations #106

Merged
merged 24 commits into from
Oct 26, 2024
Changes from 8 commits
Commits
625241e
add help
Leila011 Jul 31, 2024
43dc3eb
add config
Leila011 Jul 31, 2024
5375f59
add test data and expected output + srcipt to fetch them
Leila011 Jul 31, 2024
11d6afa
add run script and handle multiple inputs
Leila011 Jul 31, 2024
7761dd3
add test
Leila011 Jul 31, 2024
5e3b25b
update changelog
Leila011 Jul 31, 2024
ea35b1d
fix typo
Leila011 Jul 31, 2024
1993277
add second test
Leila011 Aug 8, 2024
2fd0bea
Merge main into add-agat_sp_merge_annotations
rcannood Aug 13, 2024
ca292ad
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
3eaf196
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
d4aa71c
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
cfda348
Update src/agat/agat_sp_merge_annotations/config.vsh.yaml
Leila011 Aug 19, 2024
29501fb
update --config description
Leila011 Aug 19, 2024
ead79bb
remove unset IFS
Leila011 Aug 19, 2024
188c69e
add temporary directory and cleanup on exit
Leila011 Aug 19, 2024
face09b
update clean up on exit function
Leila011 Aug 19, 2024
b36eb36
add set -eo pipefail to test and script
Leila011 Aug 19, 2024
cffbf33
fix create temporary directory
Leila011 Aug 19, 2024
277765f
cleanup changelog
Leila011 Aug 19, 2024
d7b6a20
cleanup changelog
Leila011 Aug 19, 2024
90db9f1
Minor formatting changes
emmarousseau Oct 26, 2024
de37709
Merge branch 'main' into add-agat_sp_merge_annotations
emmarousseau Oct 26, 2024
e268a10
Merge branch 'main' into add-agat_sp_merge_annotations
rcannood Oct 26, 2024
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -21,6 +21,8 @@

* `agat/agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).

* `agat/agat_sp_merge_annotations`: merge different GFF annotation files into one (PR #106).

## MINOR CHANGES

* `busco` components: update BUSCO to `5.7.1` (PR #72).
72 changes: 72 additions & 0 deletions src/agat/agat_sp_merge_annotations/config.vsh.yaml
@@ -0,0 +1,72 @@
name: agat_sp_merge_annotations
namespace: agat
description: |
  This script merges different GFF annotation files into one. It uses the
  AGAT parser that takes care of duplicated names and fixes other oddities
  met in those files.
keywords: [gene annotations]
links:
  homepage: https://github.com/NBISweden/AGAT
  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_merge_annotations.html
  issue_tracker: https://github.com/NBISweden/AGAT/issues
  repository: https://github.com/NBISweden/AGAT
references:
  doi: 10.5281/zenodo.3552717
license: GPL-3.0
authors:
  - __merge__: /src/_authors/leila_paquay.yaml
    roles: [ author, maintainer ]
argument_groups:
  - name: Inputs
    arguments:
      - name: --gff
        alternatives: [-f]
        description: |
          Input GTF/GFF file(s).
        type: file
        multiple: true
        required: true
        direction: input
        example: input1.gff;input2.gff
  - name: Outputs
    arguments:
      - name: --output
        alternatives: [-o, --out]
        description: Output GFF3 file to which the merged annotation is written.
        type: file
        direction: output
        required: true
        example: output.gff
  - name: Arguments
    arguments:
      - name: --config
        alternatives: [-c]
        description: |
          String - Input AGAT config file. By default, AGAT takes as input
          agat_config.yaml file from the working directory if any,
          otherwise it takes the original agat_config.yaml shipped with
          AGAT. To get the agat_config.yaml locally type: "agat config
          --expose". The --config option gives you the possibility to use
          your own AGAT config file (located elsewhere or named
          differently).
        type: file
        required: false
        example: custom_agat_config.yaml
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
    setup:
      - type: docker
        run: |
          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
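
The `setup` step in the config records the tool version by piping `agat --version` through `sed`. A minimal sketch of that substitution on its own follows. It assumes, without confirmation from this PR, that `agat --version` prints a line of the form `AGAT v1.4.0`. The `format_version` helper is hypothetical, and `\s` relies on GNU sed.

```bash
#!/bin/bash

# Hypothetical helper wrapping the sed expression used in the config's setup step.
# It rewrites "AGAT <version>" into a YAML-style 'agat: "<version>"' entry.
format_version() {
  sed 's/AGAT\s\(.*\)/agat: "\1"/'
}

# Assumed sample line; the real output of `agat --version` may differ.
version_line=$(echo "AGAT v1.4.0" | format_version)
echo "$version_line"
```

If the real version output has a different shape, the pattern will not match and `sed` passes the line through unchanged, so checking the content of `/var/software_versions.txt` after a build is probably worthwhile.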
64 changes: 64 additions & 0 deletions src/agat/agat_sp_merge_annotations/help.txt
@@ -0,0 +1,64 @@
```sh
agat_sp_merge_annotations.pl --help
```

------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_sp_merge_annotations.pl

Description:
This script merge different gff annotation files in one. It uses the
AGAT parser that takes care of duplicated names and fixes other oddities
met in those files.

Usage:
agat_sp_merge_annotations.pl --gff infile1 --gff infile2 --out outFile
agat_sp_merge_annotations.pl --help

Options:
--gff or -f
Input GTF/GFF file(s). You can specify as much file you want
like so: -f file1 -f file2 -f file3

--out, --output or -o
Output gff3 file where the gene incriminated will be write.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

--help or -h
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
18 changes: 18 additions & 0 deletions src/agat/agat_sp_merge_annotations/script.sh
@@ -0,0 +1,18 @@
#!/bin/bash

## VIASH START
## VIASH END

# Convert a list of file names to multiple --gff arguments
input_files=""
IFS=";" read -ra file_names <<< "$par_gff"
for file in "${file_names[@]}"; do
  input_files+="--gff $file "
done
unset IFS

# run agat_sp_merge_annotations
agat_sp_merge_annotations.pl \
  $input_files \
  -o "$par_output" \
  ${par_config:+--config "${par_config}"}

Could you test if the following works?
A couple of things here:

  1. If you set an environment variable as part of a composite command instead of on a separate line, it applies only to the commands on that line (here IFS is only used by read, so there is no need to use unset afterwards).
  2. The ${array[@]} syntax can be used to output an array whose elements are separated by spaces (space is the default separator; this can be adjusted).
Suggested change
- input_files=""
- IFS=";" read -ra file_names <<< "$par_gff"
- for file in "${file_names[@]}"; do
- input_files+="--gff $file "
- done
- unset IFS
- # run agat_sp_merge_annotations
- agat_sp_merge_annotations.pl \
- $input_files \
- -o "$par_output" \
- ${par_config:+--config "${par_config}"}
+ IFS=";" read -ra file_names <<< "$par_gff"
+ # run agat_sp_merge_annotations
+ agat_sp_merge_annotations.pl \
+ $input_files \
+ -o "$par_output" \
+ ${file_names[@]} \
+ ${par_config:+--config "${par_config}"}

Contributor Author

@Leila011 Leila011 Aug 19, 2024

It does not; the input should be in this format: `--gff input1.gff --gff input2.gff`
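
One way to reconcile the two points in this thread is to collect the repeated `--gff` flags in a bash array rather than a string. The sketch below is an illustration, not the code that was merged: the `build_gff_args` helper and the `gff_args` array are hypothetical names, and the sample file names are made up. It keeps `IFS=";"` scoped to the `read` command, as the reviewer suggests, and still produces one `--gff` per file, as the author requires.

```bash
#!/bin/bash

# Hypothetical helper: split a ';'-separated list and emit one --gff flag per file.
# The result is stored in the global array gff_args.
build_gff_args() {
  local joined="$1"
  local -a file_names
  local file
  gff_args=()
  # The IFS assignment is a prefix to `read`, so it applies to that command only.
  IFS=";" read -ra file_names <<< "$joined"
  for file in "${file_names[@]}"; do
    gff_args+=(--gff "$file")
  done
}

# Made-up sample input; the second name contains a space on purpose.
build_gff_args "input1.gff;my input2.gff"
printf '%s\n' "${gff_args[@]}"
```

The array could then be passed as `agat_sp_merge_annotations.pl "${gff_args[@]}" -o "$par_output" ${par_config:+--config "${par_config}"}`. Quoting the array expansion keeps each file name as a single argument even when it contains spaces, which the string-based `$input_files` cannot do.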

47 changes: 47 additions & 0 deletions src/agat/agat_sp_merge_annotations/test.sh
@@ -0,0 +1,47 @@
#!/bin/bash

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"
out_dir="${meta_resources_dir}/out_data"

echo "> Run $meta_name with test data 1"
"$meta_executable" \
  --gff "$test_dir/file1.gff;$test_dir/file2.gff" \
  --output "$out_dir/output.gff"

echo ">> Checking output"
[ ! -f "$out_dir/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$out_dir/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
diff "$out_dir/output.gff" "$test_dir/agat_sp_merge_annotations_1.gff"
if [ $? -ne 0 ]; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

rm -rf "$out_dir/output.gff"

echo "> Run $meta_name with test data 2"
"$meta_executable" \
  --gff "$test_dir/fileA.gff;$test_dir/fileB.gff" \
  --output "$out_dir/output.gff"

echo ">> Checking output"
[ ! -f "$out_dir/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$out_dir/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
diff "$out_dir/output.gff" "$test_dir/agat_sp_merge_annotations_2.gff"
if [ $? -ne 0 ]; then
  echo "Output file output.gff does not match expected output"
  exit 1
fi

echo "> Test successful"
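
Both test cases above repeat the same three checks (file exists, file is non-empty, file matches the expected output). As a sketch only, those checks could be factored into one function and run against a temporary directory that is cleaned up on exit, in line with the later commits "add temporary directory and cleanup on exit". The `check_output` helper, the `tmp_dir` variable, and the sample file contents are hypothetical and not part of this PR.

```bash
#!/bin/bash

# Hypothetical helper: fail with a message if the output is missing, empty,
# or different from the expected file.
check_output() {
  actual="$1"
  expected="$2"
  [ -f "$actual" ] || { echo "Output file $actual does not exist"; return 1; }
  [ -s "$actual" ] || { echo "Output file $actual is empty"; return 1; }
  diff "$actual" "$expected" > /dev/null || {
    echo "Output file $actual does not match expected output"
    return 1
  }
}

# Temporary working directory, removed when the script exits.
tmp_dir=$(mktemp -d)
trap 'rm -rf "$tmp_dir"' EXIT

# Stand-in files for illustration; a real test would run "$meta_executable" here.
printf '##gff-version 3\n' > "$tmp_dir/output.gff"
printf '##gff-version 3\n' > "$tmp_dir/expected.gff"

check_output "$tmp_dir/output.gff" "$tmp_dir/expected.gff" && echo "> Test successful"
```

Writing to a fresh temporary directory also sidesteps the question of whether `out_data` exists before the component is run.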
@@ -0,0 +1,13 @@
##gff-version 3
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;ontology=G0222
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;ontology=G0222;merged_ID=IDmodified-mrna-1;merged_Ontology=G0333;merged_Parent=IDmodified-gene-1
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3
@@ -0,0 +1,3 @@
##gff-version 3
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;merged_ID=B.t1;merged_Parent=B
14 changes: 14 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/file1.gff
@@ -0,0 +1,14 @@
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;


12 changes: 12 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/file2.gff
@@ -0,0 +1,12 @@
chr10 BestRefSeq gene 123237824 123357992 . - . ID=gene-FGFR2;Ontology=G0222;
chr10 BestRefSeq mRNA 123237824 123357992 . - . ID=rna-NM_022970.3;Parent=gene-FGFR2;Ontology=G0333;
chr10 BestRefSeq exon 123237824 123239535 . - . ID=exon-NM_022970.3-18;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123243212 123243317 . - . ID=exon-NM_022970.3-17;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123353223 123353481 . - . ID=exon-NM_022970.3-2;Parent=rna-NM_022970.3;
chr10 BestRefSeq exon 123357476 123357992 . - . ID=exon-NM_022970.3-1;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123239371 123239535 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123243212 123243317 . - 1 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq CDS 123353223 123353331 . - 0 ID=cds-NP_075259.4;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123353332 123353481 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq five_prime_UTR 123357476 123357992 . - . ID=agat-five_prime_utr-54403;Parent=rna-NM_022970.3;
chr10 BestRefSeq three_prime_UTR 123237824 123239370 . - . ID=agat-three_prime_utr-54427;Parent=rna-NM_022970.3;
2 changes: 2 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/fileA.gff
@@ -0,0 +1,2 @@
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=A;
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=A.t1;Parent=A;
2 changes: 2 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/fileB.gff
@@ -0,0 +1,2 @@
chr1 AUGUSTUS gene 1000424 1039237 . + . ID=B;
chr1 AUGUSTUS mRNA 1000424 1039237 . + . ID=B.t1;Parent=B;
15 changes: 15 additions & 0 deletions src/agat/agat_sp_merge_annotations/test_data/script.sh
@@ -0,0 +1,15 @@
#!/bin/bash

# clone repo
if [ ! -d /tmp/agat_source ]; then
  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
fi

# copy test data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file1.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/file2.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_1.gff src/agat/agat_sp_merge_annotations/test_data

cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileA.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/in/agat_sp_merge_annotations/fileB.gff src/agat/agat_sp_merge_annotations/test_data
cp -r /tmp/agat_source/t/scripts_output/out/agat_sp_merge_annotations_2.gff src/agat/agat_sp_merge_annotations/test_data