Add agat sp extract attributes #131

Open
wants to merge 20 commits into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -25,6 +25,7 @@
- `agat/agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).
- `agat/agat_convert_sp_gff2tsv`: convert gtf/gff file into tabulated file (PR #102).
- `agat/agat_convert_sp_gxf2gxf`: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).
- `agat/agat_sp_extract_attributes`: extract chosen attributes from a GFF file (PR #131).

* `bedtools`:
- `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
113 changes: 113 additions & 0 deletions src/agat/agat_sp_extract_attributes/config.vsh.yaml
@@ -0,0 +1,113 @@
name: agat_sp_extract_attributes
namespace: agat
description: |
  The script takes a GTF/GFF file as input and extracts chosen attributes
  from all or specific feature types. The 9th column of a GFF/GTF file
  contains a list of attributes. A GFF3 attribute looks like `tag=value`.
keywords: [gene annotations, GFF]
links:
  homepage: https://github.com/NBISweden/AGAT
  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_extract_attributes.html
  issue_tracker: https://github.com/NBISweden/AGAT/issues
  repository: https://github.com/NBISweden/AGAT
references:
  doi: 10.5281/zenodo.3552717
license: GPL-3.0
requirements:
  - commands: [agat]
authors:
  - __merge__: /src/_authors/leila_paquay.yaml
    roles: [ author, maintainer ]
argument_groups:
  - name: Inputs
    arguments:
      - name: --gff
        alternatives: [-f]
        description: Input GTF/GFF file.
        type: file
        required: true
        direction: input
        example: input.gff
      - name: --attribute
        alternatives: [--att, -a]
        description: |
          Attribute tag. The value of the specified attribute tag will be
          extracted from the feature types selected with `--primary_tag` (`-p`).
        direction: input
        multiple: true
        type: string
        required: true
        example: protein_id
  - name: Outputs
    arguments:
      - name: --output
        alternatives: [-o, --out, --outfile]
        description: |
          Output file. One file per attribute tag will be created, using the
          attribute tag name as a file name suffix (for example, `output.txt`
          with the tag `protein_id` yields `output_protein_id.txt`).
        type: string
        required: true
        example: output.txt
  - name: Arguments
    arguments:
      - name: --primary_tag
        alternatives: [-p, -t, -l]
        description: |
          Primary tag option, case insensitive, list. Specifies the
          feature types that will be handled.

          You can select a specific feature by giving its primary tag name (column 3), for example:

          * cds
          * Gene
          * mRNA

          You can also select all the features of a particular level:

          * level2=mRNA,ncRNA,tRNA,etc.
          * level3=CDS,exon,UTR,etc.

          By default, all features are taken into account. Setting the option
          to "all" has the same behavior.
        type: string
        required: false
        multiple: true
        example: gene
      - name: --merge
        alternatives: [-m]
        description: |
          By default, the values of each attribute tag are written to a
          dedicated file. To write the values of all tags to a single file,
          use this option.
        type: boolean_true
      - name: --dot
        alternatives: [-d]
        description: |
          By default, when an attribute is not found for a feature, a dot
          (.) is reported. If you don't want anything to be printed in such
          a case, use this option.
        type: boolean_true
      - name: --config
        alternatives: [-c]
        description: |
          AGAT config file. By default, AGAT uses the original agat_config.yaml
          shipped with AGAT. The `--config` option lets you use your own AGAT
          config file (located elsewhere or named differently).
        type: file
        required: false
        example: custom_config.yaml
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
    setup:
      - type: docker
        run: |
          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
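The component description says that column 9 of a GFF file holds `tag=value` attributes and that a dot is reported when a tag is missing. As a rough illustration of that idea only (this is not how AGAT parses files, and `extract_attribute` and the two sample records are made up for this sketch), the extraction could look like this for simple GFF3 input:

```shell
#!/bin/bash
# Illustrative sketch: AGAT's own parser handles many more cases
# (GTF syntax, escaping, feature levels). extract_attribute is hypothetical.

# Print the value of tag $1 for each GFF3 record on stdin, or "." if absent.
extract_attribute() {
  local tag="$1"
  awk -F'\t' -v tag="$tag" '
    /^#/ || NF < 9 { next }            # skip comments and short lines
    {
      value = "."                      # default when the tag is missing
      n = split($9, pairs, ";")        # column 9: tag=value;tag=value;...
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")
        if (eq > 0 && substr(pairs[i], 1, eq - 1) == tag) {
          value = substr(pairs[i], eq + 1)
          break
        }
      }
      print value
    }'
}

# Two made-up records: the first carries protein_id, the second does not.
sample=$(printf 'chr1\tsrc\tCDS\t1\t90\t.\t+\t0\tID=cds1;protein_id=P001\nchr1\tsrc\texon\t1\t90\t.\t+\t.\tID=exon1\n')
printf '%s\n' "$sample" | extract_attribute protein_id
```

The second record has no `protein_id`, so the sketch falls back to a dot, mirroring the default described for `--dot`.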
88 changes: 88 additions & 0 deletions src/agat/agat_sp_extract_attributes/help.txt
@@ -0,0 +1,88 @@
```sh
agat_sp_extract_attributes.pl --help
```

------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_sp_extract_attributes.pl

Description:
The script takes a gtf/gff file as input. The script allows to extract
choosen attributes of all or specific feature types. The 9th column of a
gff/gtf file contains a list of attributes. An attribute (gff3) looks
like that tag=value

Usage:
agat_sp_extract_attributes.pl --gff file.gff --att locus_tag,product,name -p level2,cds,exon [ -o outfile ]
agat_sp_extract_attributes.pl --help

Options:
--gff or -f
Input GTF/GFF file.

-p, -t or -l
primary tag option, case insensitive, list. Allow to specied the
feature types that will be handled. You can specified a specific
feature by given its primary tag name (column 3) as: cds, Gene,
MrNa You can specify directly all the feature of a particular
level: level2=mRNA,ncRNA,tRNA,etc level3=CDS,exon,UTR,etc By
default all feature are taking in account. fill the option by
the value "all" will have the same behaviour.

--attribute, --att, -a
attribute tag. The value of the attribute tag specified will be
extracted from the feature type specified by the option -p. List
of attributes must be coma separated.

--merge or -m
By default the values of each attribute tag is writen in its
dedicated file. To write the values of all tags in only one file
use this option.

-d By default when an attribute is not found for a feature, a dot
(.) is reported. If you don't want anything to be printed in
such case use this option.

-o , --output , --out or --outfile
Output GFF file. If no output file is specified, the output will
be written to STDOUT.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

-h or --help
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
20 changes: 20 additions & 0 deletions src/agat/agat_sp_extract_attributes/script.sh
@@ -0,0 +1,20 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

# unset flags
[[ "$par_merge" == "false" ]] && unset par_merge
[[ "$par_dot" == "false" ]] && unset par_dot

# run agat_sp_extract_attributes.pl
agat_sp_extract_attributes.pl \
  --gff "$par_gff" \
  --attribute "$par_attribute" \
  --output "$par_output" \
  ${par_primary_tag:+-p "${par_primary_tag}"} \
  ${par_merge:+--merge} \
  ${par_dot:+-d} \
  ${par_config:+--config "${par_config}"}
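One point that may be worth checking: the help text states that the attribute list must be comma separated, while `--attribute` is declared with `multiple: true`. Assuming Viash hands repeated values to a bash script as a single string joined with `;` (an assumption to verify against the Viash version in use), a conversion step along these lines might be needed before calling AGAT. The variable `attribute_list` is a name chosen for this sketch:

```shell
#!/bin/bash
# Assumption: repeated --attribute values arrive joined with ";",
# e.g. from `--attribute protein_id --attribute locus_tag --attribute product`.
par_attribute="protein_id;locus_tag;product"

# Replace every ";" with "," using bash parameter expansion,
# giving the comma-separated list described in the AGAT help text.
attribute_list="${par_attribute//;/,}"

echo "$attribute_list"
```

The same substitution would apply to `par_primary_tag`, which is also declared with `multiple: true`.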
36 changes: 36 additions & 0 deletions src/agat/agat_sp_extract_attributes/test.sh
@@ -0,0 +1,36 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"

# create temporary directory and clean up on exit
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
}
trap clean_up EXIT

echo "> Run $meta_name with test data"
"$meta_executable" \
--gff "$test_dir/1.gff" \
--attribute protein_id \
--output "$TMPDIR/output.txt"

echo ">> Checking output"
[ ! -f "$TMPDIR/output_protein_id.txt" ] && echo "Output file output_protein_id.txt does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output_protein_id.txt" ] && echo "Output file output_protein_id.txt is empty" && exit 1

echo ">> Check if output matches expected output"
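# Under `set -e`, a bare failing `diff` would abort the script before the
# message below is printed, so test its exit status directly.
if ! diff "$TMPDIR/output_protein_id.txt" "$test_dir/agat_sp_extract_attributes_1.txt"; then
  echo "Output file output_protein_id.txt does not match expected output"
  exit 1
fi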

echo "> Test successful"
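The test hard-codes `output_protein_id.txt` for `--output output.txt`. If the test is later extended to several attributes, the expected file names could be derived rather than typed out. The sketch below assumes the naming rule `<base>_<tag>.<ext>`, which is inferred from this test rather than from the AGAT documentation, and assumes the output file name has an extension; `expected_output_name` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: builds the per-attribute file name from the value
# passed to --output, assuming the pattern <base>_<tag>.<ext>.
expected_output_name() {
  local output="$1" tag="$2"
  local base="${output%.*}"    # path without the final extension
  local ext="${output##*.}"    # final extension
  echo "${base}_${tag}.${ext}"
}

expected_output_name "output.txt" "protein_id"
```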