Add agat convert sp gff2bed #114
Open: Leila011 wants to merge 13 commits into main from add-agat_convert_sp_gff2bed
13 commits
f52d380 add help (Leila011)
10099be add config (Leila011)
8e7b1c8 add run script (Leila011)
a945a4f add test data & expected output + script to fetch them (Leila011)
59cd032 fix config (Leila011)
83b261b add test (Leila011)
bba246c update changelog (Leila011)
30d8899 Merge main into add-agat_convert_sp_gff2bed (rcannood)
4266561 Merge branch 'main' into add-agat_convert_sp_gff2bed (Leila011)
666da74 update config: format description, add requirements, add keywords, up… (Leila011)
fbc9074 Merge branch 'add-agat_convert_sp_gff2bed' of https://github.com/vias… (Leila011)
54bb076 add set -eo pipefail to test adn script files (Leila011)
25ff307 create temporary directory and clean up on exit (Leila011)
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,105 @@ | ||||||
name: agat_convert_sp_gff2bed | ||||||
namespace: agat | ||||||
description: | | ||||||
The script converts a GTF/GXF file into a BED file. It converts | ||||||
level2 features from GFF (mRNA, transcripts) into BED features. If the | ||||||
selected subfeatures of the level2 features (default: exon) exist, they are reported in | ||||||
the block fields (columns 10-12 in BED). CDS start and end are reported | ||||||
in columns 7 and 8, respectively. | ||||||
|
||||||
#### Definition of the BED format: | ||||||
|
||||||
1. **chrom** - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671). | ||||||
2. **chromStart** - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0. | ||||||
3. **chromEnd** - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99. | ||||||
|
||||||
#### OPTIONAL fields: | ||||||
|
||||||
4. **name** - Defines the name of the BED line. This label is displayed to the left of the BED line in the Genome Browser window when the track is open to full display mode or directly to the left of the item in pack mode. | ||||||
5. **score** - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray). | ||||||
6. **strand** - Defines the strand - either '+' or '-'. | ||||||
7. **thickStart** - The starting position at which the feature is drawn thickly. | ||||||
8. **thickEnd** - The ending position at which the feature is drawn thickly. | ||||||
9. **itemRgb** - An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to "On", this RGB value will determine the display color of the data contained in this BED line. NOTE: It is recommended that a simple color scheme (eight colors or less) be used with this attribute to avoid overwhelming the color resources of the Genome Browser and your Internet browser. | ||||||
10. **blockCount** - The number of blocks (exons) in the BED line. | ||||||
11. **blockSizes** - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount. | ||||||
12. **blockStarts** - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount. | ||||||
keywords: [gene annotations, GTF conversion, BED] | ||||||
links: | ||||||
homepage: https://github.com/NBISweden/AGAT | ||||||
documentation: https://agat.readthedocs.io/en/latest/tools/agat_convert_sp_gff2bed.html | ||||||
issue_tracker: https://github.com/NBISweden/AGAT/issues | ||||||
repository: https://github.com/NBISweden/AGAT | ||||||
references: | ||||||
doi: 10.5281/zenodo.3552717 | ||||||
license: GPL-3.0 | ||||||
requirements: | ||||||
- commands: [agat] | ||||||
authors: | ||||||
- __merge__: /src/_authors/leila_paquay.yaml | ||||||
roles: [ author, maintainer ] | ||||||
argument_groups: | ||||||
- name: Inputs | ||||||
arguments: | ||||||
- name: --gff | ||||||
alternatives: [-i] | ||||||
description: Input GFF3 file that will be read. | ||||||
type: file | ||||||
required: true | ||||||
direction: input | ||||||
example: input.gff | ||||||
- name: Outputs | ||||||
arguments: | ||||||
- name: --output | ||||||
alternatives: [--outfile, --out, -o] | ||||||
description: | | ||||||
File where the result will be written. | ||||||
type: file | ||||||
direction: output | ||||||
required: true | ||||||
example: output.bed | ||||||
- name: Arguments | ||||||
arguments: | ||||||
- name: --nc | ||||||
description: | | ||||||
Behaviour for non-coding features (e.g. records without CDS): | ||||||
|
||||||
* keep: Default. They are kept, but no CDS position is reported in the 7th and 8th columns (a period is reported instead). | ||||||
* filter: They are removed. | ||||||
* transcript: They are kept, and the 7th and 8th columns contain the transcript's start and stop. | ||||||
type: string | ||||||
choices: [keep, filter, transcript] | ||||||
required: false | ||||||
- name: --sub | ||||||
description: | | ||||||
Define the subfeature (level3, e.g. exon, cds, utr) to report as blocks in the BED output. Default: exon. | ||||||
type: string | ||||||
required: false | ||||||
example: exon | ||||||
- name: --config | ||||||
alternatives: [-c] | ||||||
description: | | ||||||
AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option gives you the possibility to use your own AGAT config file (located elsewhere or named differently). | ||||||
type: file | ||||||
required: false | ||||||
example: custom_agat_config.yaml | ||||||
resources: | ||||||
- type: bash_script | ||||||
path: script.sh | ||||||
test_resources: | ||||||
- type: bash_script | ||||||
path: test.sh | ||||||
- type: file | ||||||
path: test_data | ||||||
engines: | ||||||
- type: docker | ||||||
image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0 | ||||||
setup: | ||||||
- type: docker | ||||||
run: | | ||||||
agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt | ||||||
runners: | ||||||
- type: executable | ||||||
- type: nextflow |
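The `setup` step in the config above records the tool version by piping `agat --version` through `sed`. The following is a minimal sketch of what that rewrite does, assuming the command prints a line of the form `AGAT v1.4.0`. The sample string is hypothetical and the real output format is not shown in this pull request. Note that `\s` is a GNU sed extension.

```shell
# Hypothetical sample of what `agat --version` might print (assumed format).
sample_version_line='AGAT v1.4.0'

# Same sed expression as in the config: capture everything after "AGAT "
# and emit it as a YAML-style key/value pair.
format_version() {
  sed 's/AGAT\s\(.*\)/agat: "\1"/'
}

formatted=$(printf '%s\n' "$sample_version_line" | format_version)
printf '%s\n' "$formatted"
```

If the real output carries extra lines or a different prefix, the expression would pass those lines through unchanged, so checking the written `/var/software_versions.txt` after a build is a reasonable precaution.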
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
```sh | ||
agat_convert_sp_gff2bed.pl --help | ||
``` | ||
|
||
------------------------------------------------------------------------------ | ||
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 | | ||
| https://github.com/NBISweden/AGAT | | ||
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se | | ||
------------------------------------------------------------------------------ | ||
|
||
Name: | ||
agat_convert_sp_gff2bed.pl | ||
|
||
Description: | ||
The script aims to convert GTF/GXF file into bed file. It will convert | ||
level2 features from gff (mRNA, transcripts) into bed features. If the | ||
selected level2 subfeatures (defaut: exon) exist, they are reported in | ||
the block fields (9-12th colum in bed). CDS Start and End are reported | ||
in column 7 and 8 accordingly. | ||
|
||
Definintion of the bed format: # 1 chrom - The name of the chromosome | ||
(e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671). # 2 | ||
chromStart - The starting position of the feature in the chromosome or | ||
scaffold. The first base in a chromosome is numbered 0. # 3 chromEnd - | ||
The ending position of the feature in the chromosome or scaffold. The | ||
chromEnd base is not included in the display of the feature. For | ||
example, the first 100 bases of a chromosome are defined as | ||
chromStart=0, chromEnd=100, and span the bases numbered 0-99. ########## | ||
OPTIONAL fields ########## # 4 name - Defines the name of the BED line. | ||
This label is displayed to the left of the BED line in the Genome | ||
Browser window when the track is open to full display mode or directly | ||
to the left of the item in pack mode. # 5 score - A score between 0 and | ||
1000. If the track line useScore attribute is set to 1 for this | ||
annotation data set, the score value will determine the level of gray in | ||
which this feature is displayed (higher numbers = darker gray). # 6 | ||
strand - Defines the strand - either '+' or '-'. # 7 thickStart - The | ||
starting position at which the feature is drawn thickly # 8 thickEnd - | ||
The ending position at which the feature is drawn thickly # 9 itemRgb - | ||
An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb | ||
attribute is set to "On", this RBG value will determine the display | ||
color of the data contained in this BED line. NOTE: It is recommended | ||
that a simple color scheme (eight colors or less) be used with this | ||
attribute to avoid overwhelming the color resources of the Genome | ||
Browser and your Internet browser. # 10 blockCount - The number of | ||
blocks (exons) in the BED line. # 11 blockSizes - A comma-separated list | ||
of the block sizes. The number of items in this list should correspond | ||
to blockCount. # 12 blockStarts - A comma-separated list of block | ||
starts. All of the blockStart positions should be calculated relative to | ||
chromStart. The number of items in this list should correspond to | ||
blockCount. | ||
|
||
Usage: | ||
agat_convert_sp_gff2bed.pl --gff file.gff [ -o outfile ] | ||
agat_convert_sp_gff2bed.pl --help | ||
|
||
Options: | ||
--gff Input GFF3 file that will be read | ||
|
||
--nc STRING - behaviour for non-coding features (e.g. recored wihtout | ||
CDS). [keep,filter,transcript] keep - Default, they are kept but | ||
no CDS position is reported in the 7th and 8th columns (a period | ||
is reported instead). filter - We remove them. transcript - We | ||
keep them but values in 7th and 8th columns will contains | ||
transcript's start and stop. | ||
|
||
--sub Define the subfeature (level3, e.g exon,cds,utr,etc...) to | ||
report as blocks in the bed output. Defaut: exon. | ||
|
||
--outfile, --out, --output, or -o | ||
File where will be written the result. If no output file is | ||
specified, the output will be written to STDOUT. | ||
|
||
-c or --config | ||
String - Input agat config file. By default AGAT takes as input | ||
agat_config.yaml file from the working directory if any, | ||
otherwise it takes the orignal agat_config.yaml shipped with | ||
AGAT. To get the agat_config.yaml locally type: "agat config | ||
--expose". The --config option gives you the possibility to use | ||
your own AGAT config file (located elsewhere or named | ||
differently). | ||
|
||
-h or --help | ||
Display this helpful text. | ||
|
||
Feedback: | ||
Did you find a bug?: | ||
Do not hesitate to report bugs to help us keep track of the bugs and | ||
their resolution. Please use the GitHub issue tracking system available | ||
at this address: | ||
|
||
https://github.com/NBISweden/AGAT/issues | ||
|
||
Ensure that the bug was not already reported by searching under Issues. | ||
If you're unable to find an (open) issue addressing the problem, open a new one. | ||
Try as much as possible to include in the issue when relevant: | ||
- a clear description, | ||
- as much relevant information as possible, | ||
- the command used, | ||
- a data sample, | ||
- an explanation of the expected behaviour that is not occurring. | ||
|
||
Do you want to contribute?: | ||
You are very welcome, visit this address for the Contributing | ||
guidelines: | ||
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md |
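The BED definition in the help text above counts positions from 0 and excludes `chromEnd`, whereas GFF coordinates are 1-based and inclusive. The sketch below works through the resulting arithmetic for one transcript with two exons. The coordinates are made up, and this illustrates the format only; it is not AGAT's implementation.

```shell
# Made-up GFF-style (1-based, inclusive) coordinates, for illustration only.
tx_start=1001
tx_end=2000
exons='1001-1200 1801-2000'

# BED start is 0-based; BED end is exclusive, so it equals the GFF end.
chrom_start=$((tx_start - 1))
chrom_end=$tx_end

block_count=0
block_sizes=''
block_starts=''
for exon in $exons; do
  ex_start=${exon%-*}   # text before the dash
  ex_end=${exon#*-}     # text after the dash
  block_count=$((block_count + 1))
  # Size of an inclusive interval, and start offset relative to chromStart.
  block_sizes="${block_sizes}$((ex_end - ex_start + 1)),"
  block_starts="${block_starts}$((ex_start - 1 - chrom_start)),"
done

printf '%s\t%s\t%s\t%s\t%s\n' \
  "$chrom_start" "$chrom_end" "$block_count" "$block_sizes" "$block_starts"
```

The first block starts at offset 0 because `blockStarts` is relative to `chromStart`, not to the start of the chromosome.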
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
#!/bin/bash | ||
|
||
set -eo pipefail | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
agat_convert_sp_gff2bed.pl \ | ||
--gff "$par_gff" \ | ||
--output "$par_output" \ | ||
${par_nc:+--nc "${par_nc}"} \ | ||
${par_sub:+--sub "${par_sub}"} \ | ||
${par_config:+--config "${par_config}"} |
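The run script relies on the `${var:+word}` parameter expansion, so an optional flag is passed only when its variable is set and non-empty. Here is a small sketch of that behaviour, using a hypothetical helper `count_args` and made-up values:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

par_nc='filter'   # set: the expansion yields two words, --nc and filter
par_sub=''        # empty: the expansion yields nothing at all

with_nc=$(count_args ${par_nc:+--nc "${par_nc}"})
without_sub=$(count_args ${par_sub:+--sub "${par_sub}"})

echo "args with par_nc set: $with_nc"
echo "args with par_sub empty: $without_sub"
```

The outer expansion is left unquoted on purpose. Quoting the whole `${...}` would pass the flag and its value as one argument, or pass an empty argument when the variable is unset.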
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
#!/bin/bash | ||
|
||
set -eo pipefail | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
test_dir="${meta_resources_dir}/test_data" | ||
|
||
# create temporary directory and clean up on exit | ||
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX") | ||
function clean_up { | ||
[[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR" | ||
} | ||
trap clean_up EXIT | ||
|
||
echo "> Run $meta_name with test data" | ||
"$meta_executable" \ | ||
--gff "$test_dir/1.gff" \ | ||
--output "$TMPDIR/output.gff" | ||
|
||
echo ">> Checking output" | ||
[ ! -f "$TMPDIR/output.gff" ] && echo "Output file output.gff does not exist" && exit 1 | ||
|
||
echo ">> Check if output is empty" | ||
[ ! -s "$TMPDIR/output.gff" ] && echo "Output file output.gff is empty" && exit 1 | ||
|
||
echo ">> Check if output matches expected output" | ||
# run diff as the if-condition: under set -e a bare failing diff would exit before the message is printed | ||
if ! diff "$TMPDIR/output.gff" "$test_dir/agat_convert_sp_gff2bed_1.gff"; then | ||
echo "Output file output.gff does not match expected output" | ||
exit 1 | ||
fi | ||
|
||
echo "> Test successful" |
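The test script enables `set -e`, under which a bare failing `diff` ends the script immediately, so a check of `$?` on the following line is never reached. The sketch below runs the comparison as the `if` condition instead, which lets a mismatch be reported before exiting. The file names and contents are made up, and the check runs in a subshell only so that the example can inspect its exit status.

```shell
# Made-up files standing in for the actual and expected outputs.
work_dir=$(mktemp -d)
printf 'chr1\t0\t100\n' > "$work_dir/actual.bed"
printf 'chr1\t0\t200\n' > "$work_dir/expected.bed"

# Run the check in a subshell with set -e, as the test script does.
status=0
message=$(
  set -e
  # diff is the if-condition, so its non-zero status does not trigger set -e.
  if ! diff "$work_dir/actual.bed" "$work_dir/expected.bed" > /dev/null; then
    echo "Output does not match expected output"
    exit 1
  fi
  echo "Output matches"
) || status=$?

rm -rf "$work_dir"
echo "status=$status message=$message"
```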