Merge main into add-agat_convert_sp_gxf2gxf
rcannood committed Aug 13, 2024
2 parents dd64681 + 9fc07f6 commit b79a817
Showing 38 changed files with 2,073 additions and 226 deletions.
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -19,7 +19,16 @@
- `seqtk/seqtk_subseq`: Extract the sequences (complete or subsequence) from the FASTA/FASTQ files
based on a provided sequence IDs or region coordinates file (PR #85).

* `agat/agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
* `agat`:
- `agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
- `agat_convert_bed2gff`: convert a BED file to GFF format (PR #97).

* `bedtools`:
- `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
- `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).


* `agat/agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).

* `agat/agat_convert_sp_gxf2gxf`: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).

@@ -29,6 +38,10 @@

* Update CI to reusable workflow in `viash-io/viash-actions` (PR #86).

* Update several components in order to avoid duplicate code when using `unset` on boolean arguments (PR #133).

* Bump viash to `0.9.0-RC7` (PR #134).

## DOCUMENTATION

* Extend the contributing guidelines (PR #82):
25 changes: 25 additions & 0 deletions CONTRIBUTING.md
@@ -320,6 +320,31 @@ Notes:

* If your tool allows for multiple inputs using a separator other than `;` (which is the default Viash multiple separator), you can substitute these values with a command like: `par_disable_filters=$(echo $par_disable_filters | tr ';' ',')`.

* If you have a lot of boolean variables that you would like to unset when the value is `false`, you can avoid duplicate code by using the following syntax:

```bash
unset_if_false=(
  par_argument_1
  par_argument_2
  par_argument_3
  par_argument_4
)
for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done
```

This code is equivalent to:

```bash
[[ "$par_argument_1" == "false" ]] && unset par_argument_1
[[ "$par_argument_2" == "false" ]] && unset par_argument_2
[[ "$par_argument_3" == "false" ]] && unset par_argument_3
[[ "$par_argument_4" == "false" ]] && unset par_argument_4
```
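To see the loop in action, the sketch below assigns hypothetical values to the four placeholder parameters (in a real component Viash sets them), runs the loop, and prints which ones survive. `${!par}` is Bash indirect expansion: it reads the variable whose name is stored in `par`.

```bash
#!/usr/bin/env bash
# Hypothetical values; in a real component these are set by Viash.
par_argument_1=false
par_argument_2=true
par_argument_3=false
par_argument_4=true

unset_if_false=(
  par_argument_1
  par_argument_2
  par_argument_3
  par_argument_4
)

# Unset every parameter whose value is the string "false".
for par in "${unset_if_false[@]}"; do
  [[ "${!par}" == "false" ]] && unset "$par"
done

# Report the result: unset parameters fall back to the placeholder "<unset>".
for par in "${unset_if_false[@]}"; do
  echo "$par=${!par-<unset>}"
done
```

Run with Bash, this prints `par_argument_1=<unset>`, `par_argument_2=true`, `par_argument_3=<unset>` and `par_argument_4=true`, one per line, so only the `true` flags are left for the `${var:+...}` expansions in the tool call.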


### Step 12: Create test script

2 changes: 1 addition & 1 deletion _viash.yaml
@@ -7,7 +7,7 @@ links:
issue_tracker: https://github.com/viash-hub/biobox/issues
repository: https://github.com/viash-hub/biobox

viash_version: 0.9.0-RC6
viash_version: 0.9.0-RC7

config_mods: |
.requirements.commands := ['ps']
86 changes: 86 additions & 0 deletions src/agat/agat_convert_bed2gff/config.vsh.yaml
@@ -0,0 +1,86 @@
name: agat_convert_bed2gff
namespace: agat
description: |
The script takes a BED file as input and translates it into GFF format. The BED format (described at https://genome.ucsc.edu/FAQ/FAQformat.html#format1) uses 0-based, half-open [start-1, end) coordinates; the script converts these to the 1-based, closed [start, end] coordinates of General Feature Format v3 (GFF3).
keywords: [gene annotations, GFF conversion]
links:
homepage: https://github.com/NBISweden/AGAT
documentation: https://agat.readthedocs.io/en/latest/tools/agat_convert_bed2gff.html
issue_tracker: https://github.com/NBISweden/AGAT/issues
repository: https://github.com/NBISweden/AGAT
references:
doi: 10.5281/zenodo.3552717
license: GPL-3.0
authors:
- __merge__: /src/_authors/leila_paquay.yaml
roles: [ author, maintainer ]
argument_groups:
- name: Inputs
arguments:
- name: --bed
description: Input bed file that will be converted.
type: file
required: true
direction: input
example: input.bed
- name: Outputs
arguments:
- name: --output
alternatives: [-o, --out, --outfile, --gff]
description: Output GFF file. If no output file is specified, the output will be written to STDOUT.
type: file
direction: output
required: true
example: output.gff
- name: Arguments
arguments:
- name: --source
description: |
The source informs about the tool used to produce the data and is stored in the 2nd field of a GFF file. Example: Stringtie, Maker, Augustus, etc. [default: data]
type: string
required: false
example: Stringtie
- name: --primary_tag
description: |
The primary_tag corresponds to the data type and is stored in the 3rd field of a GFF file. Example: gene, mRNA, CDS, etc. [default: gene]
type: string
required: false
example: gene
- name: --inflate_off
description: |
By default we inflate the block fields (blockCount, blockSizes, blockStarts) to create subfeatures of the main feature (primary_tag). The type of subfeature created is based on the inflate_type parameter. If you do not want this inflating behaviour you can deactivate it by using the --inflate_off option.
type: boolean_false
- name: --inflate_type
description: |
Feature type (3rd column in GFF) created when the inflate parameter is activated [default: exon].
type: string
required: false
example: exon
- name: --verbose
description: Add verbosity.
type: boolean_true
- name: --config
alternatives: [-c]
description: |
Input AGAT config file. By default AGAT takes as input the agat_config.yaml file from the working directory if any; otherwise it takes the original agat_config.yaml shipped with AGAT. To get the agat_config.yaml locally, type: "agat config --expose". The --config option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
type: file
required: false
example: custom_agat_config.yaml
resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- type: file
path: test_data
engines:
- type: docker
image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
setup:
- type: docker
run: |
agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
- type: executable
- type: nextflow
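The coordinate conversion stated in the description amounts to adding 1 to the BED start and keeping the BED end unchanged. A minimal Bash sketch of that arithmetic (the function name `bed_to_gff_coords` is hypothetical, not part of AGAT), applied to the interval in `test_data/test.bed`:

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the BED -> GFF3 coordinate arithmetic.
# BED:  0-based start (inclusive), end exclusive -> [start, end)
# GFF3: 1-based start (inclusive), end inclusive -> [start, end]
bed_to_gff_coords() {
  local bed_start=$1 bed_end=$2
  # Moving to 1-based shifts the start up by one; the exclusive 0-based end
  # already equals the inclusive 1-based end, so it is kept as-is.
  echo "$(( bed_start + 1 )) $bed_end"
}

bed_to_gff_coords 337817 343277
```

This prints `337818 343277`, the span of the `gene` line in the expected output; both notations describe the same 343277 - 337817 = 5460 bases.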
89 changes: 89 additions & 0 deletions src/agat/agat_convert_bed2gff/help.txt
@@ -0,0 +1,89 @@
```sh
agat_convert_bed2gff.pl --help
```
------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_convert_bed2gff.pl

Description:
The script takes a bed file as input, and will translate it in gff
format. The BED format is described here:
https://genome.ucsc.edu/FAQ/FAQformat.html#format1 The script converts
0-based, half-open [start-1, end) bed file to 1-based, closed [start,
end] General Feature Format v3 (GFF3).

Usage:
agat_convert_bed2gff.pl --bed infile.bed [ -o outfile ]
agat_convert_bed2gff.pl -h

Options:
--bed Input bed file that will be converted.

--source
The source informs about the tool used to produce the data and
is stored in 2nd field of a gff file. Example:
Stringtie,Maker,Augustus,etc. [default: data]

--primary_tag
The primary_tag corresponds to the data type and is stored in
3rd field of a gff file. Example: gene,mRNA,CDS,etc. [default:
gene]

--inflate_off
By default we inflate the block fields (blockCount, blockSizes,
blockStarts) to create subfeatures of the main feature
(primary_tag). The type of subfeature created is based on the
inflate_type parameter. If you do not want this inflating
behaviour you can deactivate it by using the --inflate_off
option.

--inflate_type
Feature type (3rd column in gff) created when inflate parameter
activated [default: exon].

--verbose
add verbosity

-o , --output , --out , --outfile or --gff
Output GFF file. If no output file is specified, the output will
be written to STDOUT.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the orignal agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

-h or --help
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
19 changes: 19 additions & 0 deletions src/agat/agat_convert_bed2gff/script.sh
@@ -0,0 +1,19 @@
#!/bin/bash

## VIASH START
## VIASH END

# unset flags
# --inflate_off is a boolean_false argument: Viash sets it to "true" by default
# and to "false" when the flag is passed, so it is unset when "true" and
# --inflate_off is only forwarded when the user gave the flag.
[[ "$par_inflate_off" == "true" ]] && unset par_inflate_off
[[ "$par_verbose" == "false" ]] && unset par_verbose

# run agat_convert_bed2gff.pl
agat_convert_bed2gff.pl \
  --bed "$par_bed" \
  -o "$par_output" \
  ${par_source:+--source "${par_source}"} \
  ${par_primary_tag:+--primary_tag "${par_primary_tag}"} \
  ${par_inflate_off:+--inflate_off} \
  ${par_inflate_type:+--inflate_type "${par_inflate_type}"} \
  ${par_verbose:+--verbose} \
  ${par_config:+--config "${par_config}"}
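Each `${var:+...}` expansion in the call emits its flag only when the variable is set and non-empty, which is why the `unset` lines decide what reaches `agat_convert_bed2gff.pl`. The sketch below isolates that logic in a hypothetical `build_args` function, assuming Viash's `boolean_false` behaviour for `--inflate_off` (the value is `true` unless the flag is passed, in which case it is `false`). It prints one argument per line so that a value containing a space is visibly kept as a single argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the flag handling in script.sh.
# Arguments: source, inflate_off ("true"/"false"), verbose ("true"/"false").
# The body runs in a subshell ( ... ) so unset does not leak to the caller.
build_args() (
  par_source="$1"
  par_inflate_off="$2"
  par_verbose="$3"

  # Same unset rules as script.sh
  [[ "$par_inflate_off" == "true" ]] && unset par_inflate_off
  [[ "$par_verbose" == "false" ]] && unset par_verbose

  # ${var:+word} expands to word only when var is set and non-empty;
  # the inner quotes keep a value such as "My Tool" as one argument.
  args=(
    ${par_source:+--source "$par_source"}
    ${par_inflate_off:+--inflate_off}
    ${par_verbose:+--verbose}
  )
  printf '%s\n' "${args[@]}"
)

# --inflate_off passed (Viash value "false"), --verbose not passed
build_args "Stringtie" false false
```

The usage line prints `--source`, `Stringtie` and `--inflate_off` on separate lines; with the defaults (`true` for `--inflate_off`, `false` for `--verbose`) and no source, nothing is emitted and AGAT falls back to its own defaults.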
27 changes: 27 additions & 0 deletions src/agat/agat_convert_bed2gff/test.sh
@@ -0,0 +1,27 @@
#!/bin/bash

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"
out_dir="${meta_resources_dir}/out_data"

echo "> Run $meta_name with test data"
"$meta_executable" \
--bed "$test_dir/test.bed" \
--output "$out_dir/output.gff"

echo ">> Checking output"
[ ! -f "$out_dir/output.gff" ] && echo "Output file output.gff does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$out_dir/output.gff" ] && echo "Output file output.gff is empty" && exit 1

echo ">> Check if output matches expected output"
diff "$out_dir/output.gff" "$test_dir/agat_convert_bed2gff_1.gff"
if [ $? -ne 0 ]; then
echo "Output file output.gff does not match expected output"
exit 1
fi

echo "> Test successful"
12 changes: 12 additions & 0 deletions src/agat/agat_convert_bed2gff/test_data/agat_convert_bed2gff_1.gff
@@ -0,0 +1,12 @@
##gff-version 3
scaffold625 data gene 337818 343277 . + . ID=1;Name=CLUHART00000008717;blockCount=4;blockSizes=154%2C109%2C111%2C1314;blockStarts=0%2C2915%2C3700%2C4146;itemRgb=255%2C0%2C0;thickEnd=343033;thickStart=337914
scaffold625 data exon 337818 337971 . + . ID=exon1;Parent=1
scaffold625 data exon 340733 340841 . + . ID=exon2;Parent=1
scaffold625 data exon 341518 341628 . + . ID=exon3;Parent=1
scaffold625 data exon 341964 343277 . + . ID=exon4;Parent=1
scaffold625 data CDS 337915 337971 . + 0 ID=CDS1;Parent=1
scaffold625 data CDS 340733 340841 . + 0 ID=CDS2;Parent=1
scaffold625 data CDS 341518 341628 . + 2 ID=CDS3;Parent=1
scaffold625 data CDS 341964 343033 . + 2 ID=CDS4;Parent=1
scaffold625 data five_prime_UTR 337818 337914 . + . ID=five_prime_UTR1;Parent=1
scaffold625 data three_prime_UTR 343034 343277 . + . ID=three_prime_UTR1;Parent=1
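The phase column (8th field) of the CDS lines above follows the GFF3 rule: phase is the number of bases to skip at the start of a CDS segment to reach the first base of the next codon. For a plus-strand transcript whose first CDS segment starts in frame, it follows from the cumulative length of the preceding segments. A minimal Bash sketch (the `cds_phases` name is hypothetical) that reproduces the values 0, 0, 2, 2 from the CDS coordinates:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "start end" pairs of CDS segments (plus strand,
# 1-based closed coordinates, in transcript order) and appends the GFF3 phase.
# Assumes the first segment starts in frame (phase 0).
cds_phases() {
  local total=0 start end phase
  while read -r start end; do
    # The partial codon left over by the previous segments determines
    # how many bases must be skipped to reach the next codon start.
    phase=$(( (3 - total % 3) % 3 ))
    echo "$start $end $phase"
    total=$(( total + end - start + 1 ))
  done
}

printf '%s\n' "337915 337971" "340733 340841" "341518 341628" "341964 343033" | cds_phases
```

The first two segments are 57 and 109 bases long, so 166 coding bases precede the third segment; 166 mod 3 = 1 leaves a partial codon, and the third segment gets phase 2. The same reasoning gives phase 2 for the fourth segment (277 preceding bases).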
10 changes: 10 additions & 0 deletions src/agat/agat_convert_bed2gff/test_data/script.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# clone repo
if [ ! -d /tmp/agat_source ]; then
git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
fi

# copy test data
cp -r /tmp/agat_source/t/scripts_output/in/test.bed src/agat/agat_convert_bed2gff/test_data/test.bed
cp -r /tmp/agat_source/t/scripts_output/out/agat_convert_bed2gff_1.gff src/agat/agat_convert_bed2gff/test_data/agat_convert_bed2gff_1.gff
1 change: 1 addition & 0 deletions src/agat/agat_convert_bed2gff/test_data/test.bed
@@ -0,0 +1 @@
scaffold625 337817 343277 CLUHART00000008717 0 + 337914 343033 255,0,0 4 154,109,111,1314 0,2915,3700,4146
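With inflation enabled (the default unless `--inflate_off` is passed), each BED12 block becomes an exon. Offsets in `blockStarts` are relative to `chromStart`, so exon *i* spans `chromStart + blockStarts[i] + 1` to `chromStart + blockStarts[i] + blockSizes[i]` in GFF coordinates. The Bash sketch below (a hypothetical `inflate_bed12` helper, assuming a tab- or space-separated 12-column BED line) applies this to the line in `test.bed` and yields the same exon coordinates as `agat_convert_bed2gff_1.gff`. It is not AGAT's implementation, which, as the expected output shows, also derives CDS and UTR features from `thickStart`/`thickEnd`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one BED12 line on stdin and prints one
# "chrom<TAB>exon<TAB>start<TAB>end" line per block, in 1-based closed coordinates.
inflate_bed12() {
  local chrom start end name score strand thick_start thick_end rgb count sizes starts
  read -r chrom start end name score strand thick_start thick_end rgb count sizes starts
  local -a size_arr start_arr
  IFS=, read -r -a size_arr <<< "$sizes"
  IFS=, read -r -a start_arr <<< "$starts"
  local i
  for (( i = 0; i < count; i++ )); do
    # blockStarts are offsets from chromStart; +1 converts the 0-based start to 1-based.
    printf '%s\texon\t%d\t%d\n' "$chrom" \
      $(( start + start_arr[i] + 1 )) \
      $(( start + start_arr[i] + size_arr[i] ))
  done
}

printf 'scaffold625\t337817\t343277\tCLUHART00000008717\t0\t+\t337914\t343033\t255,0,0\t4\t154,109,111,1314\t0,2915,3700,4146\n' \
  | inflate_bed12
```

The four printed intervals are 337818-337971, 340733-340841, 341518-341628 and 341964-343277, matching the `exon1` to `exon4` lines of the expected output.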