Merge remote-tracking branch 'origin/main' into add_lofreq
KaiWaldrant committed Feb 12, 2024
2 parents 80b6a9d + ee70c2c commit 611cb22
Showing 19 changed files with 186 additions and 33 deletions.
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,10 @@

 * `fastp`: An ultra-fast all-in-one FASTQ preprocessor (PR #3).

-* `busco`: Assess genome assembly and annotation completeness with single copy orthologs (PR #6).
+* `busco`:
+  - `busco/busco_run`: Assess genome assembly and annotation completeness with single copy orthologs (PR #6).
+  - `busco/busco_list_datasets`: Lists available busco datasets (PR #18).
+  - `busco/busco_download_datasets`: Download busco datasets (PR #19).

 * `featurecounts`: Assign sequence reads to genomic features (PR #11).

@@ -24,6 +27,8 @@

 ## MINOR CHANGES

+* Uniformize component metadata (PR #23).
+
 ## DOCUMENTATION

 ## BUG FIXES
12 changes: 7 additions & 5 deletions src/arriba/config.vsh.yaml
@@ -3,11 +3,13 @@ functionality:
   description: Detect gene fusions from RNA-Seq data
   info:
     keywords: [Gene fusion, RNA-Seq]
-    homepage: https://arriba.readthedocs.io/en/latest/
-    documentation: https://arriba.readthedocs.io/en/latest/
-    repository: https://github.com/suhrig/arriba
-    reference: "doi:10.1101/gr.257246.119"
-    licence: MIT
+    links:
+      homepage: https://arriba.readthedocs.io/en/latest/
+      documentation: https://arriba.readthedocs.io/en/latest/
+      repository: https://github.com/suhrig/arriba
+    references:
+      doi: 10.1101/gr.257246.119
+    license: MIT
   requirements:
     cpus: 1
     commands: [ arriba ]
11 changes: 6 additions & 5 deletions src/bgzip/config.vsh.yaml
@@ -2,12 +2,13 @@ functionality:
   name: bgzip
   description: Block compression/decompression utility
   info:
-    homepage: https://www.htslib.org/
-    documentation: https://www.htslib.org/doc/bgzip.html
-    repository: https://github.com/samtools/htslib
-    licence: MIT
-    reference:
+    links:
+      homepage: https://www.htslib.org/
+      documentation: https://www.htslib.org/doc/bgzip.html
+      repository: https://github.com/samtools/htslib
+    references:
       doi: 10.1093/gigascience/giab007
+    license: MIT
   requirements:
     commands: [ bgzip ]
   argument_groups:
46 changes: 46 additions & 0 deletions src/busco/busco_download_datasets/config.vsh.yaml
@@ -0,0 +1,46 @@
+functionality:
+  name: busco_download_datasets
+  namespace: busco
+  description: Downloads available busco datasets
+  info:
+    links:
+      homepage: https://busco.ezlab.org/
+      documentation: https://busco.ezlab.org/busco_userguide.html
+      repository: https://gitlab.com/ezlab/busco
+    references:
+      doi: 10.1007/978-1-4939-9173-0_14
+    license: MIT
+  argument_groups:
+    - name: Inputs
+      arguments:
+        - name: --download
+          type: string
+          description: |
+            Download dataset. Possible values are a specific dataset name, "all", "prokaryota", "eukaryota", or "virus".
+            The full list of available datasets can be viewed [here](https://busco-data.ezlab.org/v5/data/lineages/) or by running the busco/busco_list_datasets component.
+          required: true
+          example: stramenopiles_odb10
+    - name: Outputs
+      arguments:
+        - name: --download_path
+          direction: output
+          type: file
+          description: |
+            Local filepath for storing BUSCO dataset downloads
+          required: false
+          default: busco_downloads
+          example: busco_downloads
+  resources:
+    - type: bash_script
+      path: script.sh
+  test_resources:
+    - type: bash_script
+      path: test.sh
+platforms:
+  - type: docker
+    image: quay.io/biocontainers/busco:5.6.1--pyhdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          busco --version | sed 's/BUSCO\s\(.*\)/busco: "\1"/' > /var/software_versions.txt
+  - type: nextflow
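The Docker `setup` step in this config records the tool version by rewriting the output of `busco --version` with `sed`. The sketch below illustrates that rewrite on a sample string. It assumes the version line has the form `BUSCO 5.6.1`; the sample input and the helper name `format_busco_version` are illustrative and not part of the component. It uses the POSIX class `[[:space:]]` instead of the GNU-specific `\s`, so it should also run with non-GNU `sed`.

```shell
#!/bin/sh
# Rewrite a line like "BUSCO <version>" into a YAML-style entry: busco: "<version>".
# format_busco_version is a hypothetical helper name used only for this sketch.
format_busco_version() {
  sed 's/BUSCO[[:space:]]\(.*\)/busco: "\1"/'
}

# Sample input standing in for the real `busco --version` output (an assumption).
printf 'BUSCO 5.6.1\n' | format_busco_version
```

In the component itself, the result is redirected to `/var/software_versions.txt` inside the image.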
14 changes: 14 additions & 0 deletions src/busco/busco_download_datasets/script.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+
+if [ ! -d "$par_download_path" ]; then
+  mkdir -p "$par_download_path"
+fi
+
+busco \
+  --download_path "$par_download_path" \
+  --download "$par_download"

15 changes: 15 additions & 0 deletions src/busco/busco_download_datasets/test.sh
@@ -0,0 +1,15 @@
+echo "> Downloading busco stramenopiles_odb10 dataset"
+
+"$meta_executable" \
+  --download stramenopiles_odb10 \
+  --download_path downloads
+
+echo ">> Checking output"
+[ ! -f "downloads/file_versions.tsv" ] && echo "file_versions.tsv does not exist" && exit 1
+[ ! -f "downloads/lineages/stramenopiles_odb10/dataset.cfg" ] && echo "dataset.cfg does not exist" && exit 1
+
+echo ">> Checking if output is empty"
+[ ! -s "downloads/file_versions.tsv" ] && echo "file_versions.tsv is empty" && exit 1
+[ ! -s "downloads/lineages/stramenopiles_odb10/dataset.cfg" ] && echo "dataset.cfg is empty" && exit 1
+
+rm -r downloads
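The test repeats the same two checks (file exists, file is not empty) for each output. One possible way to factor them out is sketched below. The helper names `assert_file_exists` and `assert_file_not_empty` are hypothetical and do not appear in this commit, and the demonstration uses a temporary file rather than a real BUSCO download.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the existence and non-empty checks used in test.sh.
assert_file_exists() {
  [ -f "$1" ] || { echo "File '$1' does not exist" >&2; return 1; }
}

assert_file_not_empty() {
  [ -s "$1" ] || { echo "File '$1' is empty" >&2; return 1; }
}

# Demonstration on a temporary file standing in for a downloaded artifact.
tmp_dir=$(mktemp -d)
printf 'example\n' > "$tmp_dir/file_versions.tsv"
assert_file_exists "$tmp_dir/file_versions.tsv" || exit 1
assert_file_not_empty "$tmp_dir/file_versions.tsv" || exit 1
echo ">> All checks passed"
rm -r "$tmp_dir"
```

Returning a status instead of calling `exit` inside the helper leaves the caller free to decide whether a failed check should abort the whole test.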
38 changes: 38 additions & 0 deletions src/busco/busco_list_datasets/config.vsh.yaml
@@ -0,0 +1,38 @@
+functionality:
+  name: busco_list_datasets
+  namespace: busco
+  description: Lists the available busco datasets
+  info:
+    links:
+      homepage: https://busco.ezlab.org/
+      documentation: https://busco.ezlab.org/busco_userguide.html
+      repository: https://gitlab.com/ezlab/busco
+    references:
+      doi: 10.1007/978-1-4939-9173-0_14
+    license: MIT
+  argument_groups:
+    - name: Outputs
+      arguments:
+        - name: --output
+          alternatives: ["-o"]
+          direction: output
+          type: file
+          description: |
+            Output file of the available busco datasets
+          required: false
+          default: busco_dataset_list.txt
+          example: file.txt
+  resources:
+    - type: bash_script
+      path: script.sh
+  test_resources:
+    - type: bash_script
+      path: test.sh
+platforms:
+  - type: docker
+    image: quay.io/biocontainers/busco:5.6.1--pyhdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          busco --version | sed 's/BUSCO\s\(.*\)/busco: "\1"/' > /var/software_versions.txt
+  - type: nextflow
6 changes: 6 additions & 0 deletions src/busco/busco_list_datasets/script.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+busco --list-datasets | awk '/^#{40}/{flag=1; next} flag{print}' > "$par_output"
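The `awk` filter in this script drops everything up to and including the banner line that starts with 40 `#` characters, then prints what follows. The sketch below reproduces that behaviour in plain shell. The function name `strip_banner` and the sample input are made up for illustration, and real `busco --list-datasets` output will differ.

```shell
#!/bin/sh
# Print only the lines that follow a banner line starting with 40 '#' characters.
# Banner lines themselves are never printed, matching the awk filter's `next`.
# strip_banner is a hypothetical name used only for this illustration.
strip_banner() {
  marker=$(printf '%40s' '' | tr ' ' '#')
  seen=0
  while IFS= read -r line; do
    case $line in
      "$marker"*) seen=1; continue ;;
    esac
    if [ "$seen" -eq 1 ]; then
      printf '%s\n' "$line"
    fi
  done
}

# Made-up sample input: one log line, a 48-character banner, two dataset names.
{
  echo "INFO: fetching dataset list"
  printf '%48s\n' '' | tr ' ' '#'
  echo "bacteria_odb10"
  echo "eukaryota_odb10"
} | strip_banner
```

A shell loop avoids the `{40}` interval expression, which some older `awk` implementations do not support.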
15 changes: 15 additions & 0 deletions src/busco/busco_list_datasets/test.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+"$meta_executable" \
+  --output datasets.txt
+
+echo ">> Checking output"
+[ ! -f "datasets.txt" ] && echo "datasets.txt does not exist" && exit 1
+
+echo ">> Checking if output is empty"
+[ ! -s "datasets.txt" ] && echo "datasets.txt is empty" && exit 1
+
+rm datasets.txt
21 changes: 13 additions & 8 deletions src/busco/config.vsh.yaml → src/busco/busco_run/config.vsh.yaml
@@ -1,13 +1,16 @@
 functionality:
-  name: busco
+  name: busco_run
+  namespace: busco
   description: Assessment of genome assembly and annotation completeness with single copy orthologs
   info:
     keywords: [Genome assembly, quality control]
-    homepage: https://busco.ezlab.org/
-    documentation: https://busco.ezlab.org/busco_userguide.html
-    repository: https://gitlab.com/ezlab/busco
-    reference: "10.1007/978-1-4939-9173-0_14"
-    licence: MIT
+    links:
+      homepage: https://busco.ezlab.org/
+      documentation: https://busco.ezlab.org/busco_userguide.html
+      repository: https://gitlab.com/ezlab/busco
+    references:
+      doi: 10.1007/978-1-4939-9173-0_14
+    license: MIT
   argument_groups:
     - name: Inputs
       arguments:
@@ -35,9 +38,11 @@ functionality:
           required: false
           description: |
             Specify a BUSCO lineage dataset that is most closely related to the assembly or gene set being assessed.
-            The full list of available datasets can be viewed [here](https://busco-data.ezlab.org/v5/data/lineages/) or by running `busco --list-datasets` (which requires installing the tool).
+            The full list of available datasets can be viewed [here](https://busco-data.ezlab.org/v5/data/lineages/) or by running the busco/busco_list_datasets component.
             When unsure, the "--auto_lineage" flag can be set to automatically find the optimal lineage path.
-            Requested datasets will automatically be downloaded if not already present in the download folder.
+            BUSCO will automatically download the requested dataset if it is not already present in the download folder.
+            You can optionally provide a path to a local dataset instead of a name, e.g. path/to/dataset.
+            Datasets can be downloaded using the busco/busco_download_datasets component.
           example: stramenopiles_odb10

     - name: Outputs
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
10 changes: 6 additions & 4 deletions src/fastp/config.vsh.yaml
@@ -22,10 +22,12 @@ functionality:
     - support ultra-fast FASTQ-level deduplication
   info:
     keywords: [RNA-Seq, Trimming, Quality control]
-    repository: https://github.com/OpenGene/fastp
-    documentation: https://github.com/OpenGene/fastp/blob/master/README.md
-    reference: "doi:10.1093/bioinformatics/bty560"
-    licence: MIT
+    links:
+      repository: https://github.com/OpenGene/fastp
+      documentation: https://github.com/OpenGene/fastp/blob/master/README.md
+    references:
+      doi: 10.1093/bioinformatics/bty560
+    license: MIT
   argument_groups:
     - name: Inputs
       description: |
12 changes: 7 additions & 5 deletions src/featurecounts/config.vsh.yaml
@@ -4,11 +4,13 @@ functionality:
     featureCounts is a read summarization program for counting reads generated from either RNA or genomic DNA sequencing experiments by implementing highly efficient chromosome hashing and feature blocking techniques. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications.
   info:
     keywords: ["Read counting", "Genomic features"]
-    homepage: https://subread.sourceforge.net/
-    documentation: https://subread.sourceforge.net/SubreadUsersGuide.pdf
-    repository: https://github.com/ShiLab-Bioinformatics/subread
-    reference: "doi:10.1093/bioinformatics/btt656"
-    licence: GPL-3.0
+    links:
+      homepage: https://subread.sourceforge.net/
+      documentation: https://subread.sourceforge.net/SubreadUsersGuide.pdf
+      repository: https://github.com/ShiLab-Bioinformatics/subread
+    references:
+      doi: 10.1093/bioinformatics/btt656
+    license: GPL-3.0
   requirements:
     commands: [ featureCounts ]

12 changes: 7 additions & 5 deletions src/pear/config.vsh.yaml
@@ -6,11 +6,13 @@ functionality:
     PEAR evaluates all possible paired-end read overlaps and without requiring the target fragment size as input. In addition, it implements a statistical test for minimizing false-positive results. Together with a highly optimized implementation, it can merge millions of paired end reads within a couple of minutes on a standard desktop computer.
   info:
     keywords: [ "pair-end", "read", "merge" ]
-    homepage: https://cme.h-its.org/exelixis/web/software/pear
-    repository: https://github.com/tseemann/PEAR
-    documentation: https://cme.h-its.org/exelixis/web/software/pear/doc.html
-    reference: doi:10.1093/bioinformatics/btt593
-    licence: "CC-BY-NC-SA-3.0"
+    links:
+      homepage: https://cme.h-its.org/exelixis/web/software/pear
+      repository: https://github.com/tseemann/PEAR
+      documentation: https://cme.h-its.org/exelixis/web/software/pear/doc.html
+    references:
+      doi: 10.1093/bioinformatics/btt593
+    license: "CC-BY-NC-SA-3.0"
   requirements:
     commands: [ pear , gzip ]
   argument_groups:
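The metadata changes in this commit follow one pattern across components: the flat `homepage`, `documentation` and `repository` keys move under `links`, `reference` becomes `references` with a `doi` entry, and `licence` becomes `license`. The sketch below is one way to spot configs that still use the old spelling. The helper name `find_legacy_metadata` and the temporary directory layout are hypothetical, and the check only looks for the `licence:` and singular `reference:` keys.

```shell
#!/bin/sh
# List config.vsh.yaml files under a directory that still contain the
# pre-migration keys `licence:` or `reference:` (singular).
# find_legacy_metadata is a hypothetical helper name used only for this sketch.
find_legacy_metadata() {
  find "$1" -name 'config.vsh.yaml' \
    -exec grep -lE '^[[:space:]]*(licence|reference):' {} + | sort
}

# Demonstration on two throwaway configs: one old-style, one migrated.
tmp_dir=$(mktemp -d)
mkdir -p "$tmp_dir/old" "$tmp_dir/new"
printf 'info:\n  licence: MIT\n' > "$tmp_dir/old/config.vsh.yaml"
printf 'info:\n  license: MIT\n' > "$tmp_dir/new/config.vsh.yaml"
find_legacy_metadata "$tmp_dir"   # prints only the path under old/
rm -r "$tmp_dir"
```

The pattern requires a colon directly after the key name, so the migrated `references:` key is not reported.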
