Merge branch 'main' into samtools_fastq
emmarousseau authored May 22, 2024
2 parents b733478 + 7f59bb8 commit 51e3953
Showing 17 changed files with 268 additions and 44 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -45,6 +45,7 @@
- `samtools/samtools_stats`: Reports alignment summary statistics for a BAM file (PR #39).
- `samtools/samtools_faidx`: Indexes FASTA files to enable random access to fasta and fastq files (PR #41).
- `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTQ (PR #52).
- `samtools/samtools_collate`: Shuffles and groups reads in SAM/BAM/CRAM files together by their names (PR #42).

* `falco`: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43).

@@ -56,6 +57,10 @@

* Update to Viash 0.8.5 (PR #25).

* Update to Viash 0.9.0-RC3 (PR #51).

## DOCUMENTATION

## BUG FIXES

* Add escaping character before leading hashtag in the description field of the config file (PR #50).
41 changes: 16 additions & 25 deletions README.md
@@ -26,34 +26,25 @@ We encourage contributions from the community. To contribute:

## Contribution Guidelines

- **Documentation of Functionality**: The purpose and functionality of
each component should be adequately described.
- **Documentation of Inputs and Outputs**: All input and output
arguments should have a description and example (with extension).
- **Docker Image**: A Docker image (with optional additional
dependencies) should be provided.
- **Write unit tests**: A unit test with possibly test resources needs
to be provided.
- **Provide test resources**: If the unit test requires test resources,
these should be provided in the `test_resources` section of the
component.
- **Versioning**: If the component uses custom software (not installed
via Apt, Apk, Yum, Pip, Conda, or R), a Bash script `version.sh` needs
to be provided that outputs the version of the software.
- **File format specifications**: If a component returns a directory or
data structure such as AnnData or MuData, a specification of the file
format should be provided.
The contribution guidelines describe which steps you should follow to
contribute a component to this repository.

1. Find a component to contribute
2. Add config template
3. Fill in the metadata
4. Find a suitable container
5. Create help file
6. Create or fetch test data
7. Add arguments for the input files
8. Add arguments for the output files
9. Add arguments for the other arguments
10. Add a Docker engine
11. Write a runner script
12. Create test script
13. Create a `/var/software_versions.txt` file

See the [CONTRIBUTING](CONTRIBUTING.md) file for more details.

## Repository Structure


## Installation and Usage


## Support and Community

For support, questions, or to join our community:
17 changes: 6 additions & 11 deletions README.qmd
@@ -21,29 +21,24 @@ We encourage contributions from the community. To contribute:

## Contribution Guidelines

The contribution guidelines describe which steps you should follow to contribute a component to this repository.

```{r echo=FALSE}
lines <- readr::read_lines("CONTRIBUTING.md")
index_start <- grep("^## ", lines)
index_start <- grep("^### Step [0-9]*:", lines)
index_end <- c(index_start[-1] - 1, length(lines))
name <- gsub("^## ", "", lines[index_start])
description <- lines[index_start + 2]
name <- gsub("^### Step [0-9]*: *", "", lines[index_start])
knitr::asis_output(
paste(paste0(" * **", name, "**: ", description, "\n"), collapse = "")
paste(paste0(" 1. ", name, "\n"), collapse = "")
)
```

See the [CONTRIBUTING](CONTRIBUTING.md) file for more details.

## Repository Structure

...

## Installation and Usage

...

## Support and Community
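
The R chunk in this hunk now collects the `### Step N:` headings from `CONTRIBUTING.md` and renders them as a numbered list. A rough shell equivalent, handy for checking the extraction outside of Quarto, might look like the sketch below. The sample file contents and the `tmp_file` and `steps` names are made up for illustration and are not part of the repository.

```bash
#!/bin/bash
# Hypothetical stand-in for CONTRIBUTING.md (assumed contents).
tmp_file="$(mktemp)"
cat > "$tmp_file" <<'EOF'
# Contributing
### Step 1: Find a component to contribute
Some explanatory text.
### Step 2: Add config template
More explanatory text.
EOF

# Keep only the step headings, strip the "### Step N: " prefix,
# and prefix each name with " 1. " as the R chunk does.
steps="$(grep -E '^### Step [0-9]*:' "$tmp_file" \
  | sed -E 's/^### Step [0-9]*: *//' \
  | sed 's/^/ 1. /')"

printf '%s\n' "$steps"
rm -f "$tmp_file"
```

Every item is emitted as ` 1. `, which Markdown renderers renumber sequentially.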

5 changes: 4 additions & 1 deletion _viash.yaml
@@ -7,4 +7,7 @@ links:
issue_tracker: https://github.com/viash-hub/biobase/issues
repository: https://github.com/viash-hub/biobase

viash_version: 0.9.0-RC2
viash_version: 0.9.0-RC3

config_mods: |
.requirements.commands := ['ps']
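
The `config_mods` entry above appears to declare `ps` as a required command for every component, which would explain why `procps` is added to the falco image in this same commit. As a sketch only, and not Viash's own check, the availability of such a command can be tested by hand as follows; `missing_commands` is a hypothetical helper name.

```bash
#!/bin/bash
# Print each command from the argument list that is not found on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# 'ps' is taken from the config_mods entry; on Debian images it is
# provided by the procps package.
missing="$(missing_commands ps)"
if [[ -n "$missing" ]]; then
  echo "Missing required commands: $missing" >&2
fi
```

Running this inside a component's container gives a quick indication of whether the image needs an extra package.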
10 changes: 5 additions & 5 deletions src/bcl_convert/config.vsh.yaml
@@ -50,20 +50,20 @@ argument_groups:
example: true
- name: --bcl_num_parallel_tiles
type: integer
description: "# of tiles to process in parallel (default 1)"
description: "\\# of tiles to process in parallel (default 1)"
example: 1
- name: --bcl_num_conversion_threads
type: integer
description: "# of threads for conversion (per tile, default # cpu threads)"
description: "\\# of threads for conversion (per tile, default # cpu threads)"
example: 1
- name: --bcl_num_compression_threads
type: integer
description: "# of threads for fastq.gz output compression (per tile, default # cpu threads, or HW+12)"
description: "\\# of threads for fastq.gz output compression (per tile, default # cpu threads, or HW+12)"
example: 1
- name: --bcl_num_decompression_threads
type: integer
description:
"# of threads for bcl/cbcl input decompression (per tile, default half # cpu threads, or HW+8).
"\\# of threads for bcl/cbcl input decompression (per tile, default half # cpu threads, or HW+8).
Only applies when preloading files"
example: 1

@@ -79,7 +79,7 @@ argument_groups:
example: true
- name: --num_unknown_barcodes_reported
type: integer
description: "# of Top Unknown Barcodes to output (1000 by default)"
description: "\\# of Top Unknown Barcodes to output (1000 by default)"
example: 1000
- name: --bcl_validate_sample_sheet_only
type: boolean
2 changes: 1 addition & 1 deletion src/falco/config.vsh.yaml
@@ -177,7 +177,7 @@ engines:
image: debian:trixie-slim
setup:
- type: apt
packages: [wget, build-essential, g++, zlib1g-dev]
packages: [wget, build-essential, g++, zlib1g-dev, procps]
- type: docker
run: |
wget https://github.com/smithlabcode/falco/releases/download/v1.2.2/falco-1.2.2.tar.gz -O /tmp/falco.tar.gz && \
6 changes: 6 additions & 0 deletions src/multiqc/config.vsh.yaml
@@ -161,6 +161,12 @@ argument_groups:
type: boolean_true
description: |
Disable coloured log output.
- name: "--cl_config"
type: string
required: false
description: |
YAML-formatted string for customizing MultiQC behaviour, such as input file detection.
example: "qualimap_config: { general_stats_coverage: [20,40,200] }"

- name: "Output format"
arguments:
3 changes: 2 additions & 1 deletion src/multiqc/script.sh
@@ -1,7 +1,7 @@
#!/bin/bash

# disable flags
[[ "$par_ignore_symlinks" == "false" ]] && unset ignore_symlinks
[[ "$par_ignore_symlinks" == "false" ]] && unset par_ignore_symlinks
[[ "$par_dirs" == "false" ]] && unset par_dirs
[[ "$par_full_names" == "false" ]] && unset par_full_names
[[ "$par_fn_as_s_name" == "false" ]] && unset par_fn_as_s_name
@@ -99,6 +99,7 @@ multiqc \
${include_modules} \
${par_include_modules:+--include-modules "$par_include_modules"} \
${par_data_format:+--data-format "$par_data_format"} \
${par_cl_config:+--cl-config "$par_cl_config"} \
${par_zip_data_dir:+--zip-data-dir} \
${par_pdf:+--pdf} \
${par_interactive:+--interactive} \
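
Both changes to this script rely on the `${var:+word}` expansion, which emits `word` only when `var` is set and non-empty. The dry-run sketch below illustrates why a boolean that arrives as the string `false` has to be unset under its full `par_` name, and how the new `--cl-config` value stays a single argument. The variable values are hypothetical stand-ins for what Viash would inject, and the arguments are collected in an array rather than passed to `multiqc`.

```bash
#!/bin/bash
# Hypothetical values standing in for what Viash would inject.
par_ignore_symlinks="false"
par_cl_config="qualimap_config: { general_stats_coverage: [20,40,200] }"

# A boolean passed as "false" is still a non-empty string, so it must be
# unset; otherwise ":+" below would emit the flag anyway.
[[ "$par_ignore_symlinks" == "false" ]] && unset par_ignore_symlinks

# Collect the arguments in an array instead of running multiqc (dry run).
args=(
  ${par_ignore_symlinks:+--ignore-symlinks}
  ${par_cl_config:+--cl-config "$par_cl_config"}
)

# One argument per line: the flag, then the YAML string as a single word.
printf '%s\n' "${args[@]}"
```

Unsetting `ignore_symlinks` instead of `par_ignore_symlinks`, as the old line did, would leave the `false` string in place and the flag would always be passed.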
94 changes: 94 additions & 0 deletions src/samtools/samtools_collate/config.vsh.yaml
@@ -0,0 +1,94 @@
name: samtools_collate
namespace: samtools
description: Shuffles and groups reads in SAM/BAM/CRAM files together by their names.
keywords: [collate, counts, bam, sam, cram]
links:
homepage: https://www.htslib.org/
documentation: https://www.htslib.org/doc/samtools-collate.html
repository: https://github.com/samtools/samtools
references:
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
license: MIT/Expat

argument_groups:
- name: Inputs
arguments:
- name: --input
type: file
description: The input BAM file.
required: true
- name: --reference
type: file
description: Reference sequence FASTA FILE.

- name: Outputs
arguments:
- name: --output
alternatives: -o
type: file
description: The output filename.
required: true
direction: output

- name: Options
arguments:
- name: --uncompressed
alternatives: -u
type: boolean_true
description: Output uncompressed BAM.
- name: --fast
alternatives: -f
type: boolean_true
description: Fast mode, only primary alignments.
- name: --working_reads
alternatives: -r
type: integer
description: Working reads stored (for use with -f).
default: 10000
- name: --compression
alternatives: -l
type: integer
description: Compression level.
default: 1
- name: --nb_tmp_files
alternatives: -n
type: integer
description: Number of temporary files.
default: 64
- name: --tmp_prefix
alternatives: -T
type: string
description: Write temporary files to PREFIX.nnnn.bam.
- name: --no_pg
type: boolean_true
description: Do not add a PG line.
- name: --input_fmt_option
type: string
description: Specify a single input file format option in the form of OPTION or OPTION=VALUE.
- name: --output_fmt
type: string
description: Specify output format (SAM, BAM, CRAM).
- name: --output_fmt_option
type: string
description: Specify a single output file format option in the form of OPTION or OPTION=VALUE.


resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- type: file
path: test_data
engines:
- type: docker
image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1
setup:
- type: docker
run: |
samtools --version 2>&1 | grep -E '^(samtools|Using htslib)' | \
sed 's#Using ##;s# \([0-9\.]*\)$#: \1#' > /var/software_versions.txt
runners:
- type: executable
- type: nextflow
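
The Docker `setup` step above records tool versions in `/var/software_versions.txt`. To illustrate what its `grep`/`sed` pipeline produces, the sketch below feeds it a sample of `samtools --version` output instead of calling samtools. The sample text and the `sample_output` and `versions` names are assumptions, not output captured from the container.

```bash
#!/bin/bash
# Assumed sample of `samtools --version` output; the real text may differ.
sample_output='samtools 1.19.2
Using htslib 1.19.1
Copyright (C) 2024 Genome Research Ltd.'

# Same filter and substitutions as in the Docker setup step:
#  - keep only the lines naming samtools and htslib
#  - drop the "Using " prefix
#  - turn the trailing " <version>" into ": <version>"
versions="$(printf '%s\n' "$sample_output" \
  | grep -E '^(samtools|Using htslib)' \
  | sed 's#Using ##;s# \([0-9\.]*\)$#: \1#')"

printf '%s\n' "$versions"
```

For this sample input the result is two `name: version` lines, `samtools: 1.19.2` and `htslib: 1.19.1`.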
31 changes: 31 additions & 0 deletions src/samtools/samtools_collate/help.txt
@@ -0,0 +1,31 @@
```
samtools collate
```
Usage: samtools collate [options...] <in.bam> [<prefix>]

Options:
-O Output to stdout
-o Output file name (use prefix if not set)
-u Uncompressed BAM output
-f Fast (only primary alignments)
-r Working reads stored (with -f) [10000]
-l INT Compression level [1]
-n INT Number of temporary files [64]
-T PREFIX
Write temporary files to PREFIX.nnnn.bam
--no-PG do not add a PG line
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
--verbosity INT
Set level of verbosity
<prefix> is required unless the -o or -O options are used.
27 changes: 27 additions & 0 deletions src/samtools/samtools_collate/script.sh
@@ -0,0 +1,27 @@
#!/bin/bash

## VIASH START
## VIASH END

set -e

[[ "$par_uncompressed" == "false" ]] && unset par_uncompressed
[[ "$par_fast" == "false" ]] && unset par_fast
[[ "$par_no_pg" == "false" ]] && unset par_no_pg

# Long options follow help.txt: -T is the temporary-file prefix and
# -O means "output to stdout", so neither can be reused for other arguments.
samtools collate \
"$par_input" \
${par_output:+-o "$par_output"} \
${par_reference:+--reference "$par_reference"} \
${par_uncompressed:+-u} \
${par_fast:+-f} \
${par_working_reads:+-r "$par_working_reads"} \
${par_compression:+-l "$par_compression"} \
${par_nb_tmp_files:+-n "$par_nb_tmp_files"} \
${par_tmp_prefix:+-T "$par_tmp_prefix"} \
${par_no_pg:+--no-PG} \
${par_input_fmt_option:+--input-fmt-option "$par_input_fmt_option"} \
${par_output_fmt:+--output-fmt "$par_output_fmt"} \
${par_output_fmt_option:+--output-fmt-option "$par_output_fmt_option"}

exit 0