From 087526bb634ae8eee8d6fd248405acd39d44be02 Mon Sep 17 00:00:00 2001
From: Robrecht Cannoodt
Date: Fri, 2 Feb 2024 12:36:46 +0100
Subject: [PATCH] extend contributing guidelines

---
 CONTRIBUTING.md | 349 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 286 insertions(+), 63 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 193b3779..da6f72bb 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -7,6 +7,292 @@ We encourage contributions from the community. To contribute:
2. **Develop Your Component**: Create your Viash component, ensuring it aligns with our best practices (detailed below).
3. **Submit a Pull Request**: After testing your component, submit a pull request for review.

## Procedure for adding a component

### Step 1: Find a component to contribute

* Find a tool to contribute to this repo.

* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).

* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.

* Create an issue to show that you are working on this component.


### Step 2: Add config template

Create a file at `src/xxx/config.vsh.yaml` with the following contents, replacing every occurrence of `xxx` with the name of the component:

```yaml
functionality:
  name: xxx
  description: xxx
  keywords: [tag1, tag2]
  links:
    homepage: yyy
    documentation: yyy
    repository: yyy
  references:
    doi: 12345/12345678.yz
  license: MIT/Apache-2.0/GPL-3.0/...
  argument_groups:
    - name: Inputs
      arguments: <...>
    - name: Outputs
      arguments: <...>
    - name: Arguments
      arguments: <...>
  resources:
    - type: bash_script
      path: script.sh
  test_resources:
    - type: bash_script
      path: test.sh
    - type: file
      path: test_data
engines:
  - <...>
runners:
  - type: executable
  - type: nextflow
```

### Step 3: Fill in the metadata

Fill in the relevant metadata fields in the config. For example, this is the metadata of the existing `arriba` component:

```yaml
functionality:
  name: arriba
  description: Detect gene fusions from RNA-Seq data
  keywords: [Gene fusion, RNA-Seq]
  links:
    homepage: https://arriba.readthedocs.io/en/latest/
    documentation: https://arriba.readthedocs.io/en/latest/
    repository: https://github.com/suhrig/arriba
  references:
    doi: 10.1101/gr.257246.119
    bibtex: |
      @article{
        ... a bibtex entry in case the doi is not available ...
      }
  license: MIT
```

### Step 4: Find a suitable container

Google `biocontainer xxx` and pick the most suitable container. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.

If no such container exists, you can create a custom container when adding the Docker engine in Step 9.


### Step 5: Create help file

To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.

````bash
# Start the file with the command that produced it, as a Markdown code block.
# Quoting 'EOF' stops Bash from treating the backticks as command substitution.
cat << 'EOF' > src/xxx/help.txt
```sh
xxx --help
```
EOF

# Append the tool's help output, running it inside the container
docker run --rm quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
````

Notes:

* The help file is not used when building or running the component; it only serves as a reference for the developer.

* Some tools do not have a `--help` argument and use `-h` instead.
For example, for `arriba`, the help message is obtained by running `arriba -h`:

  ```bash
  docker run --rm quay.io/biocontainers/arriba:2.4.0--h0033a41_2 arriba -h
  ```

### Step 6: Add arguments for the input files

Using the help file, add the input arguments to the config file. For instance, the [arriba help file](src/arriba/help.txt) contains the following:

    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
          -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
          [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
          -o fusions.tsv [-O fusions.discarded.tsv] \
          [OPTIONS]

    -x FILE  File in SAM/BAM/CRAM format with main alignments as generated by STAR
             (Aligned.out.sam). Arriba extracts candidate reads from this file.

Based on this information, we can add the following input argument to the config file.

```yaml
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (Aligned.out.sam). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
```

Check the [documentation](https://viash.io/reference/config/functionality/arguments) for more information on the format of input arguments.

Notes:

* Argument names should be formatted in `--snake_case`. This means arguments like `--foo-bar` should be formatted as `--foo_bar`, and short arguments like `-f` should receive a longer name like `--foo`. The original short flag can be kept under `alternatives`, as with `-x` in the example above.

* Input arguments can have `multiple: true` to allow the user to specify multiple files.

### Step 7: Add arguments for the output files

Next, use the help file to add the output arguments to the config file.
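In a long help text, output options are easy to miss. One rough first pass is to search the stored help file for the word "output". This sketch assumes the help text was saved as in Step 5 and that the tool describes its output options with that word, which not every tool does; the function name `find_output_options` is hypothetical.

```bash
#!/bin/bash

# Print the lines of a stored help file that mention outputs,
# prefixed with their line numbers (case-insensitive match).
# Hypothetical helper: adjust the pattern to the tool's own wording.
find_output_options() {
  local help_file="$1"
  grep -n -i 'output' "$help_file"
}
```

For example, `find_output_options src/arriba/help.txt` lists the candidate lines; each hit still needs to be checked by hand against the usage line.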

For example, the [arriba help file](src/arriba/help.txt) contains the following:

    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
          -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
          [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
          -o fusions.tsv [-O fusions.discarded.tsv] \
          [OPTIONS]

    -o FILE  Output file with fusions that have passed all filters.

    -O FILE  Output file with fusions that were discarded due to filtering.

Based on this information, we can add the following output arguments to the config file.

```yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
      - name: --fusions_discarded
        alternatives: -O
        type: file
        direction: output
        description: |
          Output file with fusions that were discarded due to filtering.
        required: false
        example: fusions.discarded.tsv
```

Note:

* Preferably, outputs should be files rather than directories. For example, if a tool writes a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to a single output argument for the whole `foo/` directory).

### Step 8: Add the remaining arguments

Finally, add all other arguments to the config file, with a few exceptions:

* Arguments for specifying CPU and memory requirements are handled separately and should not be added to the config file.

* Arguments that only print information, such as the version (`-v`, `--version`) or the help message (`-h`, `--help`), should not be added to the config file.


### Step 9: Add a Docker engine

To ensure reproducibility, we require that all components run in a Docker container.
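Biocontainers image tags usually consist of the tool version, a double dash, and a build string (for example `2.4.0--h0033a41_2` for `arriba`), so the version can be read off the image reference. This is handy when filling in the version recorded in `/var/software_versions.txt`. A minimal Bash sketch, assuming that tag convention holds (it is a naming convention, not a guarantee); the function name `biocontainer_version` is hypothetical:

```bash
#!/bin/bash

# Extract the tool version from a Biocontainers image reference.
# Assumes the tag has the form "<version>--<build>".
biocontainer_version() {
  local image="$1"
  local tag="${image##*:}"      # drop everything up to the last ':'
  printf '%s\n' "${tag%%--*}"   # drop everything from the first '--'
}

# Print a line in the format written to /var/software_versions.txt
echo "arriba: \"$(biocontainer_version quay.io/biocontainers/arriba:2.4.0--h0033a41_2)\""
```

Run as-is, this prints `arriba: "2.4.0"`. With a Biocontainers image, the engine declaration in the config looks like this: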

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
```

If you did not find a suitable container in Step 4, you can create a custom container instead. For example:

```yaml
engines:
  - type: docker
    image: python:3.10
    setup:
      - type: python
        packages: numpy
```

For more information on how to do this, see the [documentation](https://viash.io/guide/component/add-dependencies.html#steps-for-creating-a-custom-docker-platform).

We recommend the following base containers:

* Bash: [`bash`](https://hub.docker.com/_/bash), [`ubuntu`](https://hub.docker.com/_/ubuntu)
* C#: [`ghcr.io/data-intuitive/dotnet-script`](https://github.com/data-intuitive/ghcr-dotnet-script/pkgs/container/dotnet-script)
* JavaScript: [`node`](https://hub.docker.com/_/node)
* Python: [`python`](https://hub.docker.com/_/python), [`nvcr.io/nvidia/pytorch`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
* R: [`eddelbuettel/r2u`](https://hub.docker.com/r/eddelbuettel/r2u), [`rocker/tidyverse`](https://hub.docker.com/r/rocker/tidyverse)
* Scala: [`sbtscala/scala-sbt`](https://hub.docker.com/r/sbtscala/scala-sbt)

### Step 10: Write a runner script

Next, write the script that runs the tool. Create a Bash script at `src/xxx/script.sh` that passes the component's arguments on to the tool:

```bash
#!/bin/bash

## VIASH START
## VIASH END

xxx \
  --input "$par_input" \
  --output "$par_output" \
  $([ "$par_option" = "true" ] && echo "--option")
```

When building a Viash component, Viash automatically replaces the `## VIASH START` and `## VIASH END` lines (and anything in between) with code that defines Bash variables holding the values of the arguments specified in the config. Each variable is named after its argument with a `par_` prefix; for example, the value of `--input` is available as `$par_input`.
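While developing, it can be convenient to run `script.sh` directly, without building the component. One approach is to put example values between the `## VIASH START` and `## VIASH END` markers, since Viash replaces that block at build time anyway. The sketch below does this with hypothetical values and, as an alternative to the `$(...)` idiom, collects the command line in a Bash array so that values containing spaces stay intact. The tool `xxx` and its `--input`, `--output`, `--reference` and `--option` flags are placeholders.

```bash
#!/bin/bash

## VIASH START
# Example values for running the script by hand (hypothetical).
# Viash replaces this whole block when building the component.
par_input="input file.txt"
par_output="output.txt"
par_reference=""
par_option="true"
## VIASH END

# Required arguments are always passed.
args=(--input "$par_input" --output "$par_output")

# Optional file argument: only passed when a value is set.
if [ -n "${par_reference:-}" ]; then
  args+=(--reference "$par_reference")
fi

# Boolean flag: passed only when the value is the string "true".
if [ "$par_option" = "true" ]; then
  args+=(--option)
fi

# Dry run: print the shell-quoted command instead of running the placeholder tool.
# In the real script, this line would be: xxx "${args[@]}"
printf '%q ' xxx "${args[@]}"
echo
```

Because each argument is a separate array element, the space in `input file.txt` survives without relying on word splitting; the `$(...)` and `${var:+...}` idioms used in this guide work too, so this is a matter of preference.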

As an example, this is what the Bash script for the `arriba` component looks like:

```bash
#!/bin/bash

## VIASH START
## VIASH END

arriba \
  -x "$par_bam" \
  -a "$par_genome" \
  -g "$par_gene_annotation" \
  -o "$par_fusions" \
  ${par_known_fusions:+-k "${par_known_fusions}"} \
  ${par_blacklist:+-b "${par_blacklist}"} \
  ${par_structural_variants:+-d "${par_structural_variants}"} \
  $([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
  $([ "$par_extra_information" = "true" ] && echo "-X") \
  $([ "$par_fill_gaps" = "true" ] && echo "-I")
```

### Step 11: Add a test script

Add a test script at `src/xxx/test.sh` and put any test data in `src/xxx/test_data`, matching the `test_resources` listed in the config template.

### Step 12: Create a `/var/software_versions.txt` file

Add a setup step to the Docker engine that writes the version of the tool to `/var/software_versions.txt`:

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
    setup:
      - type: docker
        run: |
          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
```

## Documentation of Functionality

The purpose and functionality of each component should be adequately described.
@@ -171,66 +457,3 @@ functionality:
          description: "Which normalization was used"
          required: true
```
-
## Workflow

### Step 1: Find a component to contribute

* Find a tool to contribute to this repo
* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1)
* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration
* Add the component to the Project board to show that you are working on it

### Step 2: Add config template

Change all occurrences of `xxx` to the name of the component.
- -Contents of `src/xxx/config.vsh.yaml`: - -```yaml -functionality: - name: xxx - description: xxx - info: - keywords: [tag1, tag2] - homepage: yyy - documentation: yyy - repository: yyy - reference: "doi:yyy" - licence: yyy - argument_groups: - - name: Inputs - arguments: - - name: Outputs - arguments: - - name: Arguments - arguments: - resources: - - type: bash_script - path: script.sh - test_resources: - - type: bash_script - path: test.sh - - type: file - path: test_data -platforms: - - type: docker - image: quay.io/biocontainers/xxx:0.1.0--py_0 - setup: - - type: docker - run: | - echo "xxx: \"0.1.0\"" > /var/software_versions.txt - - type: nextflow -``` - -### Step 3: Find container - -Google `biocontainer xxx` and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`. - -### Step 4: Create help file - -```bash -docker run --rm -it -v `pwd`/src/xxx/:/xxx quay.io/biocontainers/xxx:tag -xxx --help > /xxx/help.txt -``` -