extend contributing guidelines

viash-hub · Feb 2, 2024 · 087526b
1 parent 8c8ad10
commit 087526b
Showing 1 changed file with 286 additions and 63 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -7,6 +7,292 @@ We encourage contributions from the community. To contribute:
 2. **Develop Your Component**: Create your Viash component, ensuring it aligns with our best practices (detailed below).
 3. **Submit a Pull Request**: After testing your component, submit a pull request for review.
 
+## Procedure for adding a component
+
+### Step 1: Find a component to contribute
+
+* Find a tool to contribute to this repo.
+
+* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).
+
+* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.
+
+* Create an issue to show that you are working on this component.
+
+
+### Step 2: Add config template
+
+Create a file at `src/xxx/config.vsh.yaml` with the contents below, replacing every occurrence of `xxx` with the name of the component.
+
+```yaml
+functionality:
+  name: xxx
+  description: xxx
+  keywords: [tag1, tag2]
+  links:
+    homepage: yyy
+    documentation: yyy
+    repository: yyy
+  references: 
+    doi: 12345/12345678.yz
+  license: MIT/Apache-2.0/GPL-3.0/...
+  argument_groups:
+    - name: Inputs
+      arguments: <...>
+    - name: Outputs
+      arguments: <...>
+    - name: Arguments
+      arguments: <...>
+  resources:
+    - type: bash_script
+      path: script.sh
+  test_resources:
+    - type: bash_script
+      path: test.sh
+    - type: file
+      path: test_data
+engines:
+  - <...>
+runners:
+  - type: executable
+  - type: nextflow
+```
+
+### Step 3: Fill in the metadata
+
+Fill in the relevant metadata fields in the config. Here is an example of the metadata of an existing component.
+
+```yaml
+functionality:
+  name: arriba
+  description: Detect gene fusions from RNA-Seq data
+  keywords: [Gene fusion, RNA-Seq]
+  links:
+    homepage: https://arriba.readthedocs.io/en/latest/
+    documentation: https://arriba.readthedocs.io/en/latest/
+    repository: https://github.com/suhrig/arriba
+  references:
+    doi: 10.1101/gr.257246.119
+    bibtex: |
+      @article{
+        ... a bibtex entry in case the doi is not available ...
+      }
+  license: MIT
+```
+
+### Step 4: Find a suitable container
+
+Google `biocontainer <name of component>` and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.
+
+If no such container is found, you can create a custom container in Step 9.
+
+
+### Step 5: Create help file
+
+To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.
+
+````bash
+cat <<EOF > src/xxx/help.txt
+```sh
+xxx --help
+```
+EOF
+
+docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
+````
+
+Notes:
+
+* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
+
+* Some tools might not have a `--help` argument but instead have a `-h` argument. For example, for `arriba`, the help message is obtained by running `arriba -h`:
+
+  ```bash
+  docker run quay.io/biocontainers/arriba:2.4.0--h0033a41_2 arriba -h
+  ```
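+
+A minimal sketch of how the `--help` / `-h` difference could be handled in one go. The `get_help` helper is hypothetical and not part of this repository, and it assumes that the tool exits successfully and prints something when given the right flag, which not every tool does.
+
+```bash
+#!/bin/bash
+
+# Hypothetical helper: run the given command with `--help`, then with `-h`,
+# and print the first non-empty output that comes with a zero exit status.
+get_help() {
+  local out flag
+  for flag in --help -h; do
+    if out=$("$@" "$flag" 2>&1) && [ -n "$out" ]; then
+      printf '%s\n' "$out"
+      return 0
+    fi
+  done
+  return 1
+}
+```
+
+It could then be used as `get_help docker run quay.io/biocontainers/xxx:tag xxx >> src/xxx/help.txt`.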
+
+### Step 6: Add arguments for the input files
+
+Using the help file, add the input arguments to the config file.
+
+For instance, in the [arriba help file](src/arriba/help.txt), we see the following:
+
+    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
+                  -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
+                  [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
+                  -o fusions.tsv [-O fusions.discarded.tsv] \
+                  [OPTIONS]
+
+    -x FILE  File in SAM/BAM/CRAM format with main alignments as generated by STAR 
+              (Aligned.out.sam). Arriba extracts candidate reads from this file. 
+
+Based on this information, we can add the following input arguments to the config file.
+
+```yaml
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --bam
+        alternatives: -x
+        type: file
+        description: |
+          File in SAM/BAM/CRAM format with main alignments as generated by STAR
+          (Aligned.out.sam). Arriba extracts candidate reads from this file.
+        required: true
+        example: Aligned.out.bam
+```
+
+Check the [documentation](https://viash.io/reference/config/functionality/arguments) for more information on the format of input arguments.
+
+Several notes:
+
+* Argument names should be formatted in `--snake_case`. This means arguments like `--foo-bar` should be formatted as `--foo_bar`, and short arguments like `-f` should receive a longer name like `--foo`.
+
+* Input arguments can have `multiple: true` to allow the user to specify multiple files.
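+
+The naming rule in the first note can be sketched as a small Bash function. `to_snake_case_flag` is a hypothetical name used only for illustration, and it only covers long flags; short flags such as `-f` still need a name chosen by hand.
+
+```bash
+#!/bin/bash
+
+# Hypothetical helper: convert a tool's long flag to the `--snake_case`
+# form used in the config, e.g. `--foo-bar` becomes `--foo_bar`.
+to_snake_case_flag() {
+  local name="${1#--}"               # drop the leading dashes
+  printf -- '--%s\n' "${name//-/_}"  # replace remaining dashes with underscores
+}
+
+to_snake_case_flag --foo-bar   # prints --foo_bar
+```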
+
+
+
+### Step 7: Add arguments for the output files
+
+By looking at the help file, we now also add output arguments to the config file.
+
+For example, in the [arriba help file](src/arriba/help.txt), we see the following:
+
+
+    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
+                  -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
+                  [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
+                  -o fusions.tsv [-O fusions.discarded.tsv] \
+                  [OPTIONS]
+
+     -o FILE  Output file with fusions that have passed all filters. 
+
+     -O FILE  Output file with fusions that were discarded due to filtering. 
+
+Based on this information, we can add the following output arguments to the config file.
+
+```yaml
+argument_groups:
+  - name: Outputs
+    arguments:
+      - name: --fusions
+        alternatives: -o
+        type: file
+        direction: output
+        description: |
+          Output file with fusions that have passed all filters.
+        required: true
+        example: fusions.tsv
+      - name: --fusions_discarded
+        alternatives: -O
+        type: file
+        direction: output
+        description: |
+          Output file with fusions that were discarded due to filtering. 
+        required: false
+        example: fusions.discarded.tsv
+```
+
+Note:
+
+* Preferably, these outputs should be files rather than directories. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
+
+### Step 8: Add the remaining arguments
+
+Finally, add all other arguments to the config file. There are a few exceptions:
+
+* Arguments related to specifying CPU and memory requirements are handled separately and should not be added to the config file.
+
+* Arguments that only print information, such as the version (`-v`, `--version`) or the help message (`-h`, `--help`), should not be added to the config file.
+
+
+### Step 9: Add a Docker engine
+
+To ensure reproducibility of components, we require that all components are run in a Docker container. 
+
+```yaml
+engines:
+  - type: docker
+    image: quay.io/biocontainers/xxx:0.1.0--py_0
+```
+
+If you didn't find a suitable container in Step 4, you can create a custom container. For example:
+
+```yaml
+engines:
+  - type: docker
+    image: python:3.10
+    setup:
+      - type: python
+        packages: numpy
+```
+
+For more information on how to do this, see the [documentation](https://viash.io/guide/component/add-dependencies.html#steps-for-creating-a-custom-docker-platform).
+
+Here is a list of base containers we can recommend:
+
+* Bash: [`bash`](https://hub.docker.com/_/bash), [`ubuntu`](https://hub.docker.com/_/ubuntu)
+* C#: [`ghcr.io/data-intuitive/dotnet-script`](https://github.com/data-intuitive/ghcr-dotnet-script/pkgs/container/dotnet-script)
+* JavaScript: [`node`](https://hub.docker.com/_/node)
+* Python: [`python`](https://hub.docker.com/_/python), [`nvcr.io/nvidia/pytorch`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
+* R: [`eddelbuettel/r2u`](https://hub.docker.com/r/eddelbuettel/r2u), [`rocker/tidyverse`](https://hub.docker.com/r/rocker/tidyverse)
+* Scala: [`sbtscala/scala-sbt`](https://hub.docker.com/r/sbtscala/scala-sbt)
+
+### Step 10: Write a runner script
+
+Next, create a Bash script at `src/xxx/script.sh` that runs the tool with the arguments defined in the config.
+
+```bash
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+xxx \
+  --input "$par_input" \
+  --output "$par_output" \
+  $([ "$par_option" = "true" ] && echo "--option")
+```
+
+When building a Viash component, Viash automatically replaces the `## VIASH START` and `## VIASH END` lines (and anything in between) with variables derived from the arguments in the config. Each argument becomes a variable with a `par_` prefix, so `--input` is available as `$par_input`.
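+
+Because that region is replaced at build time, it can hold placeholder values during development so the script can be run on its own. The sketch below assumes the same hypothetical `xxx` tool and arguments as the template above; the file names are made up, and the command is printed rather than executed because `xxx` is not a real tool.
+
+```bash
+#!/bin/bash
+
+## VIASH START
+# Hypothetical defaults, only used when running this script directly.
+par_input="input.txt"
+par_output="output.txt"
+par_option="true"
+## VIASH END
+
+# Print the command that would be run with these values.
+echo xxx \
+  --input "$par_input" \
+  --output "$par_output" \
+  $([ "$par_option" = "true" ] && echo "--option")
+```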
+
+As an example, this is what the Bash script for the `arriba` component looks like:
+
+```bash
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+arriba \
+  -x "$par_bam" \
+  -a "$par_genome" \
+  -g "$par_gene_annotation" \
+  -o "$par_fusions" \
+  ${par_known_fusions:+-k "${par_known_fusions}"} \
+  ${par_blacklist:+-b "${par_blacklist}"} \
+  ${par_structural_variants:+-d "${par_structural_variants}"} \
+  $([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
+  $([ "$par_extra_information" = "true" ] && echo "-X") \
+  $([ "$par_fill_gaps" = "true" ] && echo "-I")
+```
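+
+The `${par_x:+-k "${par_x}"}` form used above passes an optional flag only when its variable is set and non-empty. A small sketch of that behaviour, collecting the words in an array so they can be inspected; the values are hypothetical:
+
+```bash
+#!/bin/bash
+
+# Hypothetical values: one optional argument is set, the other is empty.
+par_known_fusions="known fusions.tsv"
+par_blacklist=""
+
+# `${var:+word}` expands to `word` only if `var` is set and non-empty.
+# The inner quotes keep a value containing spaces together as one word.
+args=(
+  ${par_known_fusions:+-k "${par_known_fusions}"}
+  ${par_blacklist:+-b "${par_blacklist}"}
+)
+
+# One word per line: `-k`, then the file name. Nothing is added for `-b`.
+printf '%s\n' "${args[@]}"
+```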
+
+### Step 11: Add a test script
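+
+This step has no instructions yet. As a rough sketch only, a `src/xxx/test.sh` could run the component on the test data and check that an output file was produced. It assumes the `$meta_executable` and `$meta_resources_dir` variables that Viash makes available to test scripts; the `--input`/`--output` arguments, the file names, and the `assert_nonempty_file` helper are hypothetical.
+
+```bash
+#!/bin/bash
+set -e
+
+# Hypothetical helper: fail with a message if a file is missing or empty.
+assert_nonempty_file() {
+  [ -s "$1" ] || { echo "Expected a non-empty file: $1" >&2; return 1; }
+}
+
+# Only run the component when Viash has provided its location.
+if [ -n "${meta_executable:-}" ]; then
+  "$meta_executable" \
+    --input "$meta_resources_dir/test_data/input.txt" \
+    --output output.txt
+
+  assert_nonempty_file output.txt
+  echo "All tests succeeded"
+fi
+```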
+
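+### Step 12: Create a `/var/software_versions.txt` file
+
+Record the version of the tool in `/var/software_versions.txt` by adding a setup step to the Docker engine:
+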
+```yaml
+engines:
+  - type: docker
+    image: quay.io/biocontainers/xxx:0.1.0--py_0
+    setup:
+      - type: docker
+        run: |
+          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
+```
+
 ## Documentation of Functionality
 
 The purpose and functionality of each component should be adequately described.
@@ -171,66 +457,3 @@ functionality:
               description: "Which normalization was used"
               required: true
 ```
-
-## Workflow
-
-### Step 1: Find a component to contribute
-
-* Find a tool to contribute to this repo
-* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1)
-* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration
-* Add the component to the Project board to show that you are working on it
-
-### Step 2: Add config template
-
-Change all occurrences of `xxx` to the name of the component.
-
-Contents of `src/xxx/config.vsh.yaml`:
-
-```yaml
-functionality:
-  name: xxx
-  description: xxx
-  info:
-    keywords: [tag1, tag2]
-    homepage: yyy
-    documentation: yyy
-    repository: yyy
-    reference: "doi:yyy"
-    licence: yyy
-  argument_groups:
-    - name: Inputs
-      arguments:
-    - name: Outputs
-      arguments:
-    - name: Arguments
-      arguments:
-  resources:
-    - type: bash_script
-      path: script.sh
-  test_resources:
-    - type: bash_script
-      path: test.sh
-    - type: file
-      path: test_data
-platforms:
-  - type: docker
-    image: quay.io/biocontainers/xxx:0.1.0--py_0
-    setup:
-      - type: docker
-        run: |
-          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
-  - type: nextflow
-```
-
-### Step 3: Find container
-
-Google `biocontainer xxx` and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.
-
-### Step 4: Create help file
-
-```bash
-docker run --rm -it -v `pwd`/src/xxx/:/xxx quay.io/biocontainers/xxx:tag
-xxx --help > /xxx/help.txt
-```
-