extend contributing guidelines
rcannood committed Feb 2, 2024
1 parent 8c8ad10 commit 087526b
Showing 1 changed file with 286 additions and 63 deletions.
349 changes: 286 additions & 63 deletions CONTRIBUTING.md
@@ -7,6 +7,292 @@ We encourage contributions from the community. To contribute:
2. **Develop Your Component**: Create your Viash component, ensuring it aligns with our best practices (detailed below).
3. **Submit a Pull Request**: After testing your component, submit a pull request for review.

## Procedure for adding a component

### Step 1: Find a component to contribute

* Find a tool to contribute to this repo.

* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).

* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.

* Create an issue to show that you are working on this component.


### Step 2: Add config template

Create a file at `src/xxx/config.vsh.yaml` with the contents below, replacing every occurrence of `xxx` with the name of the component.

```yaml
functionality:
  name: xxx
  description: xxx
  keywords: [tag1, tag2]
  links:
    homepage: yyy
    documentation: yyy
    repository: yyy
  references:
    doi: 12345/12345678.yz
  license: MIT/Apache-2.0/GPL-3.0/...
  argument_groups:
    - name: Inputs
      arguments: <...>
    - name: Outputs
      arguments: <...>
    - name: Arguments
      arguments: <...>
  resources:
    - type: bash_script
      path: script.sh
  test_resources:
    - type: bash_script
      path: test.sh
    - type: file
      path: test_data
engines:
  - <...>
runners:
  - type: executable
  - type: nextflow
```
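
The placeholder substitution can also be scripted. The following is a minimal sketch, assuming the template was first saved to a file; the component name `mytool` and the file name `template.vsh.yaml` are hypothetical, the template is abbreviated, and a temporary directory stands in for the repository root.

```bash
#!/bin/bash

# Hypothetical component name; replace with the name of your tool.
component="mytool"

# A temporary directory stands in for the repository root in this sketch.
repo_root="$(mktemp -d)"

# Assumption: the template was saved to a file first (abbreviated here).
cat > "$repo_root/template.vsh.yaml" <<'EOF'
functionality:
  name: xxx
  description: xxx
EOF

# Create the component directory and replace every `xxx` with the name.
mkdir -p "$repo_root/src/$component"
sed "s/xxx/$component/g" "$repo_root/template.vsh.yaml" \
  > "$repo_root/src/$component/config.vsh.yaml"

cat "$repo_root/src/$component/config.vsh.yaml"
```

The substitution also rewrites the `description` placeholder, so that field still needs a proper description by hand, as do the `yyy` placeholders.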

### Step 3: Fill in the metadata

Fill in the relevant metadata fields in the config. Here is an example of the metadata of an existing component.

```yaml
functionality:
  name: arriba
  description: Detect gene fusions from RNA-Seq data
  keywords: [Gene fusion, RNA-Seq]
  links:
    homepage: https://arriba.readthedocs.io/en/latest/
    documentation: https://arriba.readthedocs.io/en/latest/
    repository: https://github.com/suhrig/arriba
  references:
    doi: 10.1101/gr.257246.119
    bibtex: |
      @article{
        ... a bibtex entry in case the doi is not available ...
      }
  license: MIT
```

### Step 4: Find a suitable container

Google `biocontainer <name of component>` and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.

If no such container is found, you can create a custom container in Step 9.


### Step 5: Create help file

To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.

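````bash
# Quote the delimiter ('EOF') so the backticks below are written literally
# instead of being executed as a command substitution.
cat <<'EOF' > src/xxx/help.txt
```sh
xxx --help
```
EOF

docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
````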

Notes:

* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.

* Some tools might not have a `--help` argument but instead have a `-h` argument. For example, for `arriba`, the help message is obtained by running `arriba -h`:

```bash
docker run quay.io/biocontainers/arriba:2.4.0--h0033a41_2 arriba -h
```
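
Since tools differ in which help flag they accept, the fallback can be automated. The sketch below is one possible approach, not part of the project's tooling: the `capture_help` function and the `fake_tool` stand-in are hypothetical, and it assumes the tool exits with status 0 when it prints its help text, which is not true of every tool.

```bash
#!/bin/bash

# capture_help OUTFILE CMD...
# Append the help text of CMD to OUTFILE, trying `--help` first and `-h` second.
# Assumption: the tool exits with status 0 when it prints its help text.
capture_help() {
  local outfile="$1"
  shift
  local flag
  for flag in --help -h; do
    if "$@" "$flag" > "$outfile.tmp" 2>&1; then
      cat "$outfile.tmp" >> "$outfile"
      rm -f "$outfile.tmp"
      return 0
    fi
  done
  rm -f "$outfile.tmp"
  return 1
}

# Stand-in tool for illustration: it only understands `-h`.
fake_tool() {
  [ "$1" = "-h" ] && echo "Usage: fake_tool [OPTIONS]"
}

help_file="$(mktemp)"
capture_help "$help_file" fake_tool
cat "$help_file"
```

With a real container, the call would look like `capture_help src/xxx/help.txt docker run quay.io/biocontainers/xxx:tag xxx`.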

### Step 6: Add arguments for the input files

By looking at the help file, we can add the input arguments to the config file.

For instance, in the [arriba help file](src/arriba/help.txt), we see the following:

    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
                  -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
                  [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
                  -o fusions.tsv [-O fusions.discarded.tsv] \
                  [OPTIONS]

     -x FILE  File in SAM/BAM/CRAM format with main alignments as generated by STAR
              (Aligned.out.sam). Arriba extracts candidate reads from this file.

Based on this information, we can add the following input arguments to the config file.

```yaml
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (Aligned.out.sam). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
```

Check the [documentation](https://viash.io/reference/config/functionality/arguments) for more information on the format of input arguments.

Several notes:

* Argument names should be formatted in `--snake_case`. This means arguments like `--foo-bar` should be formatted as `--foo_bar`, and short arguments like `-f` should receive a longer name like `--foo`.

* Input arguments can have `multiple: true` to allow the user to specify multiple files.
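
The naming rule for long flags can be sketched as a small helper. This is only an illustration: the function name `to_viash_name` and the second example flag are hypothetical, and the helper only converts hyphens to underscores (it does not handle short flags or camelCase names).

```bash
#!/bin/bash

# to_viash_name FLAG
# Convert a long flag such as `--foo-bar` to the `--snake_case` form.
to_viash_name() {
  local name="${1#--}"                        # drop the leading `--`
  name="$(printf '%s' "$name" | tr '-' '_')"  # hyphens become underscores
  printf -- '--%s\n' "$name"
}

to_viash_name --foo-bar
to_viash_name --known-fusions-file
```

The original flag can still be listed under `alternatives` so that users familiar with the tool can keep using it.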



### Step 7: Add arguments for the output files

By looking at the help file, we now also add output arguments to the config file.

For example, in the [arriba help file](src/arriba/help.txt), we see the following:


    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
                  -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
                  [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
                  -o fusions.tsv [-O fusions.discarded.tsv] \
                  [OPTIONS]

     -o FILE  Output file with fusions that have passed all filters.

     -O FILE  Output file with fusions that were discarded due to filtering.

Based on this information, we can add the following output arguments to the config file.

```yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
      - name: --fusions_discarded
        alternatives: -O
        type: file
        direction: output
        description: |
          Output file with fusions that were discarded due to filtering.
        required: false
        example: fusions.discarded.tsv
```

Note:

* Preferably, these outputs should be files rather than directories. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
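
When a tool can only write to a directory, the script can split that directory into separate outputs. The following is a minimal sketch of that idea; the stand-in `xxx` function, its `--outdir` flag, and the file names are hypothetical, and `par_bar` and `par_baz` are set by hand here although Viash would normally provide them.

```bash
#!/bin/bash

# Stand-in for a tool that can only write to an output directory.
# The `--outdir` flag and the file names are hypothetical.
xxx() {
  mkdir -p "$2"
  echo "bar" > "$2/bar.txt"
  echo "baz" > "$2/baz.txt"
}

# In a real component, Viash sets these from `--bar` and `--baz`.
par_bar="$(mktemp)"
par_baz="$(mktemp)"

# Let the tool write into a scratch directory...
tmp_dir="$(mktemp -d)"
xxx --outdir "$tmp_dir"

# ...then move each file to the path requested by its output argument.
mv "$tmp_dir/bar.txt" "$par_bar"
mv "$tmp_dir/baz.txt" "$par_baz"
rm -rf "$tmp_dir"
```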

### Step 8: Add the remaining arguments

Finally, add all other arguments to the config file. There are a few exceptions:

* Arguments related to specifying CPU and memory requirements are handled separately and should not be added to the config file.

* Arguments that only print information, such as the version (`-v`, `--version`) or the help (`-h`, `--help`), should not be added to the config file.
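
Resource settings usually reach the script through `meta_*` variables rather than through arguments; to our knowledge Viash provides `meta_cpus` for the number of CPUs, but check the Viash documentation for the exact names. The sketch below shows how such a variable could be passed on. The stand-in `xxx` function and the `--threads` flag are hypothetical, and `meta_cpus` is hard-coded only so the example runs on its own.

```bash
#!/bin/bash

# Stand-in for the real tool: prints each argument on its own line.
xxx() { printf '%s\n' "$@"; }

par_input="reads.fq"   # hypothetical input
meta_cpus=4            # normally provided by Viash at run time

# `--threads` is a hypothetical flag; it is only added when meta_cpus is set.
result="$(xxx \
  --input "$par_input" \
  ${meta_cpus:+--threads "$meta_cpus"})"

echo "$result"
```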


### Step 9: Add a Docker engine

To ensure reproducibility of components, we require that all components are run in a Docker container.

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
```

If you didn't find a suitable container in Step 4, you can create a custom container. For example:

```yaml
engines:
  - type: docker
    image: python:3.10
    setup:
      - type: python
        packages: numpy
```

For more information on how to do this, see the [documentation](https://viash.io/guide/component/add-dependencies.html#steps-for-creating-a-custom-docker-platform).

Here is a list of base containers we can recommend:

* Bash: [`bash`](https://hub.docker.com/_/bash), [`ubuntu`](https://hub.docker.com/_/ubuntu)
* C#: [`ghcr.io/data-intuitive/dotnet-script`](https://github.com/data-intuitive/ghcr-dotnet-script/pkgs/container/dotnet-script)
* JavaScript: [`node`](https://hub.docker.com/_/node)
* Python: [`python`](https://hub.docker.com/_/python), [`nvcr.io/nvidia/pytorch`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
* R: [`eddelbuettel/r2u`](https://hub.docker.com/r/eddelbuettel/r2u), [`rocker/tidyverse`](https://hub.docker.com/r/rocker/tidyverse)
* Scala: [`sbtscala/scala-sbt`](https://hub.docker.com/r/sbtscala/scala-sbt)

### Step 10: Write a runner script

Next, create a Bash script named `src/xxx/script.sh` that runs the tool with the arguments defined in the config.

```bash
#!/bin/bash
## VIASH START
## VIASH END
xxx \
  --input "$par_input" \
  --output "$par_output" \
  $([ "$par_option" = "true" ] && echo "--option")
```

When building a Viash component, Viash will automatically replace the `## VIASH START` and `## VIASH END` lines (and anything in between) with `par_*` variables based on the arguments specified in the config. For example, the value of `--input` becomes available as `$par_input`.
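
Because everything between the two markers is overwritten at build time, the block is a convenient place for placeholder values while developing the script. The sketch below assumes hypothetical file paths and prints the command instead of running it, since `xxx` is itself a placeholder.

```bash
#!/bin/bash

## VIASH START
# Placeholder values for running the script directly during development.
# Viash overwrites this whole block when it builds the component.
par_input="test_data/input.txt"   # hypothetical path
par_output="output.txt"           # hypothetical path
par_option="true"
## VIASH END

# Print the command instead of running it, since `xxx` is a placeholder.
cmd="xxx --input $par_input --output $par_output"
[ "$par_option" = "true" ] && cmd="$cmd --option"
echo "$cmd"
```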

As an example, this is what the Bash script for the `arriba` component looks like:

```bash
#!/bin/bash
## VIASH START
## VIASH END
arriba \
  -x "$par_bam" \
  -a "$par_genome" \
  -g "$par_gene_annotation" \
  -o "$par_fusions" \
  ${par_known_fusions:+-k "${par_known_fusions}"} \
  ${par_blacklist:+-b "${par_blacklist}"} \
  ${par_structural_variants:+-d "${par_structural_variants}"} \
  $([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
  $([ "$par_extra_information" = "true" ] && echo "-X") \
  $([ "$par_fill_gaps" = "true" ] && echo "-I")
```
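
The script relies on two Bash idioms for optional arguments. The following sketch illustrates how they behave; the `count_args` helper and the values assigned to the variables are made up for the demonstration.

```bash
#!/bin/bash

# Helper for illustration: reports how many arguments it received.
count_args() { echo "$#"; }

par_known_fusions="known fusions.tsv"   # set, and contains a space
unset par_blacklist                     # not provided by the user
par_fill_gaps="true"
par_extra_information="false"

# ${var:+...} expands to the flag and its quoted value only when var is set.
n_set="$(count_args ${par_known_fusions:+-k "${par_known_fusions}"})"
n_unset="$(count_args ${par_blacklist:+-b "${par_blacklist}"})"

# $([ ... ] && echo "-I") expands to the flag only when the test succeeds.
n_true="$(count_args $([ "$par_fill_gaps" = "true" ] && echo "-I"))"
n_false="$(count_args $([ "$par_extra_information" = "true" ] && echo "-X"))"

echo "$n_set $n_unset $n_true $n_false"
```

A set variable contributes exactly two arguments (the flag and its value, even when the value contains a space), while an unset variable or a false boolean contributes none.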

### Step 11: Add a test script
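
This step is not yet described in the guidelines. As a rough sketch only, a `src/xxx/test.sh` could run the component on a small input and check that the output exists. The example assumes that Viash sets `meta_executable` to the built component during testing, reuses the placeholder `--input` and `--output` arguments from the script template, and uses a hypothetical test file path; the `fake_component` function exists only so the sketch runs on its own.

```bash
#!/bin/bash

# Stand-in so the sketch runs on its own; in a real Viash test, Viash is
# assumed to set `meta_executable` to the built component.
if [ -z "$meta_executable" ]; then
  fake_component() { echo "result" > "$4"; }
  meta_executable=fake_component
fi

output_file="$(mktemp)"

# Run the component on a (hypothetical) test input.
"$meta_executable" --input "test_data/input.txt" --output "$output_file"

# Fail loudly when the expected output is missing or empty.
if [ ! -s "$output_file" ]; then
  echo "Output file is missing or empty" >&2
  exit 1
fi

echo "Test passed"
```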

### Step 12: Create a `/var/software_versions.txt` file

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
    setup:
      - type: docker
        run: |
          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
```
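
The config above hard-codes the version, so it has to be kept in sync with the image tag by hand. One possible alternative, sketched below, is to derive the version from the tool itself. This is an assumption rather than the project's prescribed method: the stand-in `xxx` function pretends that `xxx --version` prints `xxx 0.1.0`, and a temporary file stands in for `/var/software_versions.txt`.

```bash
#!/bin/bash

# Stand-in for the real tool; assumes `xxx --version` prints "xxx 0.1.0".
xxx() { echo "xxx 0.1.0"; }

# A temporary file stands in for /var/software_versions.txt in this sketch.
versions_file="$(mktemp)"

# Keep the last word of the version line and write it as `name: "version"`.
version="$(xxx --version | awk '{print $NF}')"
printf 'xxx: "%s"\n' "$version" > "$versions_file"

cat "$versions_file"
```

Real tools print their version in many different formats, so the extraction step would need to be adapted for each tool.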

## Documentation of Functionality

The purpose and functionality of each component should be adequately described.
@@ -171,66 +457,3 @@ functionality:
description: "Which normalization was used"
required: true
```

## Workflow

### Step 1: Find a component to contribute

* Find a tool to contribute to this repo
* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1)
* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration
* Add the component to the Project board to show that you are working on it

### Step 2: Add config template

Change all occurrences of `xxx` to the name of the component.

Contents of `src/xxx/config.vsh.yaml`:

```yaml
functionality:
  name: xxx
  description: xxx
  info:
    keywords: [tag1, tag2]
    homepage: yyy
    documentation: yyy
    repository: yyy
    reference: "doi:yyy"
    licence: yyy
  argument_groups:
    - name: Inputs
      arguments:
    - name: Outputs
      arguments:
    - name: Arguments
      arguments:
  resources:
    - type: bash_script
      path: script.sh
  test_resources:
    - type: bash_script
      path: test.sh
    - type: file
      path: test_data
platforms:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
    setup:
      - type: docker
        run: |
          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
  - type: nextflow
```

### Step 3: Find container

Google `biocontainer xxx` and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.

### Step 4: Create help file

```bash
docker run --rm -it -v `pwd`/src/xxx/:/xxx quay.io/biocontainers/xxx:tag
xxx --help > /xxx/help.txt
```
