refactor guide, add vdsl3 reference docs #60

Merged
merged 2 commits into from
Aug 14, 2023
55 changes: 40 additions & 15 deletions guide/nextflow_vdsl3/create-a-pipeline.qmd
@@ -1,6 +1,6 @@
---
title: Create a pipeline
order: 30
order: 40
---

{{< include ../../_includes/_clone_template.qmd >}}
@@ -52,6 +52,40 @@ Once everything is built, a new **target** directory has been created containing
tree target
```


## Importing a VDSL3 module

Once it has been built, a VDSL3 module can be imported just like any other Nextflow module.

**Example:**

```groovy
include { mymodule } from 'target/nextflow/mymodule/main.nf'
```


## VDSL3 module interface

VDSL3 modules are actually workflows which take one channel and emit one channel. A module expects the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`.

**Example:**

```groovy
workflow {
Channel.fromList([
["myid", [input: file("in.txt")]]
])
| mymodule
}
```

:::{.callout-note}
If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple.
That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module.
For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`.
:::
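
The passthrough behaviour in the note above can be sketched in plain Groovy. This is only an illustration of the tuple shapes, not how Viash implements it: the variable names are hypothetical, plain strings stand in for `file(...)` objects, and `newState` is a stand-in for whatever the module would actually emit.

```groovy
// Hypothetical input tuple: id, state, and one passthrough element
def inputTuple = ["foo", [input: "in.txt"], "bar"]

// The first element is the id; everything after the second is passthrough
def id = inputTuple[0]
def passthrough = inputTuple.drop(2)

// Stand-in for the state the module would emit (assumption for illustration)
def newState = [output: "out.txt"]

// The output tuple keeps the id and passthrough, and replaces the state
def outputTuple = [id, newState] + passthrough
assert outputTuple == ["foo", [output: "out.txt"], "bar"]
```

Only the second element changes; the id and any trailing elements are carried along unchanged.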


## Create a pipeline

Below is a first Nextflow pipeline which uses just one VDSL3 module and with hard-coded input parameters (file1 and file2).
@@ -92,26 +126,17 @@ HERE
main.nf
```

## VDSL3 module interface

It's important to note what the interface of every VDSL3 module is. A VDSL3 module expects an input to be a tuple with the following elements:
<!-- TODO: refactor using the new fromState/toState args -->

* `id` (`String`): A unique identifier used for tracking data objects and for ensuring output filenames are unique.
* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) used to pass the module's input arguments. If the module only has a
single input file, the file itself can simply be passed.
* `...` (`Any*`): Any other elements in the tuple simply pass through the module without being altered in any way. For this reason, it is often referred to as the "passthrough" objects.
## Customizing VDSL3 modules on the fly

In turn, a VDSL3 module will return a tuple with the same interface, except that the input data object has been replaced with the output data:
Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.

* `id` (`String`): The identifier from the input tuple.
* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) containing the module's output files. **Important**: If the module only has a single output file, the file itself will be returned.
* `...` (`Any*`): The passthrough objects from the input tuple (if any).
The `.run()` function is a feature unique to VDSL3 modules which allows dynamically altering the behaviour of a module from within the pipeline. For example, we use it to set the `publishDir` directive to `"output/"` so the output of that step in the pipeline is published to the `output/` directory.

## What is `.run()`?
See the [reference documentation](/reference/nextflow_vdsl3/import_module.qmd#customizing-vdsl3-modules-on-the-fly) for a complete list of arguments of `.run()`.

Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.

The `run()` function is a unique feature for every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. In this case, we use it to set the `publishDir` directive to `"output/"` so the output of that step in the pipeline will be stored as output.

## Run the pipeline

@@ -1,6 +1,6 @@
---
title: Create a module
order: 20
title: Create and use a module
order: 10
---


@@ -52,45 +52,41 @@ pwalk(langs, function(id, label, example_config, ...) {
```
:::

## Build the VDSL3 module
## Generating a VDSL3 module

We will now turn the Viash component into a VDSL3 module. By default, the `viash build` command will select the first platform in the list of platforms. To select the `nextflow` platform, use the `--platform nextflow` argument, or `-p nextflow` for short.

::: {.panel-tabset}
```{r viash-build-nxf}
#| echo: false
#| output: asis
langs <- langs %>% filter(id == "bash")
pwalk(langs, function(id, label, config_path, script_path, ...) {
qrt(
"## {% label %}
|
|```{bash build-example}
|viash build config.vsh.yaml -o target -p nextflow
|```
|
|This will generate a Nextflow module in the `target/` directory:
|
|```{bash view-tree}
|tree target
|```
|",
.dir = paste0(temp_dir, "/", id)
)
})
id <- "bash"
qrt(
"```{bash build-example}
|viash build config.vsh.yaml -o target -p nextflow
|```
|
|This will generate a Nextflow module in the `target/` directory:
|
|```{bash view-tree}
|tree target
|```
|",
.dir = paste0(temp_dir, "/", id)
)
```
:::

This `main.nf` file is both a **standalone Nextflow pipeline** and a module which can be used as **part of another pipeline**.
This `main.nf` file is both a [**standalone Nextflow pipeline**](run-a-module.qmd) and a module which can be imported as [**part of another pipeline**](import-a-module.qmd).

:::{.callout-tip}
You can also use the `viash ns build` command to build all of the platforms in one go. Give it a try! More information in the following section.
In larger projects it's recommended to use the [`viash ns build`](/reference/cli/ns_build.qmd) command to [build all of the components](/guide/project/batch-processing.qmd) in one go. Give it a try!
:::

## Module as a standalone pipeline
## Running a module as a standalone pipeline

When VDSL3 modules are used as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter,
as Nextflow will automatically choose the parameter names of the output files.

Unlike typical Nextflow modules, VDSL3 modules can actually be used as a standalone pipeline.

To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter, as Nextflow will automatically choose the parameter names of the output files.

```{r nextflow-run, echo=FALSE, output="asis"}
id <- "bash"
@@ -177,33 +173,58 @@ Instead of a YAML, you can also pass a JSON or a CSV to the `--param_list`
parameter.
:::


## Module as part of a pipeline

This module can also be used as part of a Nextflow pipeline.
Below is a short preview of what this looks like.

```groovy
include { example_bash } from "target/main.nf"

Channel.fromList([
["sample1", file("sample1.txt")],
["sample2", file("sample2.txt")],
["sample3", file("sample3.txt")]
])
| view { it -> "input: $it" }
| example_bash
| view { it -> "output: $it" }
include { mymodule1 } from 'target/nextflow/mymodule1/main.nf'
include { mymodule2 } from 'target/nextflow/mymodule2/main.nf'

workflow {
Channel.fromList([
[
// a unique identifier for this tuple
"myid",
// the state for this tuple
[
input: file("in.txt"),
module1_k: 10,
module2_k: 4
]
]
])
| mymodule1.run(
// use a hashmap to define which part of the state is used to run mymodule1
fromState: [
input: "input",
k: "module1_k"
],
// use a hashmap to define how the output of mymodule1 is stored back into the state
toState: [
module1_output: "output"
]
)
| mymodule2.run(
// use a closure to define which data is used to run mymodule2
fromState: { id, state ->
[
input: state.module1_output,
k: state.module2_k
]
},
// use a closure to return only the output of module2 as a new state
toState: { id, output, state ->
output
},
auto: [
publish: true
]
)
}
```

We will discuss building pipelines with VDSL3 modules in more detail in [Create a pipeline](create-a-pipeline.qmd).

## Improvements over standard Nextflow modules

* No need to write any Nextflow Groovy code, just your script and the Viash config.
* VDSL3 modules are also standalone pipelines.
* Help documentation is automatically generated.
* Standardized interface for passing parameter lists.
* Automatically uses the Docker platform's container.


{{< include ../../_includes/_prune_all_images.qmd >}}
22 changes: 21 additions & 1 deletion guide/nextflow_vdsl3/index.qmd
@@ -2,4 +2,24 @@
title: Nextflow VDSL3
order: 30
hidden: true
---
---

Nextflow is a highly popular and widely-used workflow manager in computational biology, featuring outstanding portability, reproducibility and scalability. However, while Nextflow's advantages are impressive, developing a Nextflow pipeline can be challenging, requiring significant domain knowledge and verbose code that is labour-intensive. Fortunately, Viash provides a solution to the barriers of Nextflow pipeline development.

Viash can help developers wrap their code into a state-of-the-art Nextflow script called a VDSL3 module. As we will demonstrate in the remainder of this guide, VDSL3 is effectively a separate DSL layer on top of Nextflow enabled by Viash, hence it is called Viash + Nextflow DSL 3, or VDSL3 for short. VDSL3's benefits extend beyond Nextflow pipeline development, including reusability, test-driven development, separation of concerns, and continuous testing.

You can use Viash to speed up or replace your pipeline development processes in the following steps:

* [Use Viash to generate VDSL3 modules](create-and-use-a-module.qmd#generating-a-vdsl3-module)
* [Run a module as a standalone pipeline](create-and-use-a-module.qmd#running-a-module-as-a-standalone-pipeline)
* [Import a VDSL3 module](create-and-use-a-module.qmd#module-as-part-of-a-pipeline)
* [Create a Nextflow workflow](create-a-pipeline.qmd) using one or more modules


## Improvements of VDSL3 modules over standard Nextflow modules

* No need to write any Nextflow Groovy code, just your script and the Viash config.
* VDSL3 modules are also standalone pipelines.
* Help documentation is automatically generated.
* Standardized interface for passing parameter lists.
* Automatically uses the Docker platform's container.
11 changes: 0 additions & 11 deletions guide/nextflow_vdsl3/introduction.qmd

This file was deleted.

107 changes: 107 additions & 0 deletions reference/nextflow_vdsl3/import_module.qmd
@@ -0,0 +1,107 @@
---
title: Import a VDSL3 module
---

A VDSL3 module is a Nextflow module generated by Viash. See the [guide](/guide/nextflow_vdsl3/index.qmd) for a more in-depth explanation of how to create Nextflow workflows with VDSL3 modules.

## Importing a VDSL3 module


Once it has been built, a VDSL3 module can be imported just like any other Nextflow module.

**Example:**

```groovy
include { mymodule } from 'target/nextflow/mymodule/main.nf'
```

## VDSL3 module interface

VDSL3 modules are actually workflows which take one channel and emit one channel. A module expects the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`.

**Example:**

```groovy
workflow {
Channel.fromList([
["myid", [input: file("in.txt")]]
])
| mymodule
}
```

:::{.callout-note}
If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple.
That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module.
For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`.
:::

## Customizing VDSL3 modules on the fly

Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.

The `.run()` function is a feature unique to VDSL3 modules which allows dynamically altering the behaviour of a module from within the pipeline. For example, it can be used to override a module's arguments or to set directives such as `cpus` and `memory`.

**Example:**

```groovy
workflow {
Channel.fromList([
["myid", [input: file("in.txt")]]
])
| mymodule.run(
args: [k: 10],
directives: [cpus: 4, memory: "16 GB"]
)
}
```

### Arguments of `.run()`

- `key` (`String`): A unique key used to trace the process and help make names of output files unique. Default: the name of the Viash component.

- `args` (`Map[String, Object]`): Argument overrides to be passed to the module.

- `directives` (`Map[String, Object]`): Custom directives overrides. See the Nextflow documentation for a list of available directives.

- `auto` (`Map[String, Boolean]`): Whether to apply certain automated processing steps. Default values are inherited from the [Viash config](/reference/config/platforms/nextflow/auto.qmd).

- `auto.simplifyInput`: If `true`, and both the input data is a single file and the module has only a single input file, that file is passed to the module as its input. Default: `true` (inherited from Viash config).

- `auto.simplifyOutput`: If `true`, and the module has only a single output file, the output map is replaced by that single file. Default: `true` (inherited from Viash config).

- `auto.publish`: If `true`, the output files will be published to the `params.publishDir` folder. Default: `false` (inherited from Viash config).

- `auto.transcript`: If `true`, the module's transcript will be published to the `params.transcriptDir` folder. Default: `false` (inherited from Viash config).

- `map` (`Function`): Apply a map over the incoming tuple. Example: `{ tup -> [ tup[0], [input: tup[1].output] ] + tup.drop(2) }`. Default: `null`.

- `mapId` (`Function`): Apply a map over the ID element of a tuple (i.e. the first element). Example: `{ id -> id + "_foo" }`. Default: `null`.

- `mapData` (`Function`): Apply a map over the data element of a tuple (i.e. the second element). Example: `{ data -> [ input: data.output ] }`. Default: `null`.

- `mapPassthrough` (`Function`): Apply a map over the passthrough elements of a tuple (i.e. the tuple excl. the first two elements). Example: `{ pt -> pt.drop(1) }`. Default: `null`.

- `filter` (`Function`): Filter the channel. Example: `{ tup -> tup[0] == "foo" }`. Default: `null`.

- `fromState`: Fetch data from the state and pass it to the module without altering the current state. `fromState` should be `null`, `List[String]`, `Map[String, String]` or a function.

- If it is `null`, the state will be passed to the module as is.
- If it is a `List[String]`, the data will be the values of the state at the given keys.
- If it is a `Map[String, String]`, the data will be the values of the state at the given keys, with the keys renamed according to the map.
- If it is a function, the tuple (`[id, state]`) in the channel will be passed to the function, and the result will be used as the data.

Example: `{ id, state -> [input: state.fastq_file] }`
Default: `null`

- `toState`: Determine how the state should be updated after the module has been run. `toState` should be `null`, `List[String]`, `Map[String, String]` or a function.

- If it is `null`, the state will be replaced with the output of the module.
- If it is a `List[String]`, the state will be updated with the values of the data at the given keys.
- If it is a `Map[String, String]`, the state will be updated with the values of the data at the given keys, with the keys renamed according to the map.
- If it is a function, a tuple (`[id, output, state]`) will be passed to the function, and the result will be used as the new state.

Example: `{ id, output, state -> state + [counts: output.output] }`
Default: `{ id, output, state -> output }`

- `debug`: Whether or not to print debug messages. Default: `false`.
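
The `List[String]` and `Map[String, String]` forms of `fromState` and `toState` described above can be sketched in plain Groovy. This models the documented behaviour only and is not Viash's implementation. The state, the module output and all variable names are hypothetical, and plain strings stand in for files. The direction of the map forms (module argument to state key for `fromState`, state key to output key for `toState`) is an assumption inferred from how the guide's pipeline example uses them.

```groovy
// Hypothetical state carried in the channel tuple [id, state]
def state = [input: "in.txt", module1_k: 10, module2_k: 4]

// fromState as List[String]: pick the listed keys from the state
def fromList = ["input"]
def dataFromList = state.subMap(fromList)
assert dataFromList == [input: "in.txt"]

// fromState as Map[String, String]: module argument name -> state key
def fromMap = [input: "input", k: "module1_k"]
def dataFromMap = fromMap.collectEntries { argName, stateKey ->
    [(argName): state[stateKey]]
}
assert dataFromMap == [input: "in.txt", k: 10]

// Hypothetical module output (assumption for illustration)
def output = [output: "out.txt"]

// toState as Map[String, String]: state key -> output key, merged into the state
def toMap = [module1_output: "output"]
def newState = state + toMap.collectEntries { stateKey, outputKey ->
    [(stateKey): output[outputKey]]
}
assert newState == [input: "in.txt", module1_k: 10, module2_k: 4, module1_output: "out.txt"]

// Selecting data with fromState does not alter the original state
assert state == [input: "in.txt", module1_k: 10, module2_k: 4]
```

The map forms cover simple renaming; the closure forms are the fallback when the data passed to the module or the new state has to be computed rather than looked up.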
10 changes: 10 additions & 0 deletions reference/nextflow_vdsl3/index.qmd
@@ -0,0 +1,10 @@
---
title: Nextflow VDSL3
order: 35
---

Viash supports creating Nextflow workflows in multiple ways.

* [Run a module as a standalone pipeline](run_module.qmd)
* [Import a VDSL3 module](import_module.qmd)
* Create a Nextflow workflow with dependencies