viash-io · Grifs · Aug 14, 2023 · Aug 10, 2023 · Aug 14, 2023
diff --git a/guide/nextflow_vdsl3/create-a-pipeline.qmd b/guide/nextflow_vdsl3/create-a-pipeline.qmd
@@ -1,6 +1,6 @@
 ---
 title: Create a pipeline
-order: 30
+order: 40
 ---
 
 {{< include ../../_includes/_clone_template.qmd >}}
@@ -52,6 +52,40 @@ Once everything is built, a new **target** directory has been created containing
 tree target
 ```
 
+
+## Importing a VDSL3 module
+
+After building a VDSL3 module, it can be imported just like any other Nextflow module.
+
+**Example:**
+
+```groovy
+include { mymodule } from 'target/nextflow/mymodule/main.nf'
+```
+
+
+## VDSL3 module interface
+
+VDSL3 modules are actually workflows which take one channel and emit one channel. They expect the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`.
+
+**Example:**
+
+```groovy
+workflow {
+  Channel.fromList([
+    ["myid", [input: file("in.txt")]]
+  ])
+    | mymodule
+}
+```
+
+:::{.callout-note}
+If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple.
+That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module.
+For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`.
+:::
+
+
 ## Create a pipeline
 
 Below is a first Nextflow pipeline which uses just one VDSL3 module and with hard-coded input parameters (file1 and file2).
@@ -92,26 +126,17 @@ HERE
 main.nf
 ```
 
-## VDSL3 module interface
-
-It's important to note what the interface of every VDSL3 module is. A VDSL3 module expects an input to be a tuple with the following elements:
+<!-- TODO: refactor using the new fromState/toState args -->
 
-* `id` (`String`): A unique identifier used for tracking data objects and for ensuring output filenames are unique.
-* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) used to pass the module's input arguments. If the module only has a 
- single input file, the file itself can simply be passed.
-* `...` (`Any*`): Any other elements in the tuple simply pass through the module without being altered in any way. For this reason, it is often referred to as the "passthrough" objects.
+## Customizing VDSL3 modules on the fly
 
-In turn, a VDSL3 module will return a tuple with the same interface, except that the input data object has been replaced with the output data:
+Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.
 
-* `id` (`String`): The identifier from the input tuple.
-* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) containing the module's output files. **Important**: If the module only has a single output file, the file itself will be returned.
-* `...` (`Any*`): The passthrough objects from the input tuple (if any).
+Every VDSL3 module has a `.run()` function, which allows dynamically altering the behaviour of the module from within the pipeline. For example, we use it to set the `publishDir` directive to `"output/"` so that the output of that step in the pipeline is published to that directory.
 
-## What is `.run()`?
+See the [reference documentation](/reference/nextflow_vdsl3/import_module.qmd#customizing-vdsl3-modules-on-the-fly) for a complete list of arguments of `.run()`.
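+
+As a minimal sketch of the `publishDir` case described above, the directive can be overridden through the `directives` argument of `.run()`. The module name `mymodule`, its path, and the input file `in.txt` are placeholders rather than parts of the pipeline in this guide, and passing a plain string for `publishDir` is an assumption.
+
+```groovy
+include { mymodule } from 'target/nextflow/mymodule/main.nf'
+
+workflow {
+  Channel.fromList([
+    ["myid", [input: file("in.txt")]]
+  ])
+    | mymodule.run(
+      // override the publishDir directive for this step only (assumed string form)
+      directives: [publishDir: "output/"]
+    )
+}
+```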
 
-Usually, Nextflow processes are quite static objects. For example, changing its directives can be quite tricky.
 
-The `run()` function is a unique feature for every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. In this case, we use it to set the `publishDir` directive to `"output/"` so the output of that step in the pipeline will be stored as output.
 
 ## Run the pipeline
 

diff --git a/guide/nextflow_vdsl3/create-a-module.qmd → ...extflow_vdsl3/create-and-use-a-module.qmd b/guide/nextflow_vdsl3/create-a-module.qmd → ...extflow_vdsl3/create-and-use-a-module.qmd
@@ -1,6 +1,6 @@
 ---
-title: Create a module
-order: 20
+title: Create and use a module
+order: 10
 ---
 
 
@@ -52,45 +52,41 @@ pwalk(langs, function(id, label, example_config, ...) {
 ```
 :::
 
-## Build the VDSL3 module
+## Generating a VDSL3 module
 
 We will now turn the Viash component into a VDSL3 module. By default, the `viash build` command will select the first platform in the list of platforms. To select the `nextflow` platform, use the `--platform nextflow` argument, or `-p nextflow` for short.
 
-::: {.panel-tabset}
 ```{r viash-build-nxf}
 #| echo: false
 #| output: asis
-langs <- langs %>% filter(id == "bash")
-pwalk(langs, function(id, label, config_path, script_path, ...) {
- qrt(
- "## {% label %}
- |
- |```{bash build-example}
- |viash build config.vsh.yaml -o target -p nextflow
- |```
- |
- |This will generate a Nextflow module in the `target/` directory:
- |
- |```{bash view-tree}
- |tree target
- |```
- |", 
- .dir = paste0(temp_dir, "/", id)
- )
-})
+id <- "bash"
+qrt(
+ "```{bash build-example}
+ |viash build config.vsh.yaml -o target -p nextflow
+ |```
+ |
+ |This will generate a Nextflow module in the `target/` directory:
+ |
+ |```{bash view-tree}
+ |tree target
+ |```
+ |", 
+ .dir = paste0(temp_dir, "/", id)
+)
 ```
-:::
 
-This `main.nf` file is both a **standalone Nextflow pipeline** and a module which can be used as **part of another pipeline**.
+This `main.nf` file is both a [**standalone Nextflow pipeline**](run-a-module.qmd) and a module which can be imported as [**part of another pipeline**](import-a-module.qmd).
 
 :::{.callout-tip}
-You can also use the `viash ns build` command to build all of the platforms in one go. Give it a try! More information in the following section.
+In larger projects it's recommended to use the [`viash ns build`](/reference/cli/ns_build.qmd) command to [build all of the components](/guide/project/batch-processing.qmd) in one go. Give it a try!
 :::
 
-## Module as a standalone pipeline
+## Running a module as a standalone pipeline
 
-When VDSL3 modules are used as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter,
-as Nextflow will automatically choose the parameter names of the output files.
+
+Unlike typical Nextflow modules, VDSL3 modules can actually be used as a standalone pipeline.
+
+To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter, as Nextflow will automatically choose the parameter names of the output files.
 
 ```{r nextflow-run, echo=FALSE, output="asis"}
 id <- "bash"
@@ -177,33 +173,58 @@ Instead of a YAML, you can also pass a JSON or a CSV to the `--param_list`
 parameter.
 :::
 
+
 ## Module as part of a pipeline
 
 This module can also be used as part of a Nextflow pipeline.
 Below is a short preview of what this looks like.
 
 ```groovy
-import { example_bash } from "target/main.nf"
-
-Channel.fromList([
- ["sample1", file("sample1.txt")],
- ["sample2", file("sample2.txt")],
- ["sample3", file("sample3.txt")]
-])
- | view { it -> "input: $it" }
- | example_bash
- | view { it -> "output: $it" }
+include { mymodule1 } from 'target/nextflow/mymodule1/main.nf'
+include { mymodule2 } from 'target/nextflow/mymodule2/main.nf'
+
+workflow {
+  Channel.fromList([
+    [
+      // a unique identifier for this tuple
+      "myid",
+      // the state for this tuple
+      [
+        input: file("in.txt"),
+        module1_k: 10,
+        module2_k: 4
+      ]
+    ]
+  ])
+    | mymodule1.run(
+      // use a hashmap to define which part of the state is used to run mymodule1
+      fromState: [
+        input: "input",
+        k: "module1_k"
+      ],
+      // use a hashmap to define how the output of mymodule1 is stored back into the state
+      toState: [
+        module1_output: "output"
+      ]
+    )
+    | mymodule2.run(
+      // use a closure to define which data is used to run mymodule2
+      fromState: { id, state ->
+        [
+          input: state.module1_output,
+          k: state.module2_k
+        ]
+      },
+      // use a closure to return only the output of module2 as a new state
+      toState: { id, output, state ->
+        output
+      },
+      auto: [
+        publish: true
+      ]
+    )
+}
 ```
 
 We will discuss building pipelines with VDSL3 modules in more detail in [Create a pipeline](create-a-pipeline.qmd).
 
-## Improvements over standard Nextflow modules
-
-* No need to write any Nextflow Groovy code, just your script and the Viash config.
-* VDSL3 module are also standalone pipelines.
-* Help documentation is automatically generated.
-* Standardized interface for passing parameter lists.
-* Automatically uses the Docker platform's container.
-
-
-{{< include ../../_includes/_prune_all_images.qmd >}}
diff --git a/guide/nextflow_vdsl3/index.qmd b/guide/nextflow_vdsl3/index.qmd
@@ -2,4 +2,24 @@
 title: Nextflow VDSL3
 order: 30
 hidden: true
----
+---
+
+Nextflow is a highly popular and widely-used workflow manager in computational biology, featuring outstanding portability, reproducibility and scalability. However, while Nextflow's advantages are impressive, developing a Nextflow pipeline can be challenging, requiring significant domain knowledge and verbose code that is labour-intensive. Fortunately, Viash provides a solution to the barriers of Nextflow pipeline development.
+
+Viash can help developers wrap their code into a state-of-the-art Nextflow script called a VDSL3 module. As we will demonstrate in the remainder of this guide, VDSL3 is effectively a separate DSL layer on top of Nextflow enabled by Viash, hence it is called Viash + Nextflow DSL 3, or VDSL3 for short. VDSL3's benefits extend beyond Nextflow pipeline development, including reusability, test-driven development, separation of concerns, and continuous testing.
+
+You can use Viash to speed up or replace your pipeline development processes in the following steps:
+
+* [Use Viash to generate VDSL3 modules](create-and-use-a-module.qmd#generating-a-vdsl3-module)
+* [Run a module as a standalone pipeline](create-and-use-a-module.qmd#running-a-module-as-a-standalone-pipeline)
+* [Import a VDSL3 module](create-and-use-a-module.qmd#module-as-part-of-a-pipeline)
+* [Create a Nextflow workflow](create-a-pipeline.qmd) using one or more modules
+
+
+## Improvements of VDSL3 modules over standard Nextflow modules
+
+* No need to write any Nextflow Groovy code, just your script and the Viash config.
+* VDSL3 modules are also standalone pipelines.
+* Help documentation is automatically generated.
+* Standardized interface for passing parameter lists.
+* Automatically uses the Docker platform's container.
diff --git a/guide/nextflow_vdsl3/introduction.qmd b/guide/nextflow_vdsl3/introduction.qmd
diff --git a/reference/nextflow_vdsl3/import_module.qmd b/reference/nextflow_vdsl3/import_module.qmd
@@ -0,0 +1,107 @@
+---
+title: Import a VDSL3 module
+---
+
+A VDSL3 module is a Nextflow module generated by Viash. See the [guide](/guide/nextflow_vdsl3/introduction.qmd) for a more in-depth explanation on how to create Nextflow workflows with VDSL3 modules.
+
+## Importing a VDSL3 module
+
+
+After building a VDSL3 module, it can be imported just like any other Nextflow module.
+
+**Example:**
+
+```groovy
+include { mymodule } from 'target/nextflow/mymodule/main.nf'
+```
+
+## VDSL3 module interface
+
+VDSL3 modules are actually workflows which take one channel and emit one channel. They expect the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`.
+
+**Example:**
+
+```groovy
+workflow {
+  Channel.fromList([
+    ["myid", [input: file("in.txt")]]
+  ])
+    | mymodule
+}
+```
+
+:::{.callout-note}
+If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple.
+That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module.
+For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`.
+:::
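+
+The passthrough behaviour described in the note can be sketched as follows. This assumes a module `mymodule` (a placeholder) with a single `input` argument. The `view` step only prints each output tuple so that the untouched third element is visible.
+
+```groovy
+include { mymodule } from 'target/nextflow/mymodule/main.nf'
+
+workflow {
+  Channel.fromList([
+    // third element "bar" is a passthrough value
+    ["foo", [input: file("in.txt")], "bar"]
+  ])
+    | mymodule
+    // tup[0] is the id, tup[1] the new state, the rest is passed through unchanged
+    | view { tup -> "id: ${tup[0]}, state: ${tup[1]}, passthrough: ${tup.drop(2)}" }
+}
+```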
+
+## Customizing VDSL3 modules on the fly
+
+Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.
+
+Every VDSL3 module has a `.run()` function, which allows dynamically altering the behaviour of the module from within the pipeline. For example, it can be used to override module arguments or to set directives such as `cpus`, `memory` or `publishDir` for a single step in the pipeline.
+
+**Example:**
+
+```groovy
+workflow {
+  Channel.fromList([
+    ["myid", [input: file("in.txt")]]
+  ])
+    | mymodule.run(
+      args: [k: 10],
+      directives: [cpus: 4, memory: "16 GB"]
+    )
+}
+```
+
+### Arguments of `.run()`
+
+- `key` (`String`): A unique key used to trace the process and help make names of output files unique. Default: the name of the Viash component.
+
+- `args` (`Map[String, Object]`): Argument overrides to be passed to the module.
+
+- `directives` (`Map[String, Object]`): Custom directives overrides. See the Nextflow documentation for a list of available directives.
+
+- `auto` (`Map[String, Boolean]`): Whether to apply certain automated processing steps. Default values are inherited from the [Viash config](/reference/config/platforms/nextflow/auto.qmd).
+
+- `auto.simplifyInput`: If `true`, and the input is a single file and the module only has a single input file, the input file will be passed to the module accordingly. Default: `true` (inherited from Viash config).
+
+- `auto.simplifyOutput`: If `true`, if the output tuple is a single file and if the module only has a single output file, the output map will be transformed into a single file. Default: `true` (inherited from Viash config).
+
+- `auto.publish`: If `true`, the output files will be published to the `params.publishDir` folder. Default: `false` (inherited from Viash config).
+
+- `auto.transcript`: If `true`, the module's transcript will be published to the `params.transcriptDir` folder. Default: `false` (inherited from Viash config).
+
+- `map` (`Function`): Apply a map over the incoming tuple. Example: `{ tup -> [ tup[0], [input: tup[1].output] ] + tup.drop(2) }`. Default: `null`.
+
+- `mapId` (`Function`): Apply a map over the ID element of a tuple (i.e. the first element). Example: `{ id -> id + "_foo" }`. Default: `null`.
+
+- `mapData` (`Function`): Apply a map over the data element of a tuple (i.e. the second element). Example: `{ data -> [ input: data.output ] }`. Default: `null`.
+
+- `mapPassthrough` (`Function`): Apply a map over the passthrough elements of a tuple (i.e. the tuple excl. the first two elements). Example: `{ pt -> pt.drop(1) }`. Default: `null`.
+
+- `filter` (`Function`): Filter the channel. Example: `{ tup -> tup[0] == "foo" }`. Default: `null`.
+
+- `fromState`: Fetch data from the state and pass it to the module without altering the current state. `fromState` should be `null`, `List[String]`, `Map[String, String]` or a function. 
+
+ - If it is `null`, the state will be passed to the module as is.
+ - If it is a `List[String]`, the data will be the values of the state at the given keys.
+ - If it is a `Map[String, String]`, the data will be the values of the state at the given keys, with the keys renamed according to the map.
+ - If it is a function, the tuple (`[id, state]`) in the channel will be passed to the function, and the result will be used as the data.
+
+ Example: `{ id, state -> [input: state.fastq_file] }`
+ Default: `null`
+
+- `toState`: Determine how the state should be updated after the module has been run. `toState` should be `null`, `List[String]`, `Map[String, String]` or a function.
+
+ - If it is `null`, the state will be replaced with the output of the module.
+ - If it is a `List[String]`, the state will be updated with the values of the data at the given keys.
+ - If it is a `Map[String, String]`, the state will be updated with the values of the data at the given keys, with the keys renamed according to the map.
+ - If it is a function, a tuple (`[id, output, state]`) will be passed to the function, and the result will be used as the new state.
+
+ Example: `{ id, output, state -> state + [counts: output.output] }`
+ Default: `{ id, output, state -> output }`
+
+- `debug`: Whether or not to print debug messages. Default: `false`.
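+
+Several of the arguments above can be combined in a single `.run()` call. The following is a sketch rather than a tested pipeline: it assumes a module `mymodule` (a placeholder) with arguments `input` and `k` and an output named `output`, and the key `mymodule_sample1` and state key `result` are arbitrary names chosen for illustration.
+
+```groovy
+include { mymodule } from 'target/nextflow/mymodule/main.nf'
+
+workflow {
+  Channel.fromList([
+    ["sample1", [input: file("sample1.txt"), k: 10]],
+    ["sample2", [input: file("sample2.txt"), k: 4]]
+  ])
+    | mymodule.run(
+      // custom key used for tracing and for naming output files
+      key: "mymodule_sample1",
+      // only process the tuple whose id is "sample1"
+      filter: { tup -> tup[0] == "sample1" },
+      // list form: pass the state's `input` and `k` values to the module
+      fromState: ["input", "k"],
+      // map form: store the module's `output` in the state under `result`
+      toState: [result: "output"],
+      // print debug messages while running
+      debug: true
+    )
+}
+```
+
+Following the descriptions of `fromState` and `toState` above, the list form leaves the state itself unaltered when fetching data, and the map form updates the state, so `input` and `k` should still be available to later steps alongside `result`.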
diff --git a/reference/nextflow_vdsl3/index.qmd b/reference/nextflow_vdsl3/index.qmd
@@ -0,0 +1,10 @@
+---
+title: Nextflow VDSL3
+order: 35
+---
+
+Viash supports creating Nextflow workflows in multiple ways.
+
+* [Run a module as a standalone pipeline](run_module.qmd)
+* [Import a VDSL3 module](import_module.qmd)
+* Create a Nextflow workflow with dependencies