From 25c1ff3b2d5f59bafebb7b3db2898403a37d879b Mon Sep 17 00:00:00 2001
From: Robrecht Cannoodt <rcannood@gmail.com>
Date: Thu, 10 Aug 2023 21:37:14 +0200
Subject: [PATCH] refactor guide, add vdsl3 reference docs

---
 guide/nextflow_vdsl3/create-a-pipeline.qmd    |  55 +++++---
 ...module.qmd => create-and-use-a-module.qmd} | 117 +++++++++++-------
 guide/nextflow_vdsl3/index.qmd                |  22 +++-
 guide/nextflow_vdsl3/introduction.qmd         |  11 --
 reference/nextflow_vdsl3/import_module.qmd    | 107 ++++++++++++++++
 reference/nextflow_vdsl3/index.qmd            |  10 ++
 reference/nextflow_vdsl3/run_module.qmd       |  63 ++++++++++
 7 files changed, 310 insertions(+), 75 deletions(-)
 rename guide/nextflow_vdsl3/{create-a-module.qmd => create-and-use-a-module.qmd} (65%)
 delete mode 100644 guide/nextflow_vdsl3/introduction.qmd
 create mode 100644 reference/nextflow_vdsl3/import_module.qmd
 create mode 100644 reference/nextflow_vdsl3/index.qmd
 create mode 100644 reference/nextflow_vdsl3/run_module.qmd

diff --git a/guide/nextflow_vdsl3/create-a-pipeline.qmd b/guide/nextflow_vdsl3/create-a-pipeline.qmd
index a03f9bd0..e8c2ec51 100644
--- a/guide/nextflow_vdsl3/create-a-pipeline.qmd
+++ b/guide/nextflow_vdsl3/create-a-pipeline.qmd
@@ -1,6 +1,6 @@
 ---
 title: Create a pipeline
-order: 30
+order: 40
 ---
 
 {{< include ../../_includes/_clone_template.qmd >}}
@@ -52,6 +52,40 @@ Once everything is built, a new **target** directory has been created containing
 tree target
 ```
 
+
+## Importing a VDSL3 module
+
+Once built, a VDSL3 module can be imported just like any other Nextflow module.
+
+**Example:**
+
+```groovy
+include { mymodule } from 'target/nextflow/mymodule/main.nf'
+```
+
+
+## VDSL3 module interface
+
+VDSL3 modules are actually workflows which take one channel and emit one channel. Each module expects the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`.
+
+**Example:**
+
+```groovy
+workflow {
+  Channel.fromList([
+    ["myid", [input: file("in.txt")]]
+  ])
+    | mymodule
+}
+```
+
+:::{.callout-note}
+If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple.
+That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module.
+For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`.
+:::
+
+
 ## Create a pipeline
 
 Below is a first Nextflow pipeline which uses just one VDSL3 module and with hard-coded input parameters (file1 and file2).
@@ -92,26 +126,17 @@ HERE
 main.nf
 ```
 
-## VDSL3 module interface
-
-It's important to note what the interface of every VDSL3 module is. A VDSL3 module expects an input to be a tuple with the following elements:
+<!-- TODO: refactor using the new fromState/toState args -->
 
-* `id` (`String`): A unique identifier used for tracking data objects and for ensuring output filenames are unique.
-* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) used to pass the module's input arguments. If the module only has a 
-  single input file, the file itself can simply be passed.
-* `...` (`Any*`): Any other elements in the tuple simply pass through the module without being altered in any way. For this reason, it is often referred to as the "passthrough" objects.
+## Customizing VDSL3 modules on the fly
 
-In turn, a VDSL3 module will return a tuple with the same interface, except that the input data object has been replaced with the output data:
+Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.
 
-* `id` (`String`): The identifier from the input tuple.
-* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) containing the module's output files. **Important**: If the module only has a single output file, the file itself will be returned.
-* `...` (`Any*`): The passthrough objects from the input tuple (if any).
+The `run()` function is a unique feature of every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. For example, we use it to set the `publishDir` directive to `"output/"` so the output of that step in the pipeline will be stored as output.
 
-## What is `.run()`?
+See the [reference documentation](/reference/nextflow_vdsl3/import_module.qmd#customizing-vdsl3-modules-on-the-fly) for a complete list of arguments of `.run()`.
 
-Usually, Nextflow processes are quite static objects. For example, changing its directives can be quite tricky.
 
-The `run()` function is a unique feature for every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. In this case, we use it to set the `publishDir` directive to `"output/"` so the output of that step in the pipeline will be stored as output.
 
 ## Run the pipeline
 
diff --git a/guide/nextflow_vdsl3/create-a-module.qmd b/guide/nextflow_vdsl3/create-and-use-a-module.qmd
similarity index 65%
rename from guide/nextflow_vdsl3/create-a-module.qmd
rename to guide/nextflow_vdsl3/create-and-use-a-module.qmd
index 99b11bb8..f8bf35ec 100644
--- a/guide/nextflow_vdsl3/create-a-module.qmd
+++ b/guide/nextflow_vdsl3/create-and-use-a-module.qmd
@@ -1,6 +1,6 @@
 ---
-title: Create a module
-order: 20
+title: Create and use a module
+order: 10
 ---
 
 
@@ -52,45 +52,41 @@ pwalk(langs, function(id, label, example_config, ...) {
 ```
 :::
 
-## Build the VDSL3 module
+## Generating a VDSL3 module
 
 We will now turn the Viash component into a VDSL3 module. By default, the `viash build` command will select the first platform in the list of platforms. To select the `nextflow` platform, use the `--platform nextflow` argument, or `-p nextflow` for short.
 
-::: {.panel-tabset}
 ```{r viash-build-nxf}
 #| echo: false
 #| output: asis
-langs <- langs %>% filter(id == "bash")
-pwalk(langs, function(id, label, config_path, script_path, ...) {
-  qrt(
-    "## {% label %}
-    |
-    |```{bash build-example}
-    |viash build config.vsh.yaml -o target -p nextflow
-    |```
-    |
-    |This will generate a Nextflow module in the `target/` directory:
-    |
-    |```{bash view-tree}
-    |tree target
-    |```
-    |", 
-    .dir = paste0(temp_dir, "/", id)
-  )
-})
+id <- "bash"
+qrt(
+  "```{bash build-example}
+  |viash build config.vsh.yaml -o target -p nextflow
+  |```
+  |
+  |This will generate a Nextflow module in the `target/` directory:
+  |
+  |```{bash view-tree}
+  |tree target
+  |```
+  |", 
+  .dir = paste0(temp_dir, "/", id)
+)
 ```
-:::
 
-This `main.nf` file is both a **standalone Nextflow pipeline** and a module which can be used as **part of another pipeline**.
+This `main.nf` file is both a [**standalone Nextflow pipeline**](run-a-module.qmd) and a module which can be imported as [**part of another pipeline**](import-a-module.qmd).
 
 :::{.callout-tip}
-You can also use the `viash ns build` command to build all of the platforms in one go. Give it a try! More information in the following section.
+In larger projects it's recommended to use the [`viash ns build`](/reference/cli/ns_build.qmd) command to [build all of the components](/guide/project/batch-processing.qmd) in one go. Give it a try!
 :::
 
-## Module as a standalone pipeline
+## Running a module as a standalone pipeline
 
-When VDSL3 modules are used as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter,
-as Nextflow will automatically choose the parameter names of the output files.
+
+Unlike typical Nextflow modules, VDSL3 modules can actually be used as a standalone pipeline.
+
+To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter, as Nextflow will automatically choose the parameter names of the output files.
 
 ```{r nextflow-run, echo=FALSE, output="asis"}
 id <- "bash"
@@ -177,33 +173,58 @@ Instead of a YAML, you can also pass a JSON or a CSV to the `--param_list`
 parameter.
 :::
 
+
 ## Module as part of a pipeline
 
 This module can also be used as part of a Nextflow pipeline.
 Below is a short preview of what this looks like.
 
 ```groovy
-import { example_bash } from "target/main.nf"
-
-Channel.fromList([
-  ["sample1", file("sample1.txt")],
-  ["sample2", file("sample2.txt")],
-  ["sample3", file("sample3.txt")]
-])
-  | view { it -> "input: $it" }
-  | example_bash
-  | view { it -> "output: $it" }
+include { mymodule1 } from 'target/nextflow/mymodule1/main.nf'
+include { mymodule2 } from 'target/nextflow/mymodule2/main.nf'
+
+workflow {
+  Channel.fromList([
+    [
+      // a unique identifier for this tuple
+      "myid", 
+      // the state for this tuple
+      [
+        input: file("in.txt"),
+        module1_k: 10,
+        module2_k: 4
+      ]
+    ]
+  ])
+    | mymodule1.run(
+      // use a hashmap to define which part of the state is used to run mymodule1
+      fromState: [
+        input: "input",
+        k: "module1_k"
+      ],
+      // use a hashmap to define how the output of mymodule1 is stored back into the state
+      toState: [
+        module1_output: "output"
+      ]
+    )
+    | mymodule2.run(
+      // use a closure to define which data is used to run mymodule2
+      fromState: { id, state -> 
+        [
+          input: state.module1_output,
+          k: state.module2_k
+        ]
+      },
+      // use a closure to return only the output of module2 as a new state
+      toState: { id, output, state ->
+        output
+      },
+      auto: [
+        publish: true
+      ]
+    )
+}
 ```
 
 We will discuss building pipelines with VDSL3 modules in more detail in [Create a pipeline](create-a-pipeline.qmd).
 
-## Improvements over standard Nextflow modules
-
-* No need to write any Nextflow Groovy code, just your script and the Viash config.
-* VDSL3 module are also standalone pipelines.
-* Help documentation is automatically generated.
-* Standardized interface for passing parameter lists.
-* Automatically uses the Docker platform's container.
-
-
-{{< include ../../_includes/_prune_all_images.qmd >}}
\ No newline at end of file
diff --git a/guide/nextflow_vdsl3/index.qmd b/guide/nextflow_vdsl3/index.qmd
index 3ccaf1d7..24e19bc2 100644
--- a/guide/nextflow_vdsl3/index.qmd
+++ b/guide/nextflow_vdsl3/index.qmd
@@ -2,4 +2,24 @@
 title: Nextflow VDSL3
 order: 30
 hidden: true
----
\ No newline at end of file
+---
+
+Nextflow is a highly popular and widely-used workflow manager in computational biology, featuring outstanding portability, reproducibility and scalability. However, while Nextflow's advantages are impressive, developing a Nextflow pipeline can be challenging, requiring significant domain knowledge and verbose code that is labour-intensive. Fortunately, Viash provides a solution to the barriers of Nextflow pipeline development.
+
+Viash can help developers wrap their code into a state-of-the-art Nextflow script called a VDSL3 module. As we will demonstrate in the remainder of this guide, VDSL3 is effectively a separate DSL layer on top of Nextflow enabled by Viash, hence it is called Viash + Nextflow DSL 3, or VDSL3 for short. VDSL3's benefits extend beyond Nextflow pipeline development, including reusability, test-driven development, separation of concerns, and continuous testing.
+
+You can use Viash to speed up or replace your pipeline development processes in the following steps:
+
+* [Use Viash to generate VDSL3 modules](create-and-use-a-module.qmd#generating-a-vdsl3-module)
+* [Run a module as a standalone pipeline](create-and-use-a-module.qmd#running-a-module-as-a-standalone-pipeline)
+* [Import a VDSL3 module](create-and-use-a-module.qmd#module-as-part-of-a-pipeline)
+* [Create a Nextflow workflow](create-a-pipeline.qmd) using one or more modules
+
+
+## Improvements of VDSL3 modules over standard Nextflow modules
+
+* No need to write any Nextflow Groovy code, just your script and the Viash config.
+* VDSL3 modules are also standalone pipelines.
+* Help documentation is automatically generated.
+* Standardized interface for passing parameter lists.
+* Automatically uses the Docker platform's container.
diff --git a/guide/nextflow_vdsl3/introduction.qmd b/guide/nextflow_vdsl3/introduction.qmd
deleted file mode 100644
index 547b14f2..00000000
--- a/guide/nextflow_vdsl3/introduction.qmd
+++ /dev/null
@@ -1,11 +0,0 @@
----
-title: Introduction
-description: What is VDSL3?
-order: 10
----
-
-Nextflow is a highly popular and widely-used workflow manager in computational biology, featuring outstanding portability, reproducibility and scalability. However, while Nextflow's advantages are impressive, developing a Nextflow pipeline can be challenging, requiring significant domain knowledge and verbose code that is labour-intensive. Fortunately, Viash provides a solution to the barriers of Nextflow pipeline development.
-
-Viash can help developers wrap their code into a state-of-the-art Nextflow script called a VDSL3 module. As we will demonstrate in the remainder of this guide, VDSL3 is effectively a separate DSL layer on top of Nextflow enabled by Viash, hence it is called Viash + Nextflow DSL 3, or VDSL3 for short. VDSL3's benefits extend beyond Nextflow pipeline development, including reusability, test-driven development, separation of concerns, and continuous testing.
-
-In the following sections, we'll show how to use build Nextflow modules from Viash components and how to put them together in a pipeline.
\ No newline at end of file
diff --git a/reference/nextflow_vdsl3/import_module.qmd b/reference/nextflow_vdsl3/import_module.qmd
new file mode 100644
index 00000000..1f977b55
--- /dev/null
+++ b/reference/nextflow_vdsl3/import_module.qmd
@@ -0,0 +1,107 @@
+---
+title: Import a VDSL3 module
+---
+
+A VDSL3 module is a Nextflow module generated by Viash. See the [guide](/guide/nextflow_vdsl3/index.qmd) for a more in-depth explanation of how to create Nextflow workflows with VDSL3 modules.
+
+## Importing a VDSL3 module
+
+
+Once built, a VDSL3 module can be imported just like any other Nextflow module.
+
+**Example:**
+
+```groovy
+include { mymodule } from 'target/nextflow/mymodule/main.nf'
+```
+
+## VDSL3 module interface
+
+VDSL3 modules are actually workflows which take one channel and emit one channel. Each module expects the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`.
+
+**Example:**
+
+```groovy
+workflow {
+  Channel.fromList([
+    ["myid", [input: file("in.txt")]]
+  ])
+    | mymodule
+}
+```
+
+:::{.callout-note}
+If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple.
+That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module.
+For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`.
+:::
+
+## Customizing VDSL3 modules on the fly
+
+Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.
+
+The `run()` function is a unique feature of every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. For example, it can be used to override arguments or to set directives such as `cpus` and `memory` for that step in the pipeline.
+
+**Example:**
+
+```groovy
+workflow {
+  Channel.fromList([
+    ["myid", [input: file("in.txt")]]
+  ])
+    | mymodule.run(
+      args: [k: 10],
+      directives: [cpus: 4, memory: "16 GB"]
+    )
+}
+```
+
+### Arguments of `.run()`
+
+- `key` (`String`): A unique key used to trace the process and help make names of output files unique. Default: the name of the Viash component.
+
+- `args` (`Map[String, Object]`): Argument overrides to be passed to the module.
+
+- `directives` (`Map[String, Object]`): Custom directive overrides. See the Nextflow documentation for a list of available directives.
+
+- `auto` (`Map[String, Boolean]`): Whether to apply certain automated processing steps. Default values are inherited from the [Viash config](/reference/config/platforms/nextflow/auto.qmd).
+
+- `auto.simplifyInput`: If `true`, and the input tuple is a single file and the module only has a single input file, the input file will be passed to the module accordingly. Default: `true` (inherited from Viash config).
+
+- `auto.simplifyOutput`: If `true`, and the output tuple is a single file and the module only has a single output file, the output map will be transformed into a single file. Default: `true` (inherited from Viash config).
+
+- `auto.publish`: If `true`, the output files will be published to the `params.publishDir` folder. Default: `false` (inherited from Viash config).
+
+- `auto.transcript`: If `true`, the module's transcript will be published to the `params.transcriptDir` folder. Default: `false` (inherited from Viash config).
+
+- `map` (`Function`): Apply a map over the incoming tuple. Example: `{ tup -> [ tup[0], [input: tup[1].output] ] + tup.drop(2) }`. Default: `null`.
+
+- `mapId` (`Function`): Apply a map over the ID element of a tuple (i.e. the first element). Example: `{ id -> id + "_foo" }`. Default: `null`.
+
+- `mapData` (`Function`): Apply a map over the data element of a tuple (i.e. the second element). Example: `{ data -> [ input: data.output ] }`. Default: `null`.
+
+- `mapPassthrough` (`Function`): Apply a map over the passthrough elements of a tuple (i.e. the tuple excl. the first two elements). Example: `{ pt -> pt.drop(1) }`. Default: `null`.
+
+- `filter` (`Function`): Filter the channel. Example: `{ tup -> tup[0] == "foo" }`. Default: `null`.
+
+- `fromState`: Fetch data from the state and pass it to the module without altering the current state. `fromState` should be `null`, `List[String]`, `Map[String, String]` or a function. 
+  
+    - If it is `null`, the state will be passed to the module as is.
+    - If it is a `List[String]`, the data will be the values of the state at the given keys.
+    - If it is a `Map[String, String]`, the data will be the values of the state at the given keys, with the keys renamed according to the map.
+    - If it is a function, the tuple (`[id, state]`) in the channel will be passed to the function, and the result will be used as the data.
+  
+  Example: `{ id, state -> [input: state.fastq_file] }`
+  Default: `null`
+
+- `toState`: Determine how the state should be updated after the module has been run. `toState` should be `null`, `List[String]`, `Map[String, String]` or a function.
+
+    - If it is `null`, the state will be replaced with the output of the module.
+    - If it is a `List[String]`, the state will be updated with the values of the data at the given keys.
+    - If it is a `Map[String, String]`, the state will be updated with the values of the data at the given keys, with the keys renamed according to the map.
+    - If it is a function, a tuple (`[id, output, state]`) will be passed to the function, and the result will be used as the new state.
+  
+  Example: `{ id, output, state -> state + [counts: output.output] }`
+  Default: `{ id, output, state -> output }`
+
+- `debug`: Whether or not to print debug messages. Default: `false`.
diff --git a/reference/nextflow_vdsl3/index.qmd b/reference/nextflow_vdsl3/index.qmd
new file mode 100644
index 00000000..c33c9a06
--- /dev/null
+++ b/reference/nextflow_vdsl3/index.qmd
@@ -0,0 +1,10 @@
+---
+title: Nextflow VDSL3
+order: 35
+---
+
+Viash supports creating Nextflow workflows in multiple ways.
+
+* [Run a module as a standalone pipeline](run_module.qmd)
+* [Import a VDSL3 module](import_module.qmd)
+* Create a Nextflow workflow with dependencies
\ No newline at end of file
diff --git a/reference/nextflow_vdsl3/run_module.qmd b/reference/nextflow_vdsl3/run_module.qmd
new file mode 100644
index 00000000..648dd2b9
--- /dev/null
+++ b/reference/nextflow_vdsl3/run_module.qmd
@@ -0,0 +1,63 @@
+---
+title: Run a VDSL3 module
+---
+
+
+Unlike typical Nextflow modules, VDSL3 modules can actually be used as a standalone pipeline.
+
+To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter, as Nextflow will automatically choose the parameter names of the output files.
+
+## Viewing the help message
+
+More information regarding a module's arguments can be shown by passing the `--help`
+parameter.
+
+**Example:**
+
+```bash
+nextflow run target/nextflow/mycomponent/main.nf --help
+```
+
+
+## Running a module as a standalone pipeline
+You can run the executable by providing a value for each of the required arguments and `--publish_dir` (where output files are published).
+
+**Example:**
+
+```bash
+nextflow run target/nextflow/mycomponent/main.nf \
+  --input config.vsh.yaml \
+  --publish_dir output/
+```
+
+
+## Passing a parameter list
+
+Every VDSL3 module can accept a list of parameters to populate a Nextflow channel with. Assuming we want to process a set of input files in parallel, we can create a YAML file `params.yaml` containing the following information.
+
+
+```yaml
+param_list:
+  - id: sample1
+    input: data/sample1.txt
+  - id: sample2
+    input: data/sample2.txt
+  - id: sample3
+    input: data/sample3.txt
+  - id: sample4
+    input: data/sample4.txt
+arg1: 10
+arg2: 5
+```
+
+You can run the pipeline on the list of parameters using the `-params-file`
+parameter.
+ 
+```{bash}
+nextflow run target/nextflow/mycomponent/main.nf -params-file params.yaml --publish_dir output2
+```
+
+
+:::{.callout-tip}
+You can also pass a YAML, CSV or JSON file to the `param_list` parameter.
+:::
\ No newline at end of file