From 25c1ff3b2d5f59bafebb7b3db2898403a37d879b Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Thu, 10 Aug 2023 21:37:14 +0200 Subject: [PATCH] refactor guide, add vdsl3 reference docs --- guide/nextflow_vdsl3/create-a-pipeline.qmd | 55 +++++--- ...module.qmd => create-and-use-a-module.qmd} | 117 +++++++++++------- guide/nextflow_vdsl3/index.qmd | 22 +++- guide/nextflow_vdsl3/introduction.qmd | 11 -- reference/nextflow_vdsl3/import_module.qmd | 107 ++++++++++++++++ reference/nextflow_vdsl3/index.qmd | 10 ++ reference/nextflow_vdsl3/run_module.qmd | 63 ++++++++++ 7 files changed, 310 insertions(+), 75 deletions(-) rename guide/nextflow_vdsl3/{create-a-module.qmd => create-and-use-a-module.qmd} (65%) delete mode 100644 guide/nextflow_vdsl3/introduction.qmd create mode 100644 reference/nextflow_vdsl3/import_module.qmd create mode 100644 reference/nextflow_vdsl3/index.qmd create mode 100644 reference/nextflow_vdsl3/run_module.qmd diff --git a/guide/nextflow_vdsl3/create-a-pipeline.qmd b/guide/nextflow_vdsl3/create-a-pipeline.qmd index a03f9bd0..e8c2ec51 100644 --- a/guide/nextflow_vdsl3/create-a-pipeline.qmd +++ b/guide/nextflow_vdsl3/create-a-pipeline.qmd @@ -1,6 +1,6 @@ --- title: Create a pipeline -order: 30 +order: 40 --- {{< include ../../_includes/_clone_template.qmd >}} @@ -52,6 +52,40 @@ Once everything is built, a new **target** directory has been created containing tree target ``` + +## Importing a VDSL3 module + +After building a VDSL3 module from a Viash component, it can be imported just like any other Nextflow module. + +**Example:** + +```groovy +include { mymodule } from 'target/nextflow/mymodule/main.nf' +``` + + +## VDSL3 module interface + +VDSL3 modules are actually workflows which take one channel and emit one channel. They expect the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`. 
+ +**Example:** + +```groovy +workflow { + Channel.fromList([ + ["myid", [input: file("in.txt")]] + ]) + | mymodule +} +``` + +:::{.callout-note} +If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple. +That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module. +For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`. +::: + + ## Create a pipeline Below is a first Nextflow pipeline which uses just one VDSL3 module and with hard-coded input parameters (file1 and file2). @@ -92,26 +126,17 @@ HERE main.nf ``` -## VDSL3 module interface - -It's important to note what the interface of every VDSL3 module is. A VDSL3 module expects an input to be a tuple with the following elements: + -* `id` (`String`): A unique identifier used for tracking data objects and for ensuring output filenames are unique. -* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) used to pass the module's input arguments. If the module only has a - single input file, the file itself can simply be passed. -* `...` (`Any*`): Any other elements in the tuple simply pass through the module without being altered in any way. For this reason, it is often referred to as the "passthrough" objects. +## Customizing VDSL3 modules on the fly -In turn, a VDSL3 module will return a tuple with the same interface, except that the input data object has been replaced with the output data: +Usually, Nextflow processes are quite static objects. For example, changing its directives can be quite tricky. -* `id` (`String`): The identifier from the input tuple. -* `data` (`Map[String, Any]` or `File`): A named map (or dictionary) containing the module's output files. **Important**: If the module only has a single output file, the file itself will be returned. 
-* `...` (`Any*`): The passthrough objects from the input tuple (if any). +The `.run()` function is a unique feature for every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. For example, we use it to set the `publishDir` directive to `"output/"` so the output of that step in the pipeline will be stored as output. -## What is `.run()`? +See the [reference documentation](/reference/nextflow_vdsl3/import_module.qmd#customizing-vdsl3-modules-on-the-fly) for a complete list of arguments of `.run()`. -Usually, Nextflow processes are quite static objects. For example, changing its directives can be quite tricky. -The `run()` function is a unique feature for every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. In this case, we use it to set the `publishDir` directive to `"output/"` so the output of that step in the pipeline will be stored as output. ## Run the pipeline diff --git a/guide/nextflow_vdsl3/create-a-module.qmd b/guide/nextflow_vdsl3/create-and-use-a-module.qmd similarity index 65% rename from guide/nextflow_vdsl3/create-a-module.qmd rename to guide/nextflow_vdsl3/create-and-use-a-module.qmd index 99b11bb8..f8bf35ec 100644 --- a/guide/nextflow_vdsl3/create-a-module.qmd +++ b/guide/nextflow_vdsl3/create-and-use-a-module.qmd @@ -1,6 +1,6 @@ --- -title: Create a module -order: 20 +title: Create and use a module +order: 10 --- @@ -52,45 +52,41 @@ pwalk(langs, function(id, label, example_config, ...) { ``` ::: -## Build the VDSL3 module +## Generating a VDSL3 module We will now turn the Viash component into a VDSL3 module. By default, the `viash build` command will select the first platform in the list of platforms. To select the `nextflow` platform, use the `--platform nextflow` argument, or `-p nextflow` for short. 
-::: {.panel-tabset} ```{r viash-build-nxf} #| echo: false #| output: asis -langs <- langs %>% filter(id == "bash") -pwalk(langs, function(id, label, config_path, script_path, ...) { - qrt( - "## {% label %} - | - |```{bash build-example} - |viash build config.vsh.yaml -o target -p nextflow - |``` - | - |This will generate a Nextflow module in the `target/` directory: - | - |```{bash view-tree} - |tree target - |``` - |", - .dir = paste0(temp_dir, "/", id) - ) -}) +id <- "bash" +qrt( + "```{bash build-example} + |viash build config.vsh.yaml -o target -p nextflow + |``` + | + |This will generate a Nextflow module in the `target/` directory: + | + |```{bash view-tree} + |tree target + |``` + |", + .dir = paste0(temp_dir, "/", id) +) ``` -::: -This `main.nf` file is both a **standalone Nextflow pipeline** and a module which can be used as **part of another pipeline**. +This `main.nf` file is both a [**standalone Nextflow pipeline**](run-a-module.qmd) and a module which can be imported as [**part of another pipeline**](import-a-module.qmd). :::{.callout-tip} -You can also use the `viash ns build` command to build all of the platforms in one go. Give it a try! More information in the following section. +In larger projects, it's recommended to use the [`viash ns build`](/reference/cli/ns_build.qmd) command to [build all of the components](/guide/project/batch-processing.qmd) in one go. Give it a try! ::: -## Module as a standalone pipeline +## Running a module as a standalone pipeline -When VDSL3 modules are used as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter, -as Nextflow will automatically choose the parameter names of the output files. + +Unlike typical Nextflow modules, VDSL3 modules can actually be used as a standalone pipeline. 
+ +To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter, as Nextflow will automatically choose the parameter names of the output files. ```{r nextflow-run, echo=FALSE, output="asis"} id <- "bash" @@ -177,33 +173,58 @@ Instead of a YAML, you can also pass a JSON or a CSV to the `--param_list` parameter. ::: + ## Module as part of a pipeline This module can also be used as part of a Nextflow pipeline. Below is a short preview of what this looks like. ```groovy -import { example_bash } from "target/main.nf" - -Channel.fromList([ - ["sample1", file("sample1.txt")], - ["sample2", file("sample2.txt")], - ["sample3", file("sample3.txt")] -]) - | view { it -> "input: $it" } - | example_bash - | view { it -> "output: $it" } +include { mymodule1 } from 'target/nextflow/mymodule1/main.nf' +include { mymodule2 } from 'target/nextflow/mymodule2/main.nf' + +workflow { + Channel.fromList([ + [ + // a unique identifier for this tuple + "myid", + // the state for this tuple + [ + input: file("in.txt"), + module1_k: 10, + module2_k: 4 + ] + ] + ]) + | mymodule1.run( + // use a hashmap to define which part of the state is used to run mymodule1 + fromState: [ + input: "input", + k: "module1_k" + ], + // use a hashmap to define how the output of mymodule1 is stored back into the state + toState: [ + module1_output: "output" + ] + ) + | mymodule2.run( + // use a closure to define which data is used to run mymodule2 + fromState: { id, state -> + [ + input: state.module1_output, + k: state.module2_k + ] + }, + // use a closure to return only the output of module2 as a new state + toState: { id, output, state -> + output + }, + auto: [ + publish: true + ] + ) +} ``` We will discuss building pipelines with VDSL3 modules in more detail in [Create a pipeline](create-a-pipeline.qmd). -## Improvements over standard Nextflow modules - -* No need to write any Nextflow Groovy code, just your script and the Viash config. 
-* VDSL3 module are also standalone pipelines. -* Help documentation is automatically generated. -* Standardized interface for passing parameter lists. -* Automatically uses the Docker platform's container. - - -{{< include ../../_includes/_prune_all_images.qmd >}} \ No newline at end of file diff --git a/guide/nextflow_vdsl3/index.qmd b/guide/nextflow_vdsl3/index.qmd index 3ccaf1d7..24e19bc2 100644 --- a/guide/nextflow_vdsl3/index.qmd +++ b/guide/nextflow_vdsl3/index.qmd @@ -2,4 +2,24 @@ title: Nextflow VDSL3 order: 30 hidden: true ---- \ No newline at end of file +--- + +Nextflow is a highly popular and widely-used workflow manager in computational biology, featuring outstanding portability, reproducibility and scalability. However, while Nextflow's advantages are impressive, developing a Nextflow pipeline can be challenging, requiring significant domain knowledge and verbose code that is labour-intensive. Fortunately, Viash provides a solution to the barriers of Nextflow pipeline development. + +Viash can help developers wrap their code into a state-of-the-art Nextflow script called a VDSL3 module. As we will demonstrate in the remainder of this guide, VDSL3 is effectively a separate DSL layer on top of Nextflow enabled by Viash, hence it is called Viash + Nextflow DSL 3, or VDSL3 for short. VDSL3's benefits extend beyond Nextflow pipeline development, including reusability, test-driven development, separation of concerns, and continuous testing. 
+ +You can use Viash to speed up or replace your pipeline development processes in the following steps: + +* [Use Viash to generate VDSL3 modules](create-and-use-a-module.qmd#generating-a-vdsl3-module) +* [Run a module as a standalone pipeline](create-and-use-a-module.qmd#running-a-module-as-a-standalone-pipeline) +* [Import a VDSL3 module](create-and-use-a-module.qmd#module-as-part-of-a-pipeline) +* [Create a Nextflow workflow](create-a-pipeline.qmd) using one or more modules + + +## Improvements of VDSL3 modules over standard Nextflow modules + +* No need to write any Nextflow Groovy code, just your script and the Viash config. +* VDSL3 modules are also standalone pipelines. +* Help documentation is automatically generated. +* Standardized interface for passing parameter lists. +* Automatically uses the Docker platform's container. diff --git a/guide/nextflow_vdsl3/introduction.qmd b/guide/nextflow_vdsl3/introduction.qmd deleted file mode 100644 index 547b14f2..00000000 --- a/guide/nextflow_vdsl3/introduction.qmd +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: Introduction -description: What is VDSL3? -order: 10 ---- - -Nextflow is a highly popular and widely-used workflow manager in computational biology, featuring outstanding portability, reproducibility and scalability. However, while Nextflow's advantages are impressive, developing a Nextflow pipeline can be challenging, requiring significant domain knowledge and verbose code that is labour-intensive. Fortunately, Viash provides a solution to the barriers of Nextflow pipeline development. - -Viash can help developers wrap their code into a state-of-the-art Nextflow script called a VDSL3 module. As we will demonstrate in the remainder of this guide, VDSL3 is effectively a separate DSL layer on top of Nextflow enabled by Viash, hence it is called Viash + Nextflow DSL 3, or VDSL3 for short. 
VDSL3's benefits extend beyond Nextflow pipeline development, including reusability, test-driven development, separation of concerns, and continuous testing. - -In the following sections, we'll show how to use build Nextflow modules from Viash components and how to put them together in a pipeline. \ No newline at end of file diff --git a/reference/nextflow_vdsl3/import_module.qmd b/reference/nextflow_vdsl3/import_module.qmd new file mode 100644 index 00000000..1f977b55 --- /dev/null +++ b/reference/nextflow_vdsl3/import_module.qmd @@ -0,0 +1,107 @@ +--- +title: Import a VDSL3 module +--- + +A VDSL3 module is a Nextflow module generated by Viash. See the [guide](/guide/nextflow_vdsl3/index.qmd) for a more in-depth explanation on how to create Nextflow workflows with VDSL3 modules. + +## Importing a VDSL3 module + + +After building a VDSL3 module from a Viash component, it can be imported just like any other Nextflow module. + +**Example:** + +```groovy +include { mymodule } from 'target/nextflow/mymodule/main.nf' +``` + +## VDSL3 module interface + +VDSL3 modules are actually workflows which take one channel and emit one channel. They expect the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`. + +**Example:** + +```groovy +workflow { + Channel.fromList([ + ["myid", [input: file("in.txt")]] + ]) + | mymodule +} +``` + +:::{.callout-note} +If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple. +That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module. +For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`. 
+::: + +## Customizing VDSL3 modules on the fly + +Usually, Nextflow processes are quite static objects. For example, changing its directives can be quite tricky. + +The `.run()` function is a unique feature for every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. For example, it can be used to override the module's arguments or to set directives such as `cpus` and `memory`. + +**Example:** + +```groovy +workflow { + Channel.fromList([ + ["myid", [input: file("in.txt")]] + ]) + | mymodule.run( + args: [k: 10], + directives: [cpus: 4, memory: "16 GB"] + ) +} +``` + +### Arguments of `.run()` + +- `key` (`String`): A unique key used to trace the process and help make names of output files unique. Default: the name of the Viash component. + +- `args` (`Map[String, Object]`): Argument overrides to be passed to the module. + +- `directives` (`Map[String, Object]`): Custom directives overrides. See the Nextflow documentation for a list of available directives. + +- `auto` (`Map[String, Boolean]`): Whether to apply certain automated processing steps. Default values are inherited from the [Viash config](/reference/config/platforms/nextflow/auto.qmd). + +- `auto.simplifyInput`: If `true`, and if the input tuple is a single file and the module only has a single input file, the input file will be passed to the module accordingly. Default: `true` (inherited from Viash config). + +- `auto.simplifyOutput`: If `true`, and if the output tuple is a single file and the module only has a single output file, the output map will be transformed into a single file. Default: `true` (inherited from Viash config). + +- `auto.publish`: If `true`, the output files will be published to the `params.publishDir` folder. Default: `false` (inherited from Viash config). + +- `auto.transcript`: If `true`, the module's transcript will be published to the `params.transcriptDir` folder. 
Default: `false` (inherited from Viash config). + +- `map` (`Function`): Apply a map over the incoming tuple. Example: `{ tup -> [ tup[0], [input: tup[1].output] ] + tup.drop(2) }`. Default: `null`. + +- `mapId` (`Function`): Apply a map over the ID element of a tuple (i.e. the first element). Example: `{ id -> id + "_foo" }`. Default: `null`. + +- `mapData` (`Function`): Apply a map over the data element of a tuple (i.e. the second element). Example: `{ data -> [ input: data.output ] }`. Default: `null`. + +- `mapPassthrough` (`Function`): Apply a map over the passthrough elements of a tuple (i.e. the tuple excl. the first two elements). Example: `{ pt -> pt.drop(1) }`. Default: `null`. + +- `filter` (`Function`): Filter the channel. Example: `{ tup -> tup[0] == "foo" }`. Default: `null`. + +- `fromState`: Fetch data from the state and pass it to the module without altering the current state. `fromState` should be `null`, `List[String]`, `Map[String, String]` or a function. + + - If it is `null`, the state will be passed to the module as is. + - If it is a `List[String]`, the data will be the values of the state at the given keys. + - If it is a `Map[String, String]`, the data will be the values of the state at the given keys, with the keys renamed according to the map. + - If it is a function, the tuple (`[id, state]`) in the channel will be passed to the function, and the result will be used as the data. + + Example: `{ id, state -> [input: state.fastq_file] }` + Default: `null` + +- `toState`: Determine how the state should be updated after the module has been run. `toState` should be `null`, `List[String]`, `Map[String, String]` or a function. + + - If it is `null`, the state will be replaced with the output of the module. + - If it is a `List[String]`, the state will be updated with the values of the data at the given keys. 
+ - If it is a `Map[String, String]`, the state will be updated with the values of the data at the given keys, with the keys renamed according to the map. + - If it is a function, a tuple (`[id, output, state]`) will be passed to the function, and the result will be used as the new state. + + Example: `{ id, output, state -> state + [counts: state.output] }` + Default: `{ id, output, state -> output }` + +- `debug`: Whether or not to print debug messages. Default: `false`. diff --git a/reference/nextflow_vdsl3/index.qmd b/reference/nextflow_vdsl3/index.qmd new file mode 100644 index 00000000..c33c9a06 --- /dev/null +++ b/reference/nextflow_vdsl3/index.qmd @@ -0,0 +1,10 @@ +--- +title: Nextflow VDSL3 +order: 35 +--- + +Viash supports creating Nextflow workflows in multiple ways. + +* [Run a module as a standalone pipeline](run_module.qmd) +* [Import a VDSL3 module](import_module.qmd) +* Create a Nextflow workflow with dependencies \ No newline at end of file diff --git a/reference/nextflow_vdsl3/run_module.qmd b/reference/nextflow_vdsl3/run_module.qmd new file mode 100644 index 00000000..648dd2b9 --- /dev/null +++ b/reference/nextflow_vdsl3/run_module.qmd @@ -0,0 +1,63 @@ +--- +title: Run a VDSL3 module +--- + + +Unlike typical Nextflow modules, VDSL3 modules can actually be used as a standalone pipeline. + +To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a `--publish_dir` parameter, as Nextflow will automatically choose the parameter names of the output files. + +## Viewing the help message + +More information regarding a module's arguments can be shown by passing the `--help` +parameter. + +**Example:** + +```bash +nextflow run target/nextflow/mycomponent/main.nf --help +``` + + +## Running a module as a standalone pipeline +You can run the executable by providing a value for each of the required arguments and `--publish_dir` (where output files are published). 
+ +**Example:** + +```bash +nextflow run target/nextflow/mycomponent/main.nf \ + --input config.vsh.yaml \ + --publish_dir output/ +``` + + +## Passing a parameter list + +Every VDSL3 module can accept a list of parameters to populate a Nextflow channel with. Assuming we want to process a set of input files in parallel, we can create a YAML file `params.yaml` containing the following information. + + +```yaml +param_list: + - id: sample1 + input: data/sample1.txt + - id: sample2 + input: data/sample2.txt + - id: sample3 + input: data/sample3.txt + - id: sample4 + input: data/sample4.txt +arg1: 10 +arg2: 5 +``` + +You can run the pipeline on the list of parameters using the `-params-file` +parameter. + +```{bash} +nextflow run target/main.nf -params-file params.yaml --publish_dir output2 +``` + + +:::{.callout-tip} +You can also pass a YAML, CSV or JSON file to the `param_list` parameter. +::: \ No newline at end of file