Static types for process inputs/outputs #4553

Draft
wants to merge 38 commits into base: master
e974900
Refactor ast xform classes
bentsherman Nov 19, 2023
4612de3
Move process and workflow DSLs into separate classes
bentsherman Nov 20, 2023
5ad9813
Add ProcessFn annotation
bentsherman Nov 30, 2023
01ef1db
Rename ProcessDsl -> ProcessBuilder, add separate builder for process…
bentsherman Nov 30, 2023
c465000
Add WorkflowFn annotation
bentsherman Nov 30, 2023
8f2c090
Add support for native processes, use reflection to invoke workflows
bentsherman Dec 1, 2023
48fdfc2
Separate process input channel logic from task processor
bentsherman Dec 1, 2023
041e10a
Remove params from WorkflowFn
bentsherman Dec 1, 2023
a52a829
Simplify ProcessFn param names
bentsherman Dec 2, 2023
570892c
Separate `InParam`s from task config
bentsherman Dec 2, 2023
cf0e4b2
Fix process input channel logic
bentsherman Dec 2, 2023
0c490e8
Fix bugs
bentsherman Dec 6, 2023
8733ba6
Refactor process inputs and outputs
bentsherman Dec 8, 2023
d2268b2
Refactor process inputs/outputs DSL
bentsherman Dec 9, 2023
bca231b
Move ProcessBuilder#applyConfig() into subclass
bentsherman Dec 9, 2023
872a3e2
Add CombineManyOp to combine process input channels
bentsherman Dec 9, 2023
72b54f6
Save variable refs in ProcessFn
bentsherman Dec 9, 2023
dfd5aea
Fix bugs
bentsherman Dec 9, 2023
1e77a22
Fix task hash (resume still not working)
bentsherman Dec 10, 2023
c00ee3f
Update tests
bentsherman Dec 13, 2023
f7b3fa8
Move annotation API to separate branch
bentsherman Dec 13, 2023
c300f00
Minor edits
bentsherman Dec 13, 2023
ce2de32
Minor edits
bentsherman Dec 13, 2023
47a85be
Fix storeDir warning and task context caching
bentsherman Dec 17, 2023
cc2c08e
Merge upstream changes
bentsherman Dec 17, 2023
8b5fbb6
Merge branch 'master' into ben-programmatic-api
bentsherman Dec 17, 2023
48423e4
Fix failing integration tests
bentsherman Dec 18, 2023
36510f4
Fix failing integration tests, minor changes
bentsherman Dec 18, 2023
353493e
Update tests
bentsherman Dec 18, 2023
177120c
Add comments
bentsherman Dec 18, 2023
8441989
Fix stdout evaluation
bentsherman Dec 18, 2023
700ea34
Merge branch 'master' into ben-programmatic-api
bentsherman Mar 28, 2024
1f6705b
Move LazyHelper to script package, update copyright
bentsherman Mar 28, 2024
ecdaaa4
cleanup
bentsherman Mar 28, 2024
4509b28
Infer staging of file inputs from input types
bentsherman Mar 29, 2024
8efcfc0
Update docs
bentsherman Mar 29, 2024
8a3a827
Fix error with legacy syntax
bentsherman Mar 29, 2024
1cd6fce
Rename CombineManyOp -> MergeWithEachOp
bentsherman Mar 29, 2024
163 changes: 163 additions & 0 deletions docs/process.md
@@ -821,6 +821,96 @@ In general, multiple input channels should be used to process *combinations* of

See also: {ref}`channel-types`.

(process-typed-inputs)=

### Typed inputs

:::{versionadded} 24.10.0
:::

Typed inputs are an alternative way to define process inputs with standard types. This approach has a number of benefits:

- A typed input can validate the values that it receives at runtime and raise an error if there is a type mismatch.

- Whereas a `path` input relies on a custom `arity` option to distinguish between a single file and a list of files, a typed input expresses the distinction in the type itself: `Path` vs `List<Path>`.

- Typed inputs enable the use of custom record types (i.e. defined using `@ValueObject` or `record`), which makes the code much easier to read and understand compared to `tuple` inputs.

A typed input is simply a variable declaration, i.e. `<type> <name>`. Here are some examples:

```groovy
input:
    int my_int
    String my_string
    Path my_file
    List<Path> my_files
```

In the above example:

- `my_int` and `my_string` are treated like `val` inputs; they are defined in the process body as variables

- `my_file` and `my_files` are treated like `path` inputs; they are defined as variables and their files are staged into the task directory
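
The distinction between value-like and file-like inputs, and the runtime type check mentioned earlier, can be sketched in plain Groovy. This is only an illustration under stated assumptions: the helper names `isFileValue` and `checkType` are hypothetical and are not part of the Nextflow API, and the real implementation may decide these things differently.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper: true when a value would be staged like a `path` input,
// i.e. it is a Path or a non-empty collection made up only of Paths.
boolean isFileValue(Object value) {
    if( value instanceof Path )
        return true
    if( value instanceof Collection )
        return !value.isEmpty() && value.every { it instanceof Path }
    return false
}

// Hypothetical helper: fail fast when a value does not match its declared type.
def checkType(String name, Class type, Object value) {
    if( !type.isInstance(value) ) {
        def actual = value?.getClass()?.simpleName
        throw new IllegalArgumentException("Input '${name}' expected ${type.simpleName} but got ${actual}".toString())
    }
    return value
}

assert !isFileValue(42)
assert isFileValue(Paths.get('sample.fastq'))
assert isFileValue([Paths.get('a.fastq'), Paths.get('b.fastq')])
assert checkType('my_int', Integer, 42) == 42
```

Passing a `String` where an `Integer` is declared makes `checkType` throw, which mirrors the type-mismatch error described above.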

One of the most important capabilities enabled by typed inputs is the use of custom record types. Here is an example:

```groovy
@ValueObject
class Sample {
    String id
    List<Path> reads
}

process foo {
    input:
    Sample my_sample

    // ...
}
```

In this example, `Sample` is a record type with two members, `id` and `reads`. The `Sample` input in process `foo` will be provided as a variable to the process body, where its members can be accessed as `my_sample.id` and `my_sample.reads`. Additionally, because `my_sample.reads` is a collection of files (given by its type `List<Path>`), its files will be staged into the task directory like a `path` input.
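
To make the staging of record members more concrete, here is a minimal plain-Groovy sketch of how the files held by a record could be discovered. It is an assumption-laden illustration, not the logic used by this change: `collectFiles` is a hypothetical helper, it inspects runtime values rather than declared types, and `@Canonical` stands in for `@ValueObject` so that the snippet runs without Nextflow.

```groovy
import java.nio.file.Path
import java.nio.file.Paths
import groovy.transform.Canonical

// Stand-in for the `Sample` record; @Canonical replaces @ValueObject here.
@Canonical
class Sample {
    String id
    List<Path> reads
}

// Hypothetical helper: gather every Path held directly by a record's
// properties, or inside a collection-valued property.
List<Path> collectFiles(Object record) {
    def files = []
    record.properties.each { name, value ->
        if( value instanceof Path )
            files << value
        else if( value instanceof Collection )
            files.addAll(value.findAll { it instanceof Path })
    }
    return files
}

def sample = new Sample('s1', [Paths.get('s1_1.fastq'), Paths.get('s1_2.fastq')])
assert collectFiles(sample)*.toString() == ['s1_1.fastq', 's1_2.fastq']
```

The `id` member is ignored because it is a `String`, so only the two read files would be candidates for staging.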

Environment variables and standard input can be defined using the new `env` and `stdin` directives. Building from the previous example:

```groovy
process foo {
    env('SAMPLE_ID') { my_sample.id }
    env('FIRST_READ_FILE') { my_sample.reads[0]?.name }
    stdin { my_sample.reads[0] }

    input:
    Sample my_sample

    // ...
}
```

In the above example:

- The sample id will be exported to the `SAMPLE_ID` variable in the task environment
- The name of the first sample read file will be exported to the `FIRST_READ_FILE` variable in the task environment
- The contents of the first sample read file will be provided as standard input to the task
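
The closures passed to these directives refer to input variables that only exist once a task has its inputs. One way such closures can be evaluated lazily is by delegating property lookups to a map of task inputs, as in the plain-Groovy sketch below. The `resolveEnv` helper and the map-based inputs are hypothetical and only illustrate the idea; they are not taken from this change.

```groovy
// Hypothetical helper: evaluate each directive closure against one task's inputs.
Map<String, String> resolveEnv(Map<String, Closure> directives, Map inputs) {
    directives.collectEntries { name, body ->
        def c = (Closure) body.clone()
        // Resolve names such as `my_sample` from the inputs map first
        c.delegate = inputs
        c.resolveStrategy = Closure.DELEGATE_FIRST
        [(name): c.call()?.toString()]
    }
}

def directives = [
    SAMPLE_ID      : { my_sample.id },
    FIRST_READ_FILE: { my_sample.reads[0] }
]

// Plain maps and strings stand in for the record and its files
def inputs = [my_sample: [id: 's1', reads: ['s1_1.fastq', 's1_2.fastq']]]

assert resolveEnv(directives, inputs) == [SAMPLE_ID: 's1', FIRST_READ_FILE: 's1_1.fastq']
```

Cloning each closure before setting its delegate keeps the original directive reusable across tasks with different inputs.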

By default, file inputs are automatically inferred from the types and staged into the task directory. Alternatively, the `stageAs` directive can be used to stage files under a different name, similar to using the `name` or `stageAs` option with a `path` input. For example:

```groovy
process foo {
    stageAs('*.fastq') { my_sample.reads }

    input:
    Sample my_sample

    // ...
}
```

In this case, `my_sample.reads` will be staged as `*.fastq`, overriding the default behavior.

:::{note}
While the `env`, `stageAs`, and `stdin` directives are provided as a convenience, it is usually easier to simply rely on the default file staging behavior, and to use the input variables directly in the task script.
:::

(process-output)=

## Outputs
@@ -1202,6 +1292,79 @@ The following options are available for all process outputs:

: Defines the {ref}`channel topic <channel-topic>` to which the output will be sent.

(process-typed-outputs)=

### Typed outputs

:::{versionadded} 24.10.0
:::

Typed outputs are an alternative way to define process outputs with standard types. This approach has a number of benefits:

- A typed output clearly describes the expected structure of the output, which makes it easier to use the output in downstream operations.

- Whereas a `path` output relies on a custom `arity` option to distinguish between a single file and a list of files, a typed output expresses the distinction in the type itself: `Path` vs `List<Path>`.

- Typed outputs enable the use of custom record types (i.e. defined using `@ValueObject` or `record`), which makes the code much easier to read and understand compared to `tuple` outputs.

A typed output is simply a variable declaration with an optional assignment, i.e. `<type> <name> [= <value>]`. Here are some examples:

```groovy
output:
    int my_int
    String my_string = my_input
    Path my_file = path('file1.txt')
    List<Path> my_files = path('*.txt')
```

In the above example:

- `my_int` and `my_string` are treated like `val` outputs; `my_int` has no assignment, so it takes the value of the variable `my_int`, while `my_string` is assigned from the variable `my_input`. Both variables are expected to be defined in the process body

- `my_file` and `my_files` are treated like `path` outputs; they are assigned to a file or list of files based on a matching pattern using the `path()` method

- The output variable names correspond to the `emit` option for process outputs

One of the most important capabilities enabled by typed outputs is the use of custom record types. Here is an example:

```groovy
@ValueObject
class Sample {
    String id
    List<Path> reads
}

process foo {
    input:
    String id

    output:
    Sample my_sample = new Sample(id, path('*.fastq'))

    // ...
}
```

In this example, `Sample` is a record type with two members, `id` and `reads`. The `Sample` output will be constructed from the `id` input variable and the collection of task output files matching the pattern `*.fastq`.
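
As a rough illustration of what constructing such an output involves, the following plain-Groovy sketch builds a `Sample` from an id and the files in a directory that match a glob. The `matchFiles` helper is a hypothetical stand-in for `path('*.fastq')`, the temporary directory stands in for the task directory, and `@Canonical` replaces `@ValueObject` so that the snippet runs without Nextflow.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import groovy.transform.Canonical

// Stand-in for the `Sample` record; @Canonical replaces @ValueObject here.
@Canonical
class Sample {
    String id
    List<Path> reads
}

// Hypothetical stand-in for `path(glob)`: list the files in `dir` that match
// the glob, sorted by file name so the result is deterministic.
List<Path> matchFiles(Path dir, String glob) {
    def result = []
    Files.newDirectoryStream(dir, glob).withCloseable { stream ->
        stream.each { result << it }
    }
    return result.sort { it.fileName.toString() }
}

// Simulate a task directory containing two read files and one unrelated file
def workDir = Files.createTempDirectory('task')
['s1_2.fastq', 's1_1.fastq', 'notes.txt'].each { Files.createFile(workDir.resolve(it)) }

def sample = new Sample('s1', matchFiles(workDir, '*.fastq'))
assert sample.id == 's1'
assert sample.reads*.fileName*.toString() == ['s1_1.fastq', 's1_2.fastq']
```

Only the files matching the pattern end up in `reads`; `notes.txt` is left out of the record.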

In addition to the `path()` method, the `env()`, `eval()`, and `stdout()` methods extract environment variables, the output of eval commands, and standard output from the task environment. For example:

```groovy
process foo {
    // ...

    output:
    String my_env = env('MY_VAR')
    String my_eval = eval('bash --version')
    String my_stdout = stdout()
    List my_tuple = [ env('MY_VAR'), eval('bash --version'), stdout() ]

    // ...
}
```

As shown in the above examples, output values can be any expression, including lists, maps, records, and even function calls.

## When

The `when` block allows you to define a condition that must be satisfied in order to execute the process. The condition can be any expression that returns a boolean value.
4 changes: 3 additions & 1 deletion modules/nextflow/src/main/groovy/nextflow/Session.groovy
@@ -55,6 +55,7 @@ import nextflow.processor.ErrorStrategy
import nextflow.processor.TaskFault
import nextflow.processor.TaskHandler
import nextflow.processor.TaskProcessor
import nextflow.script.dsl.ProcessConfigBuilder
import nextflow.script.BaseScript
import nextflow.script.ProcessConfig
import nextflow.script.ProcessFactory
@@ -927,7 +928,7 @@ class Session implements ISession {
* @return {@code true} if the name specified belongs to the list of process names or {@code false} otherwise
*/
protected boolean checkValidProcessName(Collection<String> processNames, String selector, List<String> errorMessage) {
- final matches = processNames.any { name -> ProcessConfig.matchesSelector(name, selector) }
+ final matches = processNames.any { name -> ProcessConfigBuilder.matchesSelector(name, selector) }
if( matches )
return true

@@ -938,6 +939,7 @@ class Session implements ISession {
errorMessage << message.toString()
return false
}

/**
* Register a shutdown hook to close services when the session terminates
* @param Closure