Support including existing files within watchPath? #5353

Open
siddharthab opened this issue Oct 2, 2024 · 2 comments

Comments

@siddharthab
Contributor

New feature

There does not seem to be a reliable way to pick up both the existing files and any new files under a root path by combining the two channel factories, watchPath and fromPath. Calling them one after the other leaves a synchronization gap between the two calls: if watchPath is called first, files created in the gap may be emitted twice, and if it is called second, files created in the gap may be dropped.
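
To make the two failure modes concrete, here is a minimal plain-Groovy sketch on a made-up timeline. It does not call Nextflow at all: `listedAt` and `watchedFrom` are hypothetical stand-ins for "what a listing taken at time t sees" and "what a watcher registered at time t sees", and the file names and times are invented for illustration.

```groovy
// Hypothetical timeline: each file name maps to the time it was created.
def created = [a: 1, b: 2, c: 3, d: 4, e: 5]

// Stand-ins for the two factories (illustration only, not Nextflow APIs):
// a listing taken at time t sees files created at or before t;
// a watcher registered at time t sees files created after t.
def listedAt    = { int t -> created.findAll { it.value <= t }.keySet() }
def watchedFrom = { int t -> created.findAll { it.value > t }.keySet() }

// Watcher registered at t=2, listing taken at t=4: c and d are reported by both.
def duplicates = listedAt(4).intersect(watchedFrom(2))
assert duplicates.sort() == ['c', 'd']

// Listing taken at t=2, watcher registered at t=4: c and d are reported by neither.
def seen = listedAt(2) + watchedFrom(4)
def dropped = created.keySet() - seen
assert dropped.sort() == ['c', 'd']
```

Duplicates can be filtered out after the fact, but dropped files cannot be recovered, which is why the order of the two calls matters.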

Usage scenario

This is useful when a separate pipeline generates data that a Nextflow workflow is supposed to consume. That pipeline may have been started some time ago and already produced some files, so the Nextflow workflow needs to catch up and then stay current.

A workflow from epi2me-labs uses watchPath in a way that can drop files.

Suggest implementation

It seems to me that adding an option to watchPath would be the way to go, unless there is already an easy workaround.

Happy to work on this if people want.

@bentsherman
Member

Channel.fromPath is essentially a channel wrapper over the files() function. It might be easier to use files() instead to drop duplicates.
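
One possible reading of this suggestion, as a rough sketch only: take a snapshot with files(), emit it, and filter the watcher output against the snapshot's names. The variable names are illustrative, and this may not be what was meant. It relies on files() returning a list of paths and on Channel.fromList; it makes no claim about when the watcher actually starts relative to the files() call.

```groovy
workflow {
    def pattern = 'input/*'

    // Snapshot of what already exists; files() returns a list of paths.
    def existing = files(pattern)
    def existing_names = existing.collect { it.name } as Set

    // Emit the snapshot, then only watcher events for names outside the snapshot.
    existing_inputs = Channel.fromList(existing)
    new_inputs = Channel
        .watchPath(pattern)
        .filter { !(it.name in existing_names) }

    existing_inputs.mix(new_inputs).view()
}
```

Because the snapshot set is never modified after it is built, it needs no synchronization. This only addresses duplicates: a file created after the files() call but before the watcher is active would still be missed.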

@siddharthab
Contributor Author

@bentsherman I don't understand your comment. How would I use the files() function to synchronize the call to files() itself with the setup of the watchPath channel?

My current code looks something like this (assuming that asSynchronized() is working as I expect it to):

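workflow {
    String pattern = 'input/*'
    String stop_filename = 'STOP'
    // Names seen so far, shared by both channels below.
    Set<String> input_names = ([] as Set<String>).asSynchronized()

    // Watch for new files until a file named STOP appears (STOP itself is not emitted).
    // Set.add() returns false for a name already in the set, so repeats are filtered out.
    new_inputs = Channel
        .watchPath(pattern)
        .until { it.name == stop_filename }
        .filter { input_names.add(it.name) }
    new_inputs.view { "New ${it.name}" }

    // List the files that already exist, skipping any the watcher has already reported.
    existing_inputs = Channel
        .fromPath(pattern)
        .filter { input_names.add(it.name) }
    existing_inputs.view { "Existing ${it.name}" }

    // Merge both sources into a single stream of inputs.
    inputs = existing_inputs.mix(new_inputs)
    inputs.view()
}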
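
The deduplication in the workflow above hinges on `Set.add()` returning true only the first time a given name is offered. A small plain-Groovy sketch of that idiom follows, outside of Nextflow, with made-up file names. The two-thread part is only an illustration of the synchronized wrapper, not a model of how Nextflow schedules operators.

```groovy
import java.util.concurrent.atomic.AtomicInteger

// Same construction as in the workflow above.
Set<String> seen = ([] as Set<String>).asSynchronized()

// add() returns true only the first time a name is offered,
// so using it as a filter predicate keeps the first occurrence of each name.
def names = ['a.txt', 'b.txt', 'a.txt', 'c.txt', 'b.txt']
def firstSeen = names.findAll { seen.add(it) }
assert firstSeen == ['a.txt', 'b.txt', 'c.txt']

// Two threads offering the same names to a fresh set:
// each name is accepted exactly once in total.
Set<String> shared = ([] as Set<String>).asSynchronized()
def accepted = new AtomicInteger()
def workers = (1..2).collect {
    Thread.start { names.each { if (shared.add(it)) accepted.incrementAndGet() } }
}
workers*.join()
assert accepted.get() == 3
```

Note that the key is the base name only. That is enough for a flat pattern such as 'input/*', but two files with the same name in different directories would be treated as duplicates under a recursive pattern.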
