
workflow.basedir() semi-incompatible with new Snakemake provenance behaviour #180

Closed
pvandyken opened this issue Jul 13, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@pvandyken

The problem

Recent versions of Snakemake (7.3, I believe) track each rule's shell code, inputs, parameters, and environment, and trigger a rerun of the affected rule whenever any of them change.

In Snakemake workflows, the convention has been to prefix paths relative to the snakemake root dir (e.g. resources) with workflow.basedir or similar. This accommodates workflow mode, wherein the cwd is moved to the user's output dir and those relative files would otherwise become inaccessible. It creates a problem when the snakemake root dir is moved: the leading part of each path changes, so the recorded values no longer match and a rerun is triggered. This could occur, for instance, when a user runs a snakebids bidsapp installed in a temporary directory such as compute-local scratch. In each compute session, the bidsapp would be in a different directory.
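
To make the failure mode concrete, here is a minimal sketch in plain Python. It does not reproduce how Snakemake actually records provenance; `params_signature`, `rule_params`, the scratch paths, and the `atlas.nii.gz` file name are all hypothetical stand-ins. It only shows that a value built from an absolute base directory changes when the installation moves, even though the resource itself is identical.

```python
import hashlib
from pathlib import Path


def params_signature(params: dict) -> str:
    """Hypothetical stand-in for a provenance record: hash the rendered values."""
    rendered = repr(sorted((key, str(value)) for key, value in params.items()))
    return hashlib.sha256(rendered.encode()).hexdigest()


def rule_params(basedir: Path) -> dict:
    # Mirrors the convention of prefixing a resource with workflow.basedir.
    return {"atlas": basedir / "resources" / "atlas.nii.gz"}


# Two compute sessions: same app, installed under different scratch directories.
first = params_signature(rule_params(Path("/scratch/job_1001/app/workflow")))
second = params_signature(rule_params(Path("/scratch/job_1002/app/workflow")))

# The resource did not change, but the recorded value did.
print(first == second)  # prints False
```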

There are two current workarounds:

  1. Clear metadata (using snakemake --clear-metadata) every time the app is reinstalled. This requirement is unintuitive and burdensome in situations of frequent reinstall (like running on the cluster).
  2. Require or suggest disabling all rerun triggers except mtime. In other words, we don't support this feature. This is not ideal, as the feature has very useful applications.

A better solution or workaround is, I think, needed. My idea of shadow dirs #179 would solve the issue, but I leave this thread for any other suggestions.

@tkkuehn

tkkuehn commented Nov 11, 2022

  1. Try workflow.source_path
  2. If it doesn't work after moving a workflow, see if that could be worth raising with the Snakemake developers.

@tkkuehn

tkkuehn commented Nov 18, 2022

In today's meeting we tried workflow.source_path and found that it suffers from the same issue. So @pvandyken is going to look at writing a PR on the snakemake repository to resolve this. If that doesn't work out for whatever reason, we can implement a replacement in Snakebids (or in its own small project). It could have a signature like resolve_workflow(workflow.basedir).resources, which would cache the resources and return a consistent path. Note: this cache should probably live in the output_dir so that it outlives any single singularity container.
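
A rough sketch of what such a helper might look like, in plain Python. Nothing here exists in Snakebids or Snakemake: the `ResolvedWorkflow` class, the explicit `output_dir` argument, and the `.workflow_cache` directory name are assumptions made for illustration. The idea is only that the returned path depends on the output dir, not on where the app happens to be installed.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ResolvedWorkflow:
    """Hypothetical holder for the cached, location-independent paths."""

    resources: Path


def resolve_workflow(basedir, output_dir, cache_name=".workflow_cache"):
    """Copy <basedir>/resources into a cache under output_dir and return its path.

    The cache is populated on first use and reused afterwards, so the path
    stays the same no matter where basedir is in a later session.
    """
    source = Path(basedir) / "resources"
    cached = Path(output_dir) / cache_name / "resources"
    if not cached.exists():
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, cached)
    return ResolvedWorkflow(resources=cached)
```

This sketch never refreshes the cache, so an updated resource would go unnoticed until the cache is removed; a real implementation would need some invalidation rule.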

@pvandyken

I was thinking about this a bit more, and I think the shadow directory idea would be a nightmare to implement and would cause more problems than it solves.

There are really just two folders that a snakemake workflow generally needs to reference and that might cause provenance problems: scripts and resources. Since those are the only folders causing problems, it seems easiest by far to just copy them directly into the output dir. Symlinks are also a possibility, although they may cause problems in more distributed setups. The folders can be deleted at the conclusion of the workflow.
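
As a minimal sketch of the copy-then-delete idea, assuming plain Python and a hypothetical `staged_workflow_dirs` helper (not part of Snakebids or Snakemake), a context manager could handle both the copy and the cleanup:

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_workflow_dirs(basedir, output_dir, names=("scripts", "resources")):
    """Copy the named folders from basedir into output_dir, then remove them on exit."""
    staged = []
    try:
        for name in names:
            source = Path(basedir) / name
            if not source.is_dir():
                continue  # the workflow may not have this folder
            target = Path(output_dir) / name
            # copytree uses shutil.copy2 by default, which tries to keep mtimes.
            shutil.copytree(source, target, dirs_exist_ok=True)
            staged.append(target)
        yield staged
    finally:
        # Remove only the folders this call copied.
        for target in staged:
            shutil.rmtree(target, ignore_errors=True)
```

Note that this would also delete a pre-existing scripts or resources folder in the output dir, so a real implementation would likely want a dedicated staging name. It requires Python 3.8 or later for dirs_exist_ok.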

For provenance handling, all resources should be marked as ancient, with file name changes used to indicate updates. Scripts are more complicated: I think snakemake considers both their contents and the file timestamp, and I don't know whether they can be marked as ancient. I'd have to check.

@pvandyken

Superseded by #304
