MultiQC not resuming #233

Open
apeltzer opened this issue Aug 8, 2024 · 11 comments

@apeltzer
Member

apeltzer commented Aug 8, 2024

No description provided.

@apeltzer apeltzer converted this from a draft issue Aug 8, 2024
@apeltzer apeltzer added help wanted Extra attention is needed question Further information is requested labels Aug 8, 2024
@apeltzer
Member Author

apeltzer commented Aug 8, 2024

Unclear whether this is intended or not; needs verifying.

@nschcolnicov nschcolnicov self-assigned this Aug 9, 2024
@apeltzer
Member Author

Ask @fmalmeida what he had to do to make this work :)

@edmundmiller
Collaborator

I thought there was a "don't cache" setting somewhere and that this was intended, but there isn't one. It happens on every nf-core pipeline...

@ewels Any thoughts on where this is coming from?

Might be better to move this to tools.

@apeltzer
Member Author

Thought the same initially, but it's not been set here. Not a major problem here anyway (and negligible runtime too, considering how much $$$ goes into demuxing an entire flowcell ;-)).

@grst
Member

grst commented Aug 12, 2024

I wouldn't say the runtime is negligible... on a recent large flow cell, multiqc ran for ~1h (not sure how much of that time was spent staging in files, though).

I also never got why one would intentionally not resume multiqc...

@fmalmeida

Hey hey hey,
One thing that stops the MultiQC module from caching is the cache = false that is sometimes added, as @edmundmiller mentioned. The main cause, though, is that a lot of run-specific metadata is added to the MultiQC summary map, which makes this input map of metadata different for every run, so the task is never cached. See here:

https://github.com/nf-core/demultiplex/blob/master/lib/NfcoreTemplate.groovy#L72-L95
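
To make the mechanism concrete, here is a minimal Python sketch of a hash-based cache key. It is a toy model and not Nextflow's actual hashing: `task_key`, the map contents and the run names are all hypothetical. It only illustrates why a run-specific value in the inputs means a resumed run cannot match the earlier one.

```python
import hashlib
import json


def task_key(inputs: dict) -> str:
    """Toy cache key: SHA-256 over the serialized inputs (not Nextflow's real algorithm)."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical summary maps for two executions of the same pipeline on the same data.
first_run = {
    "pipeline": "nf-core/demultiplex",
    "runName": "happy_euler",        # made-up run name
    "start": "2024-08-08T10:00:00",  # made-up timestamp
}
resumed_run = {
    "pipeline": "nf-core/demultiplex",
    "runName": "sad_turing",
    "start": "2024-08-08T11:30:00",
}

# Only the run-specific fields differ, yet that is enough to change the key.
keys_match = task_key(first_run) == task_key(resumed_run)
print(keys_match)  # False
```

In this model the QC data itself never enters the comparison: the differing runName and timestamp alone are enough to produce a new key.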

@apeltzer
Member Author

This means it's not so easy to adapt this without changing what's ingested into workflow_summary_mqc.yaml and methods_description_mqc.yaml, as some variables contain timestamps and are thus updated on any resume. To be more explicit, let's close this ticket, set cache = false in conf/modules.config for multiqc (so that users get what they think they will get) and leave it as is. If we decide to take this on at some point, I would suggest we do it in a next / patch release. Thanks for your points @fmalmeida :)

@apeltzer apeltzer removed help wanted Extra attention is needed question Further information is requested labels Aug 12, 2024
@nschcolnicov
Contributor

I assessed this in the current dev branch (commit id: 892b9d8).
The main conflicting channel is ch_multiqc_files, which contains two files that differ with each execution: workflow_summary_mqc.yaml and methods_description_mqc.yaml.

These files change with each execution because they contain data such as the execution timestamp and runName, among others.
For multiqc to resume, we would need to:

  1. Add "sort: true" to the collect operator applied to ch_multiqc_files.
  2. Update the content of the workflow_summary_mqc.yaml file to remove runName, or develop a rule so that it uses the same runName as the previous execution if every other process was run from cache.
  3. Update the methods_description_mqc.yaml file so that it doesn't contain runName, timestamp, and any other value that changes with execution, or use a similar rule as for workflow_summary_mqc.yaml.
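
The three steps above can be sketched in Python as a toy model. It is not the pipeline's code: `VOLATILE_KEYS`, the `revision` field, the file names and every helper here are assumptions made for illustration. The sketch assumes the point of sorting is that collected files may arrive in a different order between runs.

```python
import hashlib

# Assumed run-specific keys; the real set would have to come from the pipeline template.
VOLATILE_KEYS = {"runName", "start"}


def stable_summary(summary: dict) -> dict:
    """Steps 2 and 3: drop values that change on every execution."""
    return {k: v for k, v in summary.items() if k not in VOLATILE_KEYS}


def render_yaml(summary: dict) -> str:
    """Render simple 'key: value' lines in a fixed key order."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(summary.items()))


def inputs_key(files: list, sort: bool) -> str:
    """Toy cache key over (name, content) pairs; step 1 is the optional sort."""
    ordered = sorted(files) if sort else list(files)
    digest = hashlib.sha256()
    for name, content in ordered:
        digest.update(name.encode("utf-8") + b"\x00")
        digest.update(content.encode("utf-8") + b"\x00")
    return digest.hexdigest()


def multiqc_inputs(summary: dict, reverse_arrival: bool) -> list:
    """Build the hypothetical input list, optionally in reversed arrival order."""
    files = [
        ("fastqc_data.txt", "unchanged QC output"),
        ("workflow_summary_mqc.yaml", render_yaml(stable_summary(summary))),
    ]
    return files[::-1] if reverse_arrival else files


first = multiqc_inputs({"revision": "dev", "runName": "happy_euler", "start": "10:00"}, False)
resumed = multiqc_inputs({"revision": "dev", "runName": "sad_turing", "start": "11:30"}, True)

unsorted_match = inputs_key(first, sort=False) == inputs_key(resumed, sort=False)
sorted_match = inputs_key(first, sort=True) == inputs_key(resumed, sort=True)
print(unsorted_match, sorted_match)  # False True
```

With the volatile fields stripped, the two runs only differ in arrival order, so the keys agree once the inputs are sorted. Without the sort they still differ, which suggests the steps have to be applied together.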

@grst
Member

grst commented Aug 12, 2024

Thanks for the analysis... If this is to be changed, then it should happen at the pipeline template level in nf-core/tools.

@nschcolnicov
Contributor

Added it:
#239

@apeltzer
Member Author

apeltzer commented Aug 12, 2024

I will file an issue there and x-ref this ticket, so we can take it up once the wider community has agreed on a decision... :) See this one: nf-core/tools#3110
