
[24.0] Do not copy purged outputs to object store #18342

Merged: 16 commits, merged Jun 11, 2024

mvdbeek (Member) commented Jun 7, 2024

Fixes #18337. This still needs a whole lot of tests; it is just a rough pass at the most obvious cases where this might happen.

Tests to write:

  • Purge output before queue (workflow and delete action?)
    * outputs_to_working_directory
    * extra_files
    * dynamic outputs
    * extended_metadata
  • Purge output while running
    * outputs_to_working_directory
    * extra_files
    * dynamic outputs
    * extended_metadata
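The checklist above is the cross product of two purge timings and four output-handling features. As a minimal sketch, assuming the tests end up parametrized along those two axes, the matrix can be enumerated like this; the test names are hypothetical and do not correspond to existing Galaxy tests.

```python
from itertools import product

# When the output gets purged, per the checklist.
PURGE_TIMINGS = ("before_queue", "while_running")

# Output-handling features each timing should be combined with.
OUTPUT_FEATURES = (
    "outputs_to_working_directory",
    "extra_files",
    "dynamic_outputs",
    "extended_metadata",
)


def purge_test_cases() -> list:
    """Return one hypothetical test name per (timing, feature) pair."""
    return [
        f"test_purge_{timing}_{feature}"
        for timing, feature in product(PURGE_TIMINGS, OUTPUT_FEATURES)
    ]


if __name__ == "__main__":
    for name in purge_test_cases():
        print(name)
```

Two timings times four features gives eight cases, which is why the list repeats the same sub-items under each timing.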

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

mvdbeek added 4 commits June 7, 2024 15:20
This in particular needs a lot of new tests.
We will also need to actively purge datasets in the model store import
code, since users might have purged datasets while the job ran.
Again, more tests needed.
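A minimal sketch of the "actively purge on import" idea, using a hypothetical `ImportedOutput` dataclass and `import_outputs` helper rather than Galaxy's actual model store import code: any output whose dataset was purged while the job ran has its produced file deleted instead of being kept.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ImportedOutput:
    # Hypothetical stand-in for a job output being imported; not Galaxy's model.
    name: str
    file_path: str
    purged: bool  # True if the user purged the dataset while the job ran


def import_outputs(outputs):
    """Keep non-purged outputs; delete files that were written for purged ones."""
    kept = []
    for output in outputs:
        if output.purged:
            # Don't persist data for a dataset the user already purged.
            if os.path.exists(output.file_path):
                os.remove(output.file_path)
            continue
        kept.append(output.name)
    return kept


def demo():
    """Import one kept and one purged output from a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        outputs = []
        for name, purged in (("out1", False), ("out2", True)):
            path = os.path.join(tmp, f"{name}.dat")
            with open(path, "w") as fh:
                fh.write(name)
            outputs.append(ImportedOutput(name, path, purged))
        kept = import_outputs(outputs)
        remaining = sorted(os.listdir(tmp))
    return kept, remaining
```

Calling `demo()` keeps `out1` and leaves only `out1.dat` on disk; the purged output's file is removed.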
mvdbeek force-pushed the fix_copying_purged_files branch from 6be29de to 8e6e25d on June 7, 2024 13:20
# Users can purge outputs before the job completes,
# in that case we don't want to copy the output to a purged path.
# Static, non work_dir_output files are handled in job_finish code.
return f'\nif [ -f "{source_file}" -a -f "{destination}" ] ; then cp "{source_file}" "{destination}" ; fi'
Member

This seems to be a wild check 😭. I hate it, but I don't know the alternative. I know people love the disk object store and jobs writing directly to the object store 🤔.

Member Author

It's a bit of an "optimization"; I'm happy to restore the original version. The important thing is that we don't store data in purged datasets.
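To illustrate what the guard in the reviewed snippet does, here is a sketch assuming that a non-purged output still has a file at its destination path while a purged one does not. `copy_if_exists_command` reproduces the shell string from the snippet; `copy_if_exists` and `demo` are hypothetical Python helpers that mirror the same check with `os.path.isfile` (the counterpart of `test -f`), not Galaxy functions.

```python
import os
import shutil
import tempfile


def copy_if_exists_command(source_file: str, destination: str) -> str:
    # Same shell string as the snippet: copy only if both paths are regular files.
    return (
        f'\nif [ -f "{source_file}" -a -f "{destination}" ] ; '
        f'then cp "{source_file}" "{destination}" ; fi'
    )


def copy_if_exists(source_file: str, destination: str) -> bool:
    # Pure-Python mirror of the guard; returns True only if a copy happened.
    if os.path.isfile(source_file) and os.path.isfile(destination):
        shutil.copyfile(source_file, destination)
        return True
    return False


def demo() -> dict:
    with tempfile.TemporaryDirectory() as work_dir:
        source = os.path.join(work_dir, "output.txt")  # hypothetical job output
        destination = os.path.join(work_dir, "dataset_1.dat")  # hypothetical dataset path
        with open(source, "w") as fh:
            fh.write("result\n")
        # Destination was purged (no file): nothing is copied or recreated.
        copied_when_purged = copy_if_exists(source, destination)
        purged_path_recreated = os.path.exists(destination)
        # Destination still exists (not purged): the copy goes through.
        open(destination, "w").close()
        copied_when_present = copy_if_exists(source, destination)
        with open(destination) as fh:
            content = fh.read()
    return {
        "copied_when_purged": copied_when_purged,
        "purged_path_recreated": purged_path_recreated,
        "copied_when_present": copied_when_present,
        "content": content,
    }
```

The check is on the destination file itself, so in this sketch it only prevents writes when purging has actually removed that file.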

mvdbeek added 2 commits June 8, 2024 16:27
These are only written if the command actually ran, so we'd fail here if, for instance, the outputs are deleted before the job had a chance to run.
mvdbeek force-pushed the fix_copying_purged_files branch from 0467e96 to 94bd0c5 on June 10, 2024 08:27
mvdbeek added 2 commits June 10, 2024 10:34
Add a sleep so we can delay output purging until the command line is templated. If we don't sleep, we generate a command line where the output path is just `''`. That needs another fix!
mvdbeek force-pushed the fix_copying_purged_files branch from 94bd0c5 to 03eca7c on June 10, 2024 08:34
mvdbeek added 7 commits June 10, 2024 11:53
This is a tricky one. There could be systematic issues here that an admin would want to fix, but it could also just be that a user forced the wrong datatype. Ideally we'd probably tag this as a job execution issue? It's not very different from a job error, IMO.
mvdbeek marked this pull request as ready for review June 10, 2024 14:45
mvdbeek (Member Author) commented Jun 10, 2024

I haven't tested extra_files handling yet, and I know that command-line templating is broken when you're not using outputs_to_working_directory and you purge the output dataset before the job runs and before its command line is queued.

But there are so many fixes in here, and they have coverage, so let's get this in (if the tests pass) while I work on the remaining tests.

mvdbeek force-pushed the fix_copying_purged_files branch from 25bca7c to f0e09cc on June 11, 2024 06:18
mvdbeek merged commit 96c9be3 into galaxyproject:release_24.0 on Jun 11, 2024 (50 checks passed)
jdavcs added this to the 24.1 milestone Jun 19, 2024
nsoranzo deleted the fix_copying_purged_files branch August 6, 2024 21:42

3 participants