WIP: write sigs to tmpdir first to improve restart capacity #97

Closed
wants to merge 5 commits into from


@bluegenes bluegenes commented Sep 12, 2024

To allow restart, write sigs to a tmpdir instead of directly to a zipfile. Then copy all sigs to the zipfile at the very end.
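
A minimal sketch of the write-then-consolidate idea, in Python for illustration only (this is not the PR's code). The names `write_sig` and `consolidate`, the `.partial` suffix, and the `*.sig.gz` naming are all assumptions made for the example. Each signature goes to its own file in the tmpdir, and a single pass at the end copies the completed files into one zipfile.

```python
import zipfile
from pathlib import Path


def write_sig(tmpdir: Path, name: str, payload: bytes) -> Path:
    """Write one signature to its own file in tmpdir.

    The bytes go to a hypothetical '.partial' name first and are renamed
    afterwards, so an interrupted run never leaves a half-written file
    under the final name.
    """
    final = tmpdir / name
    partial = tmpdir / (name + ".partial")
    partial.write_bytes(payload)
    partial.replace(final)  # rename within the same directory
    return final


def consolidate(tmpdir: Path, zip_path: Path) -> int:
    """Copy every completed signature file in tmpdir into one zipfile."""
    # '*.sig.gz' does not match leftover '*.sig.gz.partial' files.
    sig_files = sorted(tmpdir.glob("*.sig.gz"))
    with zipfile.ZipFile(zip_path, "w") as zf:
        for sig in sig_files:
            zf.write(sig, arcname=sig.name)
    return len(sig_files)
```

Because `consolidate` only runs at the very end, an interrupted run leaves the tmpdir intact and the next run can pick up from whatever files are already there.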

If a run gets interrupted, on restart we first read all the existing sigs that we can, build a manifest from them, and intersect that manifest with the sig templates to avoid rebuilding signatures that already exist. I think the way to do this is to store a HashMap of filename -> Records, for faster lookup and intersection as we loop through files.
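
One way the lookup could work, sketched in Python with a dict standing in for the HashMap. `Record` and `Template` here are hypothetical stand-ins that carry only `ksize` and `moltype`; the real sourmash types hold more fields, and matching on just these two parameters is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a manifest record of an existing sig."""
    name: str
    ksize: int
    moltype: str


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for a requested sketch template."""
    ksize: int
    moltype: str


def build_lookup(existing):
    """Map filename -> set of (ksize, moltype) pairs already on disk."""
    lookup = defaultdict(set)
    for filename, record in existing:
        lookup[filename].add((record.ksize, record.moltype))
    return lookup


def templates_to_build(lookup, filename, templates):
    """Return only the templates with no existing sig for this filename."""
    done = lookup.get(filename, set())
    return [t for t in templates if (t.ksize, t.moltype) not in done]
```

With the lookup built once at startup, each file in the main loop costs one dict lookup plus a set membership test per template, rather than a scan over every existing record.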

To do:

  • figure out how to properly do this intersection

thoughts:

  • do we want to write mini manifests as we go, so we can just read those in upon restart? The current code reads all files in the tmpdir instead.
  • we currently have to build a new record for each signature because the filename + location need to change. Is there a way to minimize this overhead, or to modify the existing record in place?
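
For the mini-manifest question, a possible shape is an append-only file with one row per finished signature. The sketch below is Python for illustration; the JSON-lines format and the function names are assumptions, not something this PR implements. On restart, the reader stops at the first line that fails to parse, which covers a final row cut short by the interruption.

```python
import json
from pathlib import Path


def append_manifest_row(manifest: Path, row: dict) -> None:
    """Append one record as a single JSON line after its sig is written."""
    with manifest.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row) + "\n")


def read_manifest(manifest: Path) -> list:
    """Read back every complete row; stop at a truncated trailing line."""
    if not manifest.exists():
        return []
    rows = []
    with manifest.open(encoding="utf-8") as fh:
        for line in fh:
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError:
                break  # partial last line from an interrupted run
    return rows
```

Reading one small manifest avoids opening every file in the tmpdir on restart, at the cost of keeping the manifest and the files on disk in sync.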

@bluegenes
Collaborator Author

Intersecting existing sig Record info with sig templates does not seem straightforward at the moment; see sourmash-bio/sourmash#3322

@bluegenes
Collaborator Author

chat with titus:

  • another idea: write *sig.zip files in batches, e.g. 1000 sigs per zipfile (make the batch size tunable)
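
A rough Python sketch of the batching idea, for illustration only. The class name `BatchedZipWriter`, the `prefix.N.sig.zip` naming scheme, and the in-memory `payload` bytes are assumptions. Each zipfile is closed as soon as it reaches `batch_size` entries, so every finished batch is a complete archive on disk.

```python
import zipfile
from pathlib import Path


class BatchedZipWriter:
    """Write signatures into numbered zipfiles of at most batch_size entries."""

    def __init__(self, outdir: Path, prefix: str = "batch", batch_size: int = 1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.outdir = outdir
        self.prefix = prefix
        self.batch_size = batch_size
        self.batch_index = 0  # number of zipfiles opened so far
        self.count = 0        # entries in the currently open zipfile
        self._zf = None

    def add(self, name: str, payload: bytes) -> None:
        if self._zf is None:
            # Open the next batch lazily, so no empty zipfile is created.
            self.batch_index += 1
            path = self.outdir / f"{self.prefix}.{self.batch_index}.sig.zip"
            self._zf = zipfile.ZipFile(path, "w")
        self._zf.writestr(name, payload)
        self.count += 1
        if self.count == self.batch_size:
            self.close()

    def close(self) -> None:
        """Finish the current batch, if any; safe to call more than once."""
        if self._zf is not None:
            self._zf.close()
            self._zf = None
        self.count = 0
```

Under this scheme an interruption loses at most the one batch that was still open, so a smaller `batch_size` trades more output files for less repeated work on restart.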

@bluegenes
Collaborator Author

closing in favor of #101 and #102

@bluegenes bluegenes closed this Sep 30, 2024
@bluegenes bluegenes deleted the improve-restart branch October 24, 2024 18:54