optimization thoughts #86

Open
ctb opened this issue Oct 29, 2024 · 2 comments

ctb commented Oct 29, 2024

A miscellaneous collection of thoughts that are not big enough or well enough formed to put in their own issues, but should be noted somewhere.

We may be double-hashing things, i.e. hashing the hash values a second time. Could we use murmurhash as the hashing approach?
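To make the double-hashing concern concrete: if the counts are kept in a standard `HashMap` keyed by `u64` hash values (an assumption, not checked against the code), each key goes through the map's default SipHash on every lookup. A minimal sketch of one way around that is a pass-through hasher. The names `PassThroughHasher`, `HashCounts` and `count_hash` are hypothetical, and this is only reasonable if the stored values are already well mixed.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical pass-through hasher: the key is already a hash value,
/// so it is used as-is instead of being hashed a second time.
#[derive(Default)]
pub struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    // `u64::hash` calls this; store the value unchanged.
    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }

    // Fallback for other key types: fold the bytes in.
    // Only u64 keys are expected in this sketch.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(b);
        }
    }
}

/// Hypothetical count table keyed by precomputed k-mer hashes.
pub type HashCounts = HashMap<u64, u64, BuildHasherDefault<PassThroughHasher>>;

/// Increment the count for one precomputed hash value; returns the new count.
pub fn count_hash(counts: &mut HashCounts, hashval: u64) -> u64 {
    let entry = counts.entry(hashval).or_insert(0);
    *entry += 1;
    *entry
}
```

A table of this type is created with `HashCounts::default()`, since `HashMap::new()` is only defined for the default hasher.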

What kind of performance benchmarking can we do? CodSpeed looked pretty nice when used over in sourmash.

We should use String less and &str more: references over clones, etc.
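A small sketch of what this could look like. Both helpers (`num_kmers`, `kmer_at`) are hypothetical and not taken from the codebase, and the second assumes ASCII sequence data with one byte per base.

```rust
/// Hypothetical helper: number of k-mers in a sequence.
/// Taking `&str` lets callers pass a `String`, a slice, or a literal
/// without cloning.
pub fn num_kmers(seq: &str, ksize: usize) -> usize {
    if ksize == 0 || seq.len() < ksize {
        0
    } else {
        seq.len() - ksize + 1
    }
}

/// Hypothetical helper: borrow the k-mer starting at `pos` instead of
/// allocating a new `String` for it. Returns `None` if the window runs
/// past the end of the sequence.
pub fn kmer_at(seq: &str, pos: usize, ksize: usize) -> Option<&str> {
    seq.get(pos..pos.checked_add(ksize)?)
}
```

A caller that owns a `String` passes `&owned` and keeps ownership, so no `.clone()` is needed at the call site.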

Similarly, I think there are a few places where we are using Vec unnecessarily.
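As an illustration of the kind of change meant here (a sketch with hypothetical function names, not code from the repository): returning an iterator over borrowed windows avoids building an intermediate `Vec` when the caller only walks the k-mers once.

```rust
/// Lazily yields each k-mer as a borrowed slice; nothing is collected.
/// Panics if `ksize` is 0, as `slice::windows` does.
pub fn kmers_iter<'a>(seq: &'a [u8], ksize: usize) -> impl Iterator<Item = &'a [u8]> + 'a {
    seq.windows(ksize)
}

/// Example consumer: counts k-mers made up only of A, C, G and T,
/// without allocating a list of k-mers first.
pub fn count_valid_kmers(seq: &[u8], ksize: usize) -> usize {
    kmers_iter(seq, ksize)
        .filter(|kmer| kmer.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')))
        .count()
}
```

Callers that really do need a list can still write `kmers_iter(seq, k).collect::<Vec<_>>()`, so the allocation becomes opt-in.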

ctb commented Oct 29, 2024

@Adamtaranto

Low hanging efficiency fruit:

  • Revcomp the full sequence once and step through it backwards in a sliding window to find canonical kmers, instead of doing a revcomp per kmer.
  • Make consume multithreaded at the kmer level. Both KmersAndHashesIter and SeqToHashes are iterators, so this should be fairly simple, unless we are worried about different threads trying to write to the same hash?
  • Make it multithreaded at the sequence level?
  • Consume directly from file to avoid having to load sequences in Python and pass them back to Rust, per WIP: add consume_file #10
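The first bullet can be sketched roughly as follows. This is an illustration only: `revcomp` and `canonical_kmers` are hypothetical names, "canonical" is assumed to mean the lexicographically smaller of a kmer and its reverse complement, and input is assumed to be DNA bytes.

```rust
/// Reverse complement of a DNA sequence. Non-ACGT bytes map to `N`.
pub fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' | b'a' => b'T',
            b'C' | b'c' => b'G',
            b'G' | b'g' => b'C',
            b'T' | b't' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Yields the canonical form of every kmer in `seq`, given `rc`, the
/// reverse complement of the whole sequence computed once by the caller.
/// Panics if `ksize` is 0 or if the two slices differ in length.
pub fn canonical_kmers<'a>(
    seq: &'a [u8],
    rc: &'a [u8],
    ksize: usize,
) -> impl Iterator<Item = &'a [u8]> + 'a {
    assert_eq!(seq.len(), rc.len(), "rc must be the reverse complement of seq");
    // Forward windows run left to right; the windows over `rc` are walked
    // right to left, so each pair covers the same positions in `seq`.
    seq.windows(ksize)
        .zip(rc.windows(ksize).rev())
        .map(|(fwd, rev)| std::cmp::min(fwd, rev))
}
```

Usage is `let rc = revcomp(seq);` followed by `canonical_kmers(seq, &rc, k)`, so the only allocation is the one reverse-complemented copy of the sequence.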

Originally posted by @Adamtaranto in #70 (comment)
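On the threading bullets and the worry about threads writing to the same hash: one option is to give each worker its own local table and merge at the end, so nothing is shared while counting. A rough sketch of sequence-level threading using only the standard library (`std::thread::scope`, Rust 1.63+). `count_parallel` is a hypothetical name, and `toy_hash` is a stand-in FNV-1a hash, not the project's hash function.

```rust
use std::collections::HashMap;
use std::thread;

/// Stand-in for the real kmer hash function (FNV-1a, illustration only).
fn toy_hash(kmer: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in kmer {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Counts kmer hashes across `seqs`, splitting the sequences over up to
/// `n_threads` workers. Each worker fills a private map; the maps are
/// merged on the calling thread once the workers finish.
pub fn count_parallel(seqs: &[&[u8]], ksize: usize, n_threads: usize) -> HashMap<u64, u64> {
    assert!(ksize > 0 && n_threads > 0);
    // Ceiling division, with a minimum of 1 because `chunks(0)` panics.
    let chunk_len = ((seqs.len() + n_threads - 1) / n_threads).max(1);
    let mut total: HashMap<u64, u64> = HashMap::new();

    thread::scope(|scope| {
        let handles: Vec<_> = seqs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    let mut local: HashMap<u64, u64> = HashMap::new();
                    for seq in chunk {
                        for kmer in seq.windows(ksize) {
                            *local.entry(toy_hash(kmer)).or_insert(0) += 1;
                        }
                    }
                    local
                })
            })
            .collect();

        // Merge the per-thread results; no locking is needed here.
        for handle in handles {
            for (hashval, n) in handle.join().expect("worker thread panicked") {
                *total.entry(hashval).or_insert(0) += n;
            }
        }
    });

    total
}
```

The merge step costs extra memory and time proportional to the number of distinct hashes per worker, so whether this wins would need benchmarking.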
