Speed up search operations for SeqOrView #325

Merged 4 commits on Oct 25, 2024

jakobnissen (Member) commented Oct 25, 2024

This commit adds new methods for `findnext` and `findprev` for `SeqOrView` with known alphabets, which use bitparallel operations. This in turn speeds up most search ops, which are defined in terms of these.
The new code is 4-20 times faster depending on circumstances.
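
To illustrate what a bitparallel search can look like, here is a minimal sketch in Python rather than Julia. It is not the code from this PR: the 2-bit encoding, the 64-bit word layout, and the names `pack`, `match_mask` and `find_next` are all assumptions made for the example. Indices are 0-based, unlike Julia's.

```python
MASK64 = (1 << 64) - 1
LOW_BITS = 0x5555555555555555  # lowest bit of every 2-bit slot
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding
SYMBOLS_PER_WORD = 32


def pack(seq):
    """Pack a DNA string into 64-bit words, 32 symbols per word."""
    words = [0] * ((len(seq) + SYMBOLS_PER_WORD - 1) // SYMBOLS_PER_WORD)
    for i, ch in enumerate(seq):
        words[i // SYMBOLS_PER_WORD] |= ENCODING[ch] << (2 * (i % SYMBOLS_PER_WORD))
    return words


def match_mask(word, code):
    """Return a word with the low bit of each slot set where the slot equals `code`."""
    diff = word ^ (code * LOW_BITS)  # a slot becomes 00 exactly where it matches
    return ~(diff | (diff >> 1)) & LOW_BITS


def find_next(words, length, symbol, start=0):
    """Index of the first `symbol` at or after `start`, or None if absent."""
    if start >= length:
        return None
    code = ENCODING[symbol]
    first_word = start // SYMBOLS_PER_WORD
    for w in range(first_word, len(words)):
        m = match_mask(words[w], code)
        if w == first_word:
            # Ignore slots before `start` in the first word.
            m &= (MASK64 << (2 * (start % SYMBOLS_PER_WORD))) & MASK64
        if m:
            # Lowest set bit gives the first matching slot in this word.
            idx = w * SYMBOLS_PER_WORD + ((m & -m).bit_length() - 1) // 2
            # Padding slots in the last word may match; reject them.
            return idx if idx < length else None
    return None
```

Each loop iteration tests 32 symbols with a handful of integer operations instead of 32 separate comparisons, which is the general source of the speedup in approaches of this kind.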

It's only implemented for known alphabets because new alphabets may overload `==` in surprising ways, which makes the bitparallel ops invalid.
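
A small Python sketch of why a custom equality can invalidate comparisons on the raw encoding. The `MaskableBase` type is hypothetical and does not exist in BioSequences; it only stands in for an alphabet where two different bit patterns compare equal.

```python
class MaskableBase:
    """Hypothetical symbol: upper and lower case have different codes but compare equal."""

    CODES = "ACGTacgt"  # assumed 3-bit encoding

    def __init__(self, char):
        self.char = char
        self.code = self.CODES.index(char)

    def __eq__(self, other):
        # Overloaded equality: case is ignored.
        return self.char.upper() == other.char.upper()


def find_generic(seq, query):
    """Search using the type's own equality."""
    for i, sym in enumerate(seq):
        if sym == query:
            return i
    return None


def find_by_code(seq, query):
    """Search by comparing encodings, as a bitparallel search implicitly does."""
    for i, sym in enumerate(seq):
        if sym.code ^ query.code == 0:
            return i
    return None


seq = [MaskableBase(c) for c in "ccaTA"]
query = MaskableBase("A")
generic_hit = find_generic(seq, query)   # matches the lowercase 'a'
encoded_hit = find_by_code(seq, query)   # only matches the uppercase 'A'
```

The two searches disagree on the same input, so a fast path that compares encodings is only safe when equality is known to coincide with bitwise equality.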

The commit also introduces a new internal abstraction, the `parts` function, which may be useful for other operations down the line. It's similar to the existing chunk iterators, but may be more efficient for subsequences, and can be reversed.
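
One way to picture an abstraction of this kind is as a split of a subsequence into a partial leading word, a run of full words, and a partial trailing word. The Python sketch below is only an assumption about the general idea: the `(head, body, tail)` return shape, the 2-bit packing, and every name in it are hypothetical and are not taken from the actual `parts` function.

```python
SYMBOLS_PER_WORD = 32
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def pack(seq):
    """Pack a DNA string into 64-bit words, 32 symbols per word."""
    words = [0] * ((len(seq) + SYMBOLS_PER_WORD - 1) // SYMBOLS_PER_WORD)
    for i, ch in enumerate(seq):
        words[i // SYMBOLS_PER_WORD] |= ENCODING[ch] << (2 * (i % SYMBOLS_PER_WORD))
    return words


def _extract(word, first, n):
    """Return `n` symbols of `word` starting at slot `first`, shifted down to slot 0."""
    return (word >> (2 * first)) & ((1 << (2 * n)) - 1)


def parts(words, start, stop):
    """Split symbols [start, stop) into (head, body, tail).

    head and tail are (bits, n_symbols) pairs or None; body is a list of full words.
    """
    head = tail = None
    first_word, first_off = divmod(start, SYMBOLS_PER_WORD)
    last_word, last_off = divmod(stop, SYMBOLS_PER_WORD)
    if first_word == last_word:
        # The whole range lives inside a single word.
        n = stop - start
        if n:
            head = (_extract(words[first_word], first_off, n), n)
        return head, [], None
    if first_off:
        n = SYMBOLS_PER_WORD - first_off
        head = (_extract(words[first_word], first_off, n), n)
        first_word += 1
    body = words[first_word:last_word]  # a real implementation would avoid this copy
    if last_off:
        tail = (_extract(words[last_word], 0, last_off), last_off)
    return head, body, tail


def decode(bits, n):
    """Decode `n` symbols from packed bits back into a string."""
    return "".join("ACGT"[(bits >> (2 * i)) & 3] for i in range(n))


def symbols(words, start, stop):
    """Rebuild the subsequence by walking head, body and tail in order."""
    head, body, tail = parts(words, start, stop)
    out = decode(*head) if head else ""
    out += "".join(decode(w, SYMBOLS_PER_WORD) for w in body)
    if tail:
        out += decode(*tail)
    return out
```

Walking `tail`, then the body in reverse, then `head` gives the same words back to front, which is what a backwards search such as `findprev` would need.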

There is also some minor cleanup that could have been its own PR, but whatever.

TODO

  • Tests

codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 81.20805% with 28 lines in your changes missing coverage. Please review.

Project coverage is 90.85%. Comparing base (95d9218) to head (51c5553).
Report is 15 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/longsequences/chunk_iterator.jl | 62.50% | 15 Missing ⚠️ |
| src/biosequence/find.jl | 8.33% | 11 Missing ⚠️ |
| src/longsequences/operators.jl | 97.75% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #325      +/-   ##
==========================================
- Coverage   90.87%   90.85%   -0.02%     
==========================================
  Files          31       28       -3     
  Lines        2400     2636     +236     
==========================================
+ Hits         2181     2395     +214     
- Misses        219      241      +22     

| Flag | Coverage Δ |
|---|---|
| unittests | 90.85% <81.20%> (-0.02%) ⬇️ |

jakobnissen marked this pull request as ready for review October 25, 2024 19:05

jakobnissen (Member, Author) commented

Codecov is complaining about coverage mostly because `parts` for `LongSequence` is not used yet. I have an upcoming PR that will use it.

jakobnissen merged commit 1473594 into BioJulia:master Oct 25, 2024 (21 of 22 checks passed)
jakobnissen deleted the find branch October 25, 2024 19:18