Compressor Optimizer #367
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff            @@
##             main    #367     +/-  ##
=======================================
+ Coverage     70.17%  70.20%  +0.02%
- Complexity     1070    1072      +2
=======================================
  Files           306     306
  Lines         12337   12322     -15
  Branches       1179    1178      -1
=======================================
- Hits           8658    8651      -7
+ Misses         3200    3179     -21
- Partials        479     492     +13

This pull request uses carry forward flags.
So to sum up: this is a non-breaking change. We could compress better (by ~7%) if we recompressed the full blob each time we attempt to append a block, but doing so is too slow, so you propose keeping the original method, which gives a "preliminary result", and changing that result behind the scenes between calls? Wouldn't this have some side effects, @jpnovais? For example, the compressor could say "we compressed block 1, it takes N bytes, the current blob is now at 100kB", then recompress it with more context and update its internal state to "current blob is 98kB" without notifying the coordinator. Regarding the implementation: before introducing an async optimizer, I'd prefer to understand the performance constraints better, i.e. how long an append takes now, how long it would take if we recompressed the whole blob at each append, and within what limit we need to operate. If we say the compressor can take as much as XXXms, then we may just want simpler, cleaner code and drop this async pattern.
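To make the performance question concrete, here is a rough measurement sketch. It is not the project's compressor: `zlib` stands in for it, the "original method" is approximated by compressing each block independently, and `make_blocks`, `independent_size`, `recompress_on_each_append`, and `timed` are hypothetical helper names. The absolute numbers will not match the real compressor; the point is only how the two costs could be compared.

```python
import time
import zlib


def make_blocks(count: int, size: int) -> list[bytes]:
    """Build deterministic, mutually similar blocks (stand-ins for real block data)."""
    template = b"transfer(address,uint256);" * 64
    blocks = []
    for i in range(count):
        unit = template + i.to_bytes(4, "big")
        body = unit * (size // len(template) + 1)
        blocks.append(body[:size])
    return blocks


def independent_size(blocks: list[bytes]) -> int:
    """Cheap approximation: compress each block on its own and sum the sizes."""
    return sum(len(zlib.compress(block)) for block in blocks)


def recompress_on_each_append(blocks: list[bytes]) -> int:
    """Expensive variant: recompress the whole blob after every append."""
    blob = b""
    size = 0
    for block in blocks:
        blob += block
        size = len(zlib.compress(blob))
    return size


def timed(fn, *args):
    """Return (result, elapsed milliseconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    blocks = make_blocks(count=50, size=4096)
    for fn in (independent_size, recompress_on_each_append):
        size, ms = timed(fn, blocks)
        print(f"{fn.__name__}: {size} bytes in {ms:.2f} ms")
```

Running the same comparison against the real compressor with representative block data would give the "how long now" and "how long with full recompression" figures asked for above.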
My input on this optimization is: Context:
My take based on the above:
Yep, doing it only when the blob is full would use less CPU. But this last call to "CanWrite" may be 10x slower than the previous calls.
It's OK to have a call to CanWrite that takes 500ms at the end of the blob, as long as the preceding calls are not affected timewise.
I agree, doing it synchronously at the end is a good idea. In fact, it's similar to the "no compress" logic we already have.
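The approach agreed on in this thread (a cheap estimate on every append, with one synchronous full recompression only when the blob appears full) could look roughly like the sketch below. Everything here is an assumption for illustration: `BlobBuilder`, `can_write`, `write`, and `reset` are hypothetical names, `zlib` stands in for the real compressor, and the per-block estimate is just the independently compressed block size.

```python
import zlib


class BlobBuilder:
    """Hypothetical builder: fast size estimate per append, full recompression only when the blob looks full."""

    def __init__(self, limit: int) -> None:
        self.limit = limit        # maximum compressed blob size in bytes
        self.raw = b""            # uncompressed payload accepted so far
        self.size = 0             # current compressed-size figure for the blob
        self.recompressions = 0   # how many times the slow path ran

    def can_write(self, block: bytes) -> bool:
        """Check whether the block would fit, without changing any state except the counter."""
        return self._try(block, commit=False)

    def write(self, block: bytes) -> bool:
        """Append the block if it fits; return False and leave the blob unchanged otherwise."""
        return self._try(block, commit=True)

    def reset(self) -> None:
        """Start a new blob."""
        self.raw = b""
        self.size = 0

    def _try(self, block: bytes, commit: bool) -> bool:
        # Fast path: add the independently compressed block to the running size.
        estimate = self.size + len(zlib.compress(block))
        if estimate > self.limit:
            # Slow path: the blob looks full, so recompress everything once, synchronously.
            self.recompressions += 1
            estimate = len(zlib.compress(self.raw + block))
            if estimate > self.limit:
                return False
        if commit:
            self.raw += block
            self.size = estimate
        return True
```

In this sketch the reported size only changes inside `write`, so the caller never sees the blob size shift between calls, which addresses the side-effect concern raised earlier. The slow path costs one full recompression per call near the end of the blob, while earlier calls stay on the fast path.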
LGTM 👍 I would probably just add one or two tests to ensure this is correctly triggered and that the internal state is correctly reset afterwards.
This PR implements issue #366.
Instead of running periodically, the optimizer attempts to work after every call to a modifying method, so there are no changes to the compressor's external interface.
The difference, from the user's point of view, is output nondeterminism: the output will differ based on how much time the compressor has had to optimize. Correctness, however, is still guaranteed.