Compressor Optimizer #367

Open: wants to merge 9 commits into main
Conversation

@Tabaie Tabaie (Contributor) commented Nov 30, 2024

This PR implements issue #366.
Instead of running periodically, the optimizer attempts to run after every modifying method, so there are no changes to the compressor's external interface.

From the user's point of view, the difference is output nondeterminism: the output varies depending on how much time the compressor has had to optimize. Correctness, however, is still guaranteed.
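For illustration only, here is a minimal Go sketch of the pattern described above, not the PR's actual code: the public `Write` method stays synchronous, while a background goroutine re-compresses a snapshot of the accumulated payload and swaps in the smaller result only if it is still valid. The `blobCompressor` type, the mutex-based bookkeeping, and the use of `compress/flate` as the compressor are all assumptions made for the sketch.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"sync"
)

type blobCompressor struct {
	mu        sync.Mutex
	payload   []byte // raw data appended so far
	optimized []byte // best known compressed form of the payload, may lag behind
}

// Write appends data and kicks off an asynchronous optimization pass.
// The external interface stays the same; only the internal compressed
// representation may shrink between calls, which is where the output
// nondeterminism comes from.
func (c *blobCompressor) Write(data []byte) {
	c.mu.Lock()
	c.payload = append(c.payload, data...)
	snapshot := append([]byte(nil), c.payload...)
	c.mu.Unlock()

	go func() {
		recompressed := compressAll(snapshot)
		c.mu.Lock()
		defer c.mu.Unlock()
		// Accept the result only if no further data arrived in the meantime
		// and the new form is actually smaller than what we already have.
		if len(c.payload) == len(snapshot) &&
			(c.optimized == nil || len(recompressed) < len(c.optimized)) {
			c.optimized = recompressed
		}
	}()
}

// compressAll is a stand-in for a full "wholesale" compression pass.
func compressAll(data []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.BestCompression) // level is valid, so err is nil
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

func main() {
	c := &blobCompressor{}
	c.Write(bytes.Repeat([]byte("example block "), 1000))
	c.mu.Lock()
	fmt.Println("raw payload bytes so far:", len(c.payload))
	c.mu.Unlock()
}
```

The swap-only-if-unchanged check is what keeps correctness guaranteed while the externally observable state depends on how far the background pass got, which is the nondeterminism mentioned above.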

Checklist

  • I wrote new tests for my new core changes.
  • I have successfully run tests, style checker and build against my new changes locally.
  • I have informed the team of any breaking changes if there are any.

@Tabaie Tabaie added the enhancement, performances, and Data compressor labels Nov 30, 2024
@Tabaie Tabaie self-assigned this Nov 30, 2024
@Tabaie Tabaie linked an issue Nov 30, 2024 that may be closed by this pull request
@Tabaie Tabaie had a problem deploying to docker-build-and-e2e November 30, 2024 01:15 — with GitHub Actions Error
@codecov-commenter commented Nov 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.20%. Comparing base (4e93fa3) to head (8f370b5).

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #367      +/-   ##
============================================
+ Coverage     70.17%   70.20%   +0.02%     
- Complexity     1070     1072       +2     
============================================
  Files           306      306              
  Lines         12337    12322      -15     
  Branches       1179     1178       -1     
============================================
- Hits           8658     8651       -7     
+ Misses         3200     3179      -21     
- Partials        479      492      +13     
Flag      Coverage Δ             *Carryforward flag
hardhat   98.70% <ø> (ø)
kotlin    67.89% <ø> (+0.02%) ⬆️   Carried forward from a0f261e

*This pull request uses carry forward flags.

see 9 files with indirect coverage changes

@Tabaie Tabaie temporarily deployed to docker-build-and-e2e November 30, 2024 01:25 — with GitHub Actions Inactive
@Tabaie Tabaie had a problem deploying to docker-build-and-e2e December 2, 2024 04:03 — with GitHub Actions Error
@Tabaie Tabaie had a problem deploying to docker-build-and-e2e December 3, 2024 17:12 — with GitHub Actions Error
@Tabaie Tabaie had a problem deploying to docker-build-and-e2e December 3, 2024 18:55 — with GitHub Actions Error
@gbotrel (Contributor) commented Dec 3, 2024

So, to sum up: this is a non-breaking change. We could compress better (by ~7%) if we re-compressed the full blob each time we attempt to append a block, but doing so is too slow, so you propose to keep the original method, which would give a "preliminary result", and change the result behind the scenes between calls?

Wouldn't this have some side effects, @jpnovais? (I.e., the compressor could say "we compressed block 1, it takes N bytes, the current blob is now at 100 kB", then recompress it with more context and update the internal state to "the current blob is 98 kB" without notifying the coordinator.)

Re implementation: before introducing an async optimizer, I'd prefer to understand the perf constraints better, i.e. how long it takes now, how long it would take if we re-compressed the whole blob at each append, and within what limits we need to operate. I.e., if we say the compressor could take as much as XXXms, then we may just want to have simpler, cleaner code and kill this async pattern.
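One way to get those numbers, sketched here purely for illustration, would be a pair of Go benchmarks comparing a single streaming pass against a full re-compression after every appended block; `compressAll`, the 8 KiB block sizes, and the flate-based compression are stand-ins, not the real compressor:

```go
package blob // e.g. in blob_test.go; run with: go test -bench=.

import (
	"bytes"
	"compress/flate"
	"testing"
)

// blocks stands in for a sequence of blocks appended to one blob.
var blocks = func() [][]byte {
	bs := make([][]byte, 16)
	for i := range bs {
		bs[i] = bytes.Repeat([]byte{byte(i)}, 8*1024) // 16 blocks of 8 KiB
	}
	return bs
}()

// compressAll is a stand-in for a full "wholesale" compression pass.
func compressAll(data []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.BestCompression) // level is valid, so err is nil
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

// BenchmarkIncremental models the current behaviour: one streaming writer,
// each block compressed as it arrives.
func BenchmarkIncremental(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		w, _ := flate.NewWriter(&buf, flate.BestCompression)
		for _, blk := range blocks {
			w.Write(blk)
		}
		w.Close()
	}
}

// BenchmarkFullRecompress re-compresses the whole accumulated payload after
// every appended block, the expensive alternative discussed above.
func BenchmarkFullRecompress(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var payload []byte
		for _, blk := range blocks {
			payload = append(payload, blk...)
			_ = compressAll(payload)
		}
	}
}
```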

@jpnovais (Collaborator) commented Dec 3, 2024

My input on this optimization is:

Context:

  • The coordinator does not actively keep track of the current blob size; that is the responsibility of the compressor, which is aware of the blob limit.
  • The coordinator calls fun CanWrite(data: ByteArray, data_len: Int): Boolean before calling Write; if it returns false, the coordinator knows the blob is full and starts a new one.

My take based on the above:

  • As mentioned, async can be problematic, so I would avoid it. Also, I don't think we need it here.
  • My suggestion: when the coordinator calls CanWrite and the compressor is about to return false, the compressor would try to perform the full compression and see whether the extra data still fits (a rough sketch follows this comment).
    • pros:
      • lazy computation: it only does the "full re-compression" when it reaches the limit;
      • 100% transparent to the coordinator and keeps the same API, which leaves room for further internal performance optimizations.

@Tabaie @gbotrel WDYT?
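A minimal Go sketch of the flow suggested above, under the assumption of an illustrative blobCompressor that keeps a cheap incremental size estimate and falls back to a full re-compression (here a flate-based stand-in) only when that estimate says the blob is full:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// blobLimit is an illustrative size limit; the real limit lives in the compressor.
const blobLimit = 128 * 1024

type blobCompressor struct {
	payload         []byte // raw data accepted so far
	incrementalSize int    // cheap running estimate of the compressed size
}

// CanWrite reports whether data would still fit in the current blob.
// Fast path: trust the incremental estimate. Slow path, taken only when the
// blob looks full: re-compress everything in one pass, which typically
// compresses better, and check again.
func (c *blobCompressor) CanWrite(data []byte) bool {
	if c.incrementalSize+len(data) <= blobLimit {
		return true
	}
	full := compressAll(append(append([]byte(nil), c.payload...), data...))
	return len(full) <= blobLimit
}

// Write appends data; the "compressed size" here is just a pessimistic
// estimate so that CanWrite has something cheap to compare against.
func (c *blobCompressor) Write(data []byte) {
	c.payload = append(c.payload, data...)
	c.incrementalSize += len(data)
}

// compressAll is a stand-in for a full "wholesale" compression pass.
func compressAll(data []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.BestCompression) // level is valid, so err is nil
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

func main() {
	c := &blobCompressor{}
	block := bytes.Repeat([]byte("transaction data "), 4096) // highly compressible, ~68 KiB
	c.Write(block)
	// The pessimistic estimate says a second block will not fit,
	// but the full re-compression shows that it does.
	fmt.Println("fits after full re-compression:", c.CanWrite(block))
}
```

In the common case this costs nothing extra; only the last CanWrite call on a blob pays for the full re-compression, which matches the "lazy computation" pro listed above.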

@gbotrel (Contributor) commented Dec 3, 2024

Yep, doing it only when full would use less CPU, but that last call to CanWrite may be 10x slower than the previous calls.
So what are the performance bounds we need to operate within? If a call to CanWrite takes 500 ms, is that acceptable? (Not saying it will; I just want a rough order of magnitude for the bound.)

@jpnovais (Collaborator) commented Dec 4, 2024

> So what are the performance bounds we need to operate within? If a call to CanWrite takes 500 ms, is that acceptable? (Not saying it will; I just want a rough order of magnitude for the bound.)

It's OK to have a call to CanWrite that takes 500 ms at the end of the blob, as long as the preceding calls are not affected time-wise.

@Tabaie Tabaie had a problem deploying to docker-build-and-e2e December 6, 2024 18:54 — with GitHub Actions Error
@Tabaie Tabaie (Contributor, Author) commented Dec 6, 2024

I agree: doing it synchronously at the end is a good idea. In fact, it's similar to the "no compress" logic we already have.

@Tabaie Tabaie temporarily deployed to docker-build-and-e2e December 6, 2024 21:23 — with GitHub Actions Inactive
@gbotrel gbotrel (Contributor) left a comment

LGTM 👍 I would probably just add one or two tests to ensure this is correctly triggered and that the internal state is correctly reset afterwards.

Labels
Data compressor, enhancement (New feature or request), performances (Label the current work as being directed toward performance optimization)

Projects
None yet

Development
Successfully merging this pull request may close the issue: "Wholesale" blob compression

4 participants