Implement a new method in Codec to provide record's estimated size in memory #336

Open
artemananiev opened this issue Dec 20, 2024 · 0 comments
Labels
Performance Issues related to performance concerns.

artemananiev commented Dec 20, 2024

com.hedera.pbj.runtime.Codec has a method, measureRecord(), that measures how many bytes a record would take when serialized. The Virtual Mega Map prototype (hashgraph/hedera-services#17007) uses this method to estimate the in-memory size of the virtual node cache, which is needed to flush data to disk at the right time.

There are several issues with measureRecord(), though:

  • Performance. measureRecord() implementations are slow
  • The size includes protobuf tags, length prefixes, and other serialization overhead that does not exist in memory
  • Varint-encoded ints and longs may take fewer or more bytes when serialized than in memory
  • Fields set to default values are not counted in the serialized size, yet they still take some bytes in memory
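To illustrate the varint point, here is a minimal sketch of how many bytes a long takes under the standard protobuf varint encoding, compared with the fixed 8 bytes of a primitive long in memory. The class VarintSize is hypothetical and written only for this example; it is not PBJ code.

```java
/** Hypothetical helper for illustration only; not part of PBJ. */
final class VarintSize {
    private VarintSize() {}

    /**
     * Number of bytes needed to encode the value as a protobuf varint:
     * 7 payload bits per byte, treating the long as unsigned 64 bits.
     */
    static int of(long value) {
        int size = 1;
        // Keep going while there are bits above the lowest 7.
        while ((value & ~0x7FL) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 127L, 128L, 300L, Long.MAX_VALUE, -1L};
        for (long v : samples) {
            System.out.println(v + ": serialized=" + of(v) + " bytes, in memory=" + Long.BYTES + " bytes");
        }
    }
}
```

With this encoding, 300 takes two bytes and -1 takes ten, while a primitive long always occupies eight bytes in memory, so the serialized size can err in either direction.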

All of this makes measureRecord() poorly suited to estimating the in-memory size of the virtual node cache. This ticket is to add a new method to the Codec interface for that purpose. It doesn't have to be very precise, but it has to be fast. For example, it's really hard to tell exactly how many bytes a String field uses in memory, but length() * 2 is a fast and very conservative estimate. Bytes is easy: 16 + the byte array length. Boxed booleans and integers seem to be 16 bytes, boxed longs 24 bytes, and so on. Some research is needed to find good memory estimates for all field types. The focus should be on speed rather than precision.
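As a rough sketch of what such an estimate could look like, the helper below applies the per-type figures mentioned above using constant-time arithmetic only. Everything here is an assumption for illustration: the class MemoryEstimates, its method names, and the example field set are hypothetical, the constants are the unverified numbers from this issue, and null references are counted as zero for simplicity. It does not show the actual Codec method signature, which is still to be designed.

```java
/** Hypothetical helper for illustration only; not part of PBJ. */
final class MemoryEstimates {
    // Rough figures quoted in this issue; assumptions that still need research.
    static final long BYTES_OVERHEAD = 16;
    static final long BOXED_BOOLEAN_OR_INT = 16;
    static final long BOXED_LONG = 24;

    private MemoryEstimates() {}

    /** Conservative string estimate: two bytes per char, no header accounting. */
    static long ofString(String s) {
        return s == null ? 0 : 2L * s.length();
    }

    /** Byte-array-backed value: fixed overhead plus the array length. */
    static long ofBytes(byte[] data) {
        return data == null ? 0 : BYTES_OVERHEAD + data.length;
    }

    static long ofBoxedInt(Integer v) {
        return v == null ? 0 : BOXED_BOOLEAN_OR_INT;
    }

    static long ofBoxedBoolean(Boolean v) {
        return v == null ? 0 : BOXED_BOOLEAN_OR_INT;
    }

    static long ofBoxedLong(Long v) {
        return v == null ? 0 : BOXED_LONG;
    }

    /** Estimate for a made-up record with four fields; sums the per-field estimates. */
    static long ofExampleRecord(String memo, byte[] key, Long balance, Boolean deleted) {
        return ofString(memo) + ofBytes(key) + ofBoxedLong(balance) + ofBoxedBoolean(deleted);
    }

    public static void main(String[] args) {
        // 10 (memo) + 48 (key) + 24 (balance) + 16 (deleted)
        System.out.println(ofExampleRecord("hello", new byte[32], 5L, Boolean.TRUE));
    }
}
```

No field is traversed or serialized, so the cost is a handful of additions per record, which matches the stated priority of speed over precision.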

@artemananiev artemananiev added the Performance Issues related to performance concerns. label Dec 20, 2024