
Remove per-instance allocator and empty allocations from chainbase containers #1652

Closed
wanderingbort opened this issue Sep 18, 2023 · 7 comments · Fixed by #1851

@wanderingbort (Contributor)

There are a few improvements to chainbase that can reduce wasted space in the shared memory file; they exist in the chainbase-container-allocator-poc branch.

Per-instance allocator storage

Each shared_vector and shared_cow_string stores its own copy of the chainbase allocator. This is always the same pointer, even when it has been cast into a different-sized allocator (e.g. node vs. char allocator). These types need access to the allocator for their mutating actions as well as for destruction. Since it is always the same, the POC implements the storage as a global static member of the container class, with assertions that the same-allocator assumption holds. Any method of removing all the duplicated allocators suffices. In order to achieve this for shared_vector, a new shared_cow_vector was created. This has the additional benefit of creating fewer spurious allocations in the shared memory file on mutations, which may lead to a less fragmented file.

**Note:** removing allocators from shared_cow_string exposed an implementation bug where sizeof(shared_cow_string) was used in place of sizeof(shared_cow_string::impl) in the deallocation. Somehow this was not causing issues?
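For illustration, a minimal sketch of the static-allocator-storage idea, with hypothetical names; the real code lives in the chainbase-container-allocator-poc branch:

```cpp
#include <cassert>
#include <optional>

// Sketch only: the allocator is stored once per process as a static member
// instead of once per container instance, with an assertion that every
// instance really would have held the same value.
template<typename Allocator>
class shared_cow_string_sketch {   // hypothetical name
public:
   explicit shared_cow_string_sketch(const Allocator& alloc) {
      if (!s_alloc)
         s_alloc.emplace(alloc);   // first instance publishes the allocator
      assert(*s_alloc == alloc);   // assumption: it is always the same one
   }
   static const Allocator& get_allocator() { return *s_alloc; }

private:
   static inline std::optional<Allocator> s_alloc;  // shared by all instances
};
```

(As discussed further down in this thread, a process-wide static like this breaks down when more than one chainbase instance exists in the same process.)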

Empty shared_cow_string allocations

When a shared_cow_string is initialized at runtime (e.g. when a new account is created and its ABI is not yet set), it does not allocate a shared_cow_string::impl structure; the nullptr held in the shared_cow_string is equivalent to an empty string. However, when a shared_cow_string is read from a Datastream or variant, the behavior is different: resize_and_fill or an equivalent is used, and this always allocates, even if the size passed in is 0. This results in extra bloat when loading from a snapshot that is not present when the state is derived from replay. The POC prevents allocation for empty structures.
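A hedged sketch of the fix, with illustrative names and plain heap allocation standing in for chainbase's shared-memory allocator (the real impl also carries copy-on-write bookkeeping omitted here): represent the empty string as nullptr and skip the allocation when the deserialized size is zero.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

struct cow_string_sketch {
   struct impl { std::size_t size; char data[1]; };
   impl* _data = nullptr;   // nullptr represents the empty string

   void resize_and_fill(const char* src, std::size_t n) {
      release();
      if (n == 0)
         return;            // the fix: no impl allocated when the size is 0
      _data = static_cast<impl*>(::operator new(sizeof(impl) + n));
      _data->size = n;
      std::memcpy(_data->data, src, n);
   }

   void release() { ::operator delete(_data); _data = nullptr; }
   ~cow_string_sketch() { release(); }
};
```

With this guard, a string deserialized from a snapshot with size 0 ends up in the same nullptr representation as one default-constructed during replay.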

Results

When testing the POC at block 310859547 from a snapshot, these changes reduced the initial state size from 23,263,420,992 bytes to 21,785,865,104 bytes (around 6% savings).

@wanderingbort (Contributor, Author)

@spoonincode has correctly pointed out in chats that the PoC's use of static allocator members is not compatible with our unit tests, where more than one controller, and therefore more than one chainbase, can exist in the same running process.

He has suggested a well-known-memory-location scheme such that, given the `this` pointer of an object, you can derive the memory location of the allocator.

@wanderingbort (Contributor, Author)

Another issue to address in the PoC is that not much care is taken in shared_cow_vector<T> to make sure contained objects are properly constructed.

Nor are the allocations typed, meaning they may run afoul of alignment and padding rules. It is trivial to remap the allocators to a type so that they properly respect alignment, etc.
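For example, typed allocation can be expressed with the standard allocator-rebinding machinery; the aliases and the allocate_typed helper below are illustrative, not chainbase API:

```cpp
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <cstddef>
#include <memory>

namespace bip = boost::interprocess;
using segment_manager = bip::managed_mapped_file::segment_manager;  // stand-in
using char_alloc      = bip::allocator<char, segment_manager>;

// Rebind the raw char allocator to a T-typed allocator so the request is
// made in units of T, respecting T's size and alignment, not raw bytes.
template<typename T>
T* allocate_typed(const char_alloc& a, std::size_t n) {
   using t_alloc = typename std::allocator_traits<char_alloc>::template rebind_alloc<T>;
   t_alloc ta(a);                 // same segment manager, different value_type
   return ta.allocate(n).get();   // offset_ptr<T> -> raw T*
}
```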

@greg7mdp (Contributor) commented Oct 1, 2023

> Note: removing allocators from shared_cow_string exposed an implementation bug where sizeof(shared_cow_string) was used in place of sizeof(shared_cow_string::impl) in the deallocation. Somehow this was not causing issues?

I just saw that the deallocate function of bip::allocator<...> does not use the passed size:

```cpp
void deallocate(const pointer &ptr, size_type)
{  mp_mngr->deallocate((void*)ipcdetail::to_raw_pointer(ptr));  }
```

@greg7mdp (Contributor) commented Oct 1, 2023

> He has suggested a well-known-memory-location scheme such that, given the `this` pointer of an object, you can derive the memory location of the allocator.

Hey @spoonincode, what scheme did you have in mind, exactly? I imagine we can track the segment_manager address and size for each chainbase instance, maybe in a static boost::flat_set, and from that figure out which chainbase instance the string/vector was allocated in. For a single instance (which is the case in Leap) this should be efficient.

Is this what you had in mind?

@spoonincode (Member)

Since the max size is 8TB, always map the chainbase at an 8TB boundary. There are a handful of ways of doing this: you can just probe around until one works, or map 8TB+db_config_size as PROT_NONE and then MAP_FIXED into that. The latter has the benefit of leaning on existing libc & kernel logic to find a spot, ASLR, and such (not that there is much to randomize...).

We always construct the BIP::segment_manager at a known offset in the mapping,
https://github.com/AntelopeIO/chainbase/blob/main/src/pinnable_mapped_file.cpp#L106
and the segment_manager has a get_allocator(),
https://www.boost.org/doc/libs/1_83_0/doc/html/boost/interprocess/segment_manager.html#idm28621-bb

So that means given some pointer inside of the mapping, the allocator can be found via something like:

```cpp
((segment_manager*)((ptr & ~(0x80000000000-1)) + header_size))->get_allocator()
```
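Spelled out, that expression might look like the sketch below; the segment_manager alias and the header_size parameter are assumptions based on this comment, not chainbase's actual layout:

```cpp
#include <boost/interprocess/managed_mapped_file.hpp>
#include <cstddef>
#include <cstdint>

namespace bip = boost::interprocess;
using segment_manager = bip::managed_mapped_file::segment_manager;  // stand-in

constexpr std::uintptr_t mapping_alignment = 0x80000000000ull;  // 8TB

segment_manager* find_segment_manager(const void* ptr, std::size_t header_size) {
   auto p    = reinterpret_cast<std::uintptr_t>(ptr);
   auto base = p & ~(mapping_alignment - 1);  // round down to the 8TB boundary
   return reinterpret_cast<segment_manager*>(base + header_size);
}
```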

@greg7mdp (Contributor) commented Oct 1, 2023

@spoonincode that's great, I should have thought of that!

@greg7mdp (Contributor) commented Oct 2, 2023

@spoonincode unfortunately I cannot use PROT_NONE or MAP_FIXED with the Boost.Interprocess mmap API, which is what allows us to support non-Linux platforms.

I feel that the best way forward is to use a static flat_set (my first idea). It is slightly slower than the pointer-masking trick, but it should still be very fast (especially when we have a single chainbase instance), and since it is used only when performing allocations or deallocations, I think the overall impact will be negligible.
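A sketch of what that registry might look like, with a static boost::container::flat_map keyed by mapping base address standing in for the flat_set of (address, size) entries described above; the names are illustrative, not the merged implementation:

```cpp
#include <boost/container/flat_map.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

namespace bip = boost::interprocess;
using segment_manager = bip::managed_mapped_file::segment_manager;  // stand-in

struct segment_registry_sketch {
   // one sorted map per process: mapping base -> (mapping size, manager)
   static inline boost::container::flat_map<
      std::uintptr_t, std::pair<std::size_t, segment_manager*>> s_segments;

   static void add(const void* base, std::size_t size, segment_manager* mgr) {
      s_segments.emplace(reinterpret_cast<std::uintptr_t>(base),
                         std::make_pair(size, mgr));
   }

   // Given any pointer inside a mapping, find the owning segment_manager.
   static segment_manager* find(const void* ptr) {
      auto p  = reinterpret_cast<std::uintptr_t>(ptr);
      auto it = s_segments.upper_bound(p);  // first mapping starting past p
      assert(it != s_segments.begin());
      --it;                                 // candidate mapping containing p
      assert(p - it->first < it->second.first);
      return it->second.second;
   }
};
```

With a single chainbase instance the map holds one entry, so find() amounts to a couple of comparisons.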
