Use mimalloc Memory Allocator & Extend Benchmarks with Different Memory Allocators #2122

Open
wants to merge 5 commits into
base: master

AmmarAbouZor
Member

@AmmarAbouZor AmmarAbouZor commented Oct 21, 2024

This PR closes #2117

This PR switches the memory allocator used in Rust to the mimalloc allocator, because it provided a performance gain of about 9% in the producer loop.

  • The allocator is set in the rs-bindings crate and in each existing benchmark.
  • I've extended the benchmarks that use mock structs so that there is one benchmark for each well-known memory allocator (standard system memory allocator, jemalloc and mimalloc).
  • To achieve this, I've created macros that generate the same benchmarks for a given memory allocator.
  • The benchmarks have been added to the Build CLI Tool configuration as well.
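As a rough illustration of the macro approach described above (not the actual code from this PR), a macro can declare the global allocator and generate the same benchmark body for whichever allocator is passed in. The names `bench_with_allocator!` and `run_mock_producer` are hypothetical, and `std::alloc::System` is used as a stand-in so that the sketch builds without extra dependencies; with the `mimalloc` crate the invocation would take `mimalloc::MiMalloc` instead.

```rust
use std::alloc::System;

/// Hypothetical macro: declares the global allocator for this binary and
/// generates the same mock workload for it. Only one `#[global_allocator]`
/// may exist per binary, so each allocator needs its own benchmark target.
macro_rules! bench_with_allocator {
    ($alloc_ty:ty, $alloc_val:expr) => {
        #[global_allocator]
        static GLOBAL: $alloc_ty = $alloc_val;

        /// Mock producer loop: allocates one `String` per item and returns
        /// the total number of bytes produced.
        fn run_mock_producer(items: usize) -> usize {
            let mut produced: Vec<String> = Vec::with_capacity(items);
            for i in 0..items {
                produced.push(format!("message-{}", i));
            }
            produced.iter().map(String::len).sum()
        }
    };
}

// Stand-in allocator. With the `mimalloc` crate as a dependency this would be
// `bench_with_allocator!(mimalloc::MiMalloc, mimalloc::MiMalloc);`
bench_with_allocator!(System, System);

fn main() {
    let total = run_mock_producer(1_000);
    println!("produced {} bytes", total);
}
```

In a real benchmark target the generated function would be handed to the benchmark harness instead of being called from `main`.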

Update:
jemalloc allocator support on Windows is still unstable, which makes it irrelevant for our use case.

We still need to test the changes on a variety of environments to make sure that we are not getting a regression on any of them.
Platforms to test:

  • Linux x86
  • Windows x86
  • macOS x86
  • macOS Arm64
  • Linux Arm64 ??
  • Windows Arm64 ?? (not available)

Benchmarks Results:

Linux x86:

# Mimalloc:

## App with DLT:
File Read Took: 10606
File Read Took: 10295
File Read Took: 10402

## Benchmarks single item in parse return:
time:   [4.0199 ms 4.0246 ms 4.0299 ms]
time:   [4.0604 ms 4.0651 ms 4.0707 ms]
time:   [4.0112 ms 4.0170 ms 4.0234 ms]

## Benchmarks multiple items in parse return:
time:   [5.8757 ms 5.8791 ms 5.8830 ms]
time:   [5.9564 ms 5.9604 ms 5.9653 ms]
time:   [5.9736 ms 5.9773 ms 5.9818 ms]
--------------------------------------------------

# System Allocator:

## App with DLT:
File Read Took: 11586
File Read Took: 11429
File Read Took: 11397

## Benchmarks single item in parse return:
time:   [4.5073 ms 4.5173 ms 4.5277 ms]
time:   [4.5555 ms 4.5648 ms 4.5744 ms]
time:   [4.4404 ms 4.4516 ms 4.4636 ms]

## Benchmarks multiple items in parse return:
time:   [8.5335 ms 8.5537 ms 8.5741 ms]
time:   [8.4373 ms 8.4573 ms 8.4777 ms]
time:   [8.4468 ms 8.4690 ms 8.4913 ms]
-------------------------------------------------

# Jemalloc:

## Benchmarks single item in parse return:
time:   [4.0442 ms 4.0483 ms 4.0527 ms]
time:   [4.0619 ms 4.0658 ms 4.0702 ms]
time:   [4.0768 ms 4.0808 ms 4.0858 ms]

## Benchmarks multiple items in parse return:
time:   [6.7369 ms 6.7390 ms 6.7412 ms]
time:   [6.7874 ms 6.7941 ms 6.8051 ms]
time:   [6.6665 ms 6.6697 ms 6.6739 ms]

Windows x86:

# Mimalloc

## App with DLT:
File Read Took: 18878
File Read Took: 17322
File Read Took: 18876

## Benchmarks single item in parse return:
time:   [6.3784 ms 6.3877 ms 6.3972 ms]
time:   [6.3674 ms 6.3756 ms 6.3839 ms]
time:   [6.3769 ms 6.3865 ms 6.3961 ms]

## Benchmarks multiple items in parse return:
time:   [7.7845 ms 7.8004 ms 7.8174 ms]
time:   [7.7006 ms 7.7140 ms 7.7282 ms]
time:   [7.6867 ms 7.6994 ms 7.7125 ms]

--------------------------------------------------

# System Allocator:

## App with DLT:
File Read Took: 24033
File Read Took: 22460
File Read Took: 22047

## Benchmarks single item in parse return:
time:   [13.752 ms 13.773 ms 13.793 ms]
time:   [13.741 ms 13.767 ms 13.795 ms]
time:   [13.763 ms 13.787 ms 13.813 ms]

## Benchmarks multiple items in parse return:
time:   [17.503 ms 17.528 ms 17.554 ms]
time:   [17.558 ms 17.589 ms 17.620 ms]
time:   [17.484 ms 17.511 ms 17.540 ms]

---------------------------------------------------

# Jemalloc: *NOT SUPPORTED ON WINDOWS!!!!*
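
To make the "about 9%" figure easier to check, here is a small sketch that computes the relative gain from the "App with DLT" timings listed above. The helper `gain_percent` is a hypothetical name, and the values are used in whatever unit the app prints them. Because each set has the same number of runs, comparing the sums is equivalent to comparing the means.

```rust
/// Relative gain (in percent) of `candidate` over `baseline`, based on the
/// summed timings of each set of runs.
fn gain_percent(baseline: &[u64], candidate: &[u64]) -> f64 {
    let base: u64 = baseline.iter().sum();
    let cand: u64 = candidate.iter().sum();
    (base as f64 - cand as f64) / base as f64 * 100.0
}

fn main() {
    // "File Read Took" values copied from the results above (units as printed).
    let linux_system: [u64; 3] = [11586, 11429, 11397];
    let linux_mimalloc: [u64; 3] = [10606, 10295, 10402];
    let windows_system: [u64; 3] = [24033, 22460, 22047];
    let windows_mimalloc: [u64; 3] = [18878, 17322, 18876];

    println!(
        "Linux x86 gain:   {:.1}%",
        gain_percent(&linux_system, &linux_mimalloc)
    );
    println!(
        "Windows x86 gain: {:.1}%",
        gain_percent(&windows_system, &windows_mimalloc)
    );
}
```

With the numbers above this works out to roughly 9.0% on Linux x86 and roughly 19.6% on Windows x86 for the app run.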

@AmmarAbouZor
Member Author

Here are notes and the commands to run the benchmarks:

Commands to run the benchmarks:

Please make sure you have the latest version of the Build CLI Tool installed.

# Mimalloc Benchmarks
cargo chipmunk bench core mocks_once_producer -r 5
cargo chipmunk bench core mocks_multi_producer -r 5

# Jemalloc Benchmarks
cargo chipmunk bench core mocks_once_producer_jemalloc -r 5
cargo chipmunk bench core mocks_multi_producer_jemalloc -r 5

# Standard Allocator Benchmarks
cargo chipmunk bench core mocks_once_producer_sysalloc -r 5
cargo chipmunk bench core mocks_multi_producer_sysalloc -r 5

Running in Chipmunk:

I've already added code that prints the time it takes to parse a DLT file. I used a file with a size of 500 MB.

  1. First, run chipmunk from this PR in production mode and open the DLT file a couple of times. The time will be printed on stdout.
  2. After that, comment out the allocator lines in the file {repo}/application/apps/rustcore/rs-bindings/lib.rs to use the default allocator, then build and run chipmunk in production mode again and repeat the measurement.
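
For reference, the kind of measurement described above can be sketched like this. It is only an assumption about the shape of the timing code, not the code added in this PR: `timed` and `mock_parse` are hypothetical names, the mock input stands in for a real DLT file, and printing milliseconds is an assumed unit.

```rust
use std::time::{Duration, Instant};

/// Runs `work` and returns its result together with the elapsed wall-clock time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}

/// Stand-in for the real parser: counts non-empty, newline-separated records.
fn mock_parse(bytes: &[u8]) -> usize {
    bytes
        .split(|b| *b == b'\n')
        .filter(|line| !line.is_empty())
        .count()
}

fn main() {
    // Mock input instead of a real DLT file.
    let data = b"msg one\nmsg two\nmsg three\n".repeat(1000);
    let (count, elapsed) = timed(|| mock_parse(&data));
    println!("Parsed {} messages", count);
    // Assumed unit: milliseconds.
    println!("File Read Took: {}", elapsed.as_millis());
}
```

Running the same measurement several times per allocator, as in the steps above, helps to smooth out noise from disk caching and background load.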

@AmmarAbouZor
Member Author

Update: Jemalloc allocator isn't fully supported on Windows yet

@AmmarAbouZor AmmarAbouZor force-pushed the mimalloc_allocator branch 2 times, most recently from 011441e to 82dbcc2 Compare October 24, 2024 07:38
@AmmarAbouZor
Member Author

Update:
jemalloc allocator support on Windows is still unstable, which makes it irrelevant for our use case.

I've removed it as a dev dependency and deleted its benchmarks.

* `mimalloc` provided better performance both in the benchmarks and while
  the app is running (the performance gain is about 9%)
* The allocator can be set in only one place for the app, so we set
  it in rs-bindings for chipmunk and set it for each benchmark separately
* Creates a macro to run the mock once benchmarks with the given allocator
* jemalloc allocator dependency added.
* Add extra benchmarks for the jemalloc and standard allocators alongside
  the currently used mimalloc allocator
* Adjustment for bench file in Build CLI Tool
* Documentation
* Creates a macro to run the mock multi benchmarks with the given allocator
* Add extra benchmarks for the jemalloc and standard allocators alongside
  the currently used mimalloc allocator
* Adjustment for bench file in Build CLI Tool
* Documentation
* Add basic benchmarks that run while the Chipmunk app is parsing a DLT file
  to measure the changes on different platforms
* This commit must be dropped before merging the PR
`jemalloc` support on Windows is not stable yet.
@AmmarAbouZor AmmarAbouZor marked this pull request as ready for review November 4, 2024 09:01
Development

Successfully merging this pull request may close these issues.

Rust: Experiment with different allocators