Disable support for Linux THP on architectures other than amd64 #8702
Merged: garazdawi merged 1 commit into erlang:maint from lexprfuncall:try-linux-thp-only-on-linux-amd64, Sep 5, 2024
Please add an option allowing people to opt in. OTP-27.0.1 works fine out of the box on an M1 mini running Fedora 40, i.e. `arm64-linux`. The generated `config.h` says `#define HAVE_LINUX_THP 1`.

Is there a way to determine if the THP optimization actually kicks in or not?
Thank you, that is good to know.
The linker flags that I am using align things to a 2MiB boundary, which makes sense for 64-bit x86 because it uses a 2MiB page for THP. However, the size of a transparent huge page is not guaranteed to be the same on all architectures, so that alignment is not guaranteed to be correct. I believe it is reasonable for 32-bit ARM, at least on the version of Debian for armhf that I installed on a QEMU system last night, but I am not sure if it is reasonable for the different variants of 64-bit ARM.
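As a rough illustration of the alignment concern, the sketch below reads the kernel's reported PMD-level THP size from sysfs and checks whether a 2MiB segment alignment is also aligned to that size. The function names and the `ASSUMED_ALIGNMENT` constant are hypothetical and only for illustration; this is not how the build system makes the decision.

```python
from pathlib import Path

# sysfs file that reports the PMD-level transparent huge page size in bytes.
HPAGE_PMD_SIZE = Path("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size")

# Assumption for this sketch: segments are aligned to 2MiB, as on 64-bit x86.
ASSUMED_ALIGNMENT = 2 * 1024 * 1024


def read_hpage_pmd_size(path=HPAGE_PMD_SIZE):
    """Return the THP size in bytes, or None if the file is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def alignment_matches(hpage_size, alignment=ASSUMED_ALIGNMENT):
    """True if an address aligned to `alignment` is also aligned to `hpage_size`."""
    if hpage_size is None or hpage_size <= 0:
        return False
    return alignment % hpage_size == 0


if __name__ == "__main__":
    size = read_hpage_pmd_size()
    if size is None:
        print("THP size not available on this system")
    else:
        print(f"hpage_pmd_size = {size} bytes, "
              f"2MiB alignment sufficient: {alignment_matches(size)}")
```

On a system whose huge page is larger than 2MiB, `alignment_matches` returns `False`, which is the case where the 2MiB alignment would not line the segment up with a huge page boundary.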
Since I don't have real hardware to test on at the moment, I am hesitant to enable other architectures. Leaving it in could be a no-op, or worse. I thought there was an argument to `./configure` to opt out of checking for, and enabling, THP, but it does not show up when I run `./configure --help`. (My mistake.) Giving users control of this does seem like a reasonable thing to do.

Can you please tell me what the value of `/sys/kernel/mm/transparent_hugepage/hpage_pmd_size` is on your M1? Apple uses a 16KiB page by default, so this might have a unique value on that system.

To tell if the `.text` segment is being mapped with THP, look for the `.text` segment mapping in `/proc/$pid/smaps` for an Erlang node's process. If the entry for the `.text` segment has a non-zero value for `FilePmdMapped`, I believe everything should be working. Here is an example of what that looks like on my system:

Note a few additional things:
- `Size` is 6144 kB, also a multiple of 2MiB
- `THPeligible` is `1`
- `VmFlags` includes `hg`
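The check described above can be scripted. The following is a minimal sketch, assuming the usual `/proc/<pid>/smaps` layout of a header line per mapping followed by `Key: value` lines; the function names are hypothetical, and the script simply reports the four fields mentioned above for every executable mapping.

```python
import re
import sys

# Header line of a mapping: "start-end perms offset dev inode [path]".
HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$")


def parse_smaps(text):
    """Split smaps text into a list of mappings with their key/value fields."""
    mappings, current = [], None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            current = {"range": f"{m[1]}-{m[2]}", "perms": m[3],
                       "path": m[4], "fields": {}}
            mappings.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current["fields"][key.strip()] = value.strip()
    return mappings


def kb(value):
    """Convert a value such as '6144 kB' to the integer 6144 (0 if empty)."""
    parts = value.split()
    return int(parts[0]) if parts else 0


def executable_thp_report(text):
    """Report the THP-related fields for every executable mapping."""
    report = []
    for m in parse_smaps(text):
        if "x" not in m["perms"]:
            continue
        f = m["fields"]
        report.append({
            "range": m["range"],
            "path": m["path"],
            "size_kb": kb(f.get("Size", "")),
            "file_pmd_mapped_kb": kb(f.get("FilePmdMapped", "")),
            "thp_eligible": f.get("THPeligible") == "1",
            "hg_flag": "hg" in f.get("VmFlags", "").split(),
        })
    return report


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/smaps") as fh:
        for entry in executable_thp_report(fh.read()):
            print(entry)
```

Run it with the PID of the Erlang node's process as the only argument. A non-zero `file_pmd_mapped_kb` on the mapping that holds the `.text` segment would suggest that THP is in effect for it.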
For the one executable mapping it shows:
which I take it means the optimization didn't kick in.
Thank you.
AFAIK, Aarch64 has three "translation granules". The 4KiB granule is like x86 and offers 4KiB, 2MiB, and 1GiB pages. Your machine seems to use a 16KiB granule that offers 16KiB and 32MiB pages.
The settings I chose had the 4KiB, 2MiB, and 1GiB page sizes in mind. Other granules are likely to benefit less from this optimization, since their page sizes are larger and give the TLB more coverage. That should mean fewer iTLB misses without the need to mess around with Linux "hugepages", which is a good thing.
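The relationship between granule and huge page size follows from the page table layout. As a worked sketch, assuming 8-byte table entries (so one table of granule size holds granule/8 entries), a block one level above the leaf covers granule × (granule / 8) bytes. The 64KiB row is derived from the same arithmetic rather than taken from the discussion above.

```python
PTE_BYTES = 8  # assumed size of one translation table entry


def pmd_block_size(granule):
    """Bytes covered by one block entry one level above the leaf pages."""
    entries_per_table = granule // PTE_BYTES
    return granule * entries_per_table


for granule_kib in (4, 16, 64):
    size = pmd_block_size(granule_kib * 1024)
    print(f"{granule_kib}KiB granule -> {size // 2**20}MiB block")

# 4KiB granule -> 2MiB block
# 16KiB granule -> 32MiB block
# 64KiB granule -> 512MiB block
```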
That said, for things like the heap, a 32MiB page should be beneficial. That is a separate optimization I added and it is controlled by the `+MMlp on|off` flag. A 32MiB page should also be beneficial for the JIT cache, but my patches to enable that were never accepted by asmjit.

If you are seeing a lot of iTLB misses from the `.text` segment, measurable with `perf(1)`, there is a feature in newer kernels called multi-size THP which can simulate a large page size using multiple PTEs. That might still be a win for smaller regions of memory like the `.text` segment of Erlang. See https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html

Alas, I am not sure multi-size THP is relevant for the mapping of the `.text` segment using the current strategy. Some additional research is needed. However, even if it isn't, there is a trick that can be done at startup where

Not so pretty, but it can be worth a lot of performance, and open-source code is already available to do this.
Doesn't look like it. Try using large pages with the heap?