Disable support for Linux THP on architectures other than amd64 #8702

Merged
4 changes: 2 additions & 2 deletions erts/configure
@@ -25692,8 +25692,8 @@ printf "%s\n" "no" >&6; }
;;
esac

-case $OPSYS in #(
-linux*) :
+case $ARCH-$OPSYS in #(
+amd64-linux*) :

Contributor

Please add an option allowing people to opt-in. OTP-27.0.1 works fine out of the box on an M1 mini running Fedora 40, i.e. arm64-linux. The generated config.h says #define HAVE_LINUX_THP 1.

Is there a way to determine if the THP optimization actually kicks in or not?

Contributor Author

> Please add an option allowing people to opt-in. OTP-27.0.1 works fine out of the box on an M1 mini running Fedora 40, i.e. arm64-linux. The generated config.h says #define HAVE_LINUX_THP 1.

Thank you, that is good to know.

The linker flags that I am using align stuff to a 2MiB boundary, which makes sense for 64-bit x86, where THP uses a 2MiB page. However, the size of a transparent huge page is not guaranteed to be the same on all architectures, so that alignment is not guaranteed to be correct. I believe it is reasonable for 32-bit ARM, at least on the version of Debian for armhf that I installed on a QEMU system last night, but I am not sure if it is reasonable for the different variants of 64-bit ARM.

Since I don't have real hardware to test on at the moment, I am hesitant to enable other architectures. Leaving it in could be a no-op, or worse. I thought there was an argument to ./configure to opt out of checking for, and enabling, THP, but it does not show up when I run ./configure --help. (My mistake.) Giving users control of this does seem like a reasonable thing to do.

Can you please tell me what the value of /sys/kernel/mm/transparent_hugepage/hpage_pmd_size is on your M1? Apple uses a 16KiB page by default, so this might have a unique value on that system.
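
As a rough sketch of how that value can be read and put next to the 2MiB alignment discussed above: the function name `thp_pmd_size_mib` is made up for illustration, and only the sysfs path comes from this thread. The file is assumed to hold a single byte count.

```shell
# Hypothetical helper: print the PMD-level THP size in MiB.
# $1 defaults to the sysfs file named in this thread; it is an argument
# so the function can also be tried on a saved copy of the value.
thp_pmd_size_mib() {
    file=${1:-/sys/kernel/mm/transparent_hugepage/hpage_pmd_size}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    bytes=$(cat "$file")
    # Integer division is enough here: the value is a power of two.
    echo $(( bytes / 1024 / 1024 ))
}
```

On a system where the file contains 2097152 this prints 2, which matches the alignment the linker flags assume. Any other result suggests the alignment needs a second look on that machine.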

> Is there a way to determine if the THP optimization actually kicks in or not?

To tell if the .text segment is being mapped with THP, look for the .text segment mapping in /proc/$pid/smaps for an Erlang node's process. If the entry for the .text segment has a non-zero value for FilePmdMapped, I believe everything should be working. Here is an example of what that looks like on my system:

00600000-00c00000 r-xp 00200000 00:1b 21865                              /path/to/beam.frmptr.smp
Size:               6144 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                6144 kB
Pss:                6144 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      6144 kB
Private_Dirty:         0 kB
Referenced:         6144 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:      6144 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    1
ProtectionKey:         0
VmFlags: rd ex mr mw me hg 

Note a few additional things:

  1. The mapping starts at 0x00600000, a multiple of 2MiB
  2. The value of Size is 6144 kB, also a multiple of 2MiB
  3. The value of THPeligible is 1
  4. The value of VmFlags includes hg
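
The manual check described above can be sketched as a small filter over an smaps file. The function name `thp_text_report` is hypothetical. It assumes the field names shown in the example output, and that `FilePmdMapped` appears before `THPeligible` within each entry, as it does there. Kernels that do not print `THPeligible` will produce no output.

```shell
# Hypothetical helper: list executable, file-backed mappings from an smaps
# file together with their FilePmdMapped and THPeligible values.
# $1 is a path such as /proc/<pid>/smaps.
thp_text_report() {
    awk '
        # A mapping header starts with an address range like 00600000-00c00000.
        /^[0-9a-f]+-[0-9a-f]+ / {
            want = ($2 ~ /x/ && $6 ~ /^\//)   # executable and backed by a file path
            path = $6
            pmd = "?"                         # unknown until FilePmdMapped is seen
            next
        }
        want && $1 == "FilePmdMapped:" { pmd = $2 }
        want && $1 == "THPeligible:" {
            printf "%s FilePmdMapped=%skB THPeligible=%s\n", path, pmd, $2
        }
    ' "$1"
}
```

It would be run as `thp_text_report /proc/$pid/smaps` for the Erlang node's process. A non-zero `FilePmdMapped` on the `beam` line is the signal to look for.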

Contributor

> Can you please tell me what the value of /sys/kernel/mm/transparent_hugepage/hpage_pmd_size is on your M1?

mini_7_ls /sys/kernel/mm/transparent_hugepage/
/sys/kernel/mm/transparent_hugepage:
defrag             hpage_pmd_size     hugepages-128kB/   hugepages-2048kB/  hugepages-32768kB/ hugepages-512kB/   hugepages-8192kB/  shmem_enabled 
enabled            hugepages-1024kB/  hugepages-16384kB/ hugepages-256kB/   hugepages-4096kB/  hugepages-64kB/    khugepaged/        use_zero_page 
mini_8_cat /sys/kernel/mm/transparent_hugepage/enabled 
always [madvise] never
mini_9_cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size 
33554432
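
For reading the `enabled` file shown above, the active mode is the word in square brackets. A minimal sketch for extracting it follows; the function name `thp_mode` is made up for illustration, and the bracket convention is assumed from the output in this thread.

```shell
# Hypothetical helper: print the active THP mode, i.e. the bracketed word.
# $1 defaults to the sysfs file shown in this thread.
thp_mode() {
    file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    # Keep only the text between [ and ]; print nothing if no brackets exist.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}
```

For the output above this yields `madvise`. In that mode the kernel documentation describes THP as limited to regions that asked for it with `madvise(MADV_HUGEPAGE)`.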

> To tell if the .text segment is being mapped with THP look for the .text segment mapping in /proc/$pid/smaps for an Erlang node's process. If the entry for the .text segment has a non-zero value for FilePmdMapped I believe everything should be working.

For the one executable mapping it shows:

00600000-0096c000 r-xp 00200000 00:22 309231                             /path/to/lib/erlang/erts-15.0.1/bin/beam.smp
Size:               3504 kB
KernelPageSize:       16 kB
MMUPageSize:          16 kB
Rss:                3184 kB
Pss:                3184 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      3184 kB
Private_Dirty:         0 kB
Referenced:         3184 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me

which I take it means the optimization didn't kick in.
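
One thing the numbers in this thread do allow is a quick alignment check of the mapping start against the reported huge page size. The sketch below uses a hypothetical function name, `is_hpage_aligned`. It does not establish why the kernel reported `THPeligible: 0`, only whether the start address is a multiple of a given size.

```shell
# Hypothetical helper: report whether a mapping start address is a multiple
# of a huge page size.
# $1 is a hex address without the 0x prefix (as printed in smaps),
# $2 is the huge page size in bytes.
is_hpage_aligned() {
    if [ $(( 0x$1 % $2 )) -eq 0 ]; then
        echo aligned
    else
        echo unaligned
    fi
}
```

With the values posted here, `is_hpage_aligned 00600000 2097152` prints `aligned`, while `is_hpage_aligned 00600000 33554432` prints `unaligned`. So the 2MiB alignment does not line up with the 32MiB size reported by `hpage_pmd_size` on this machine.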

Contributor Author

> mini_9_cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
> 33554432

Thank you.

AFAIK, AArch64 has three "translation granules". The 4KiB granule is like x86 and offers 4KiB, 2MiB, and 1GiB pages. Your machine seems to use a 16KiB granule that offers 16KiB and 32MiB pages.
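
The relationship between granule and PMD-level page size can be worked out with a little arithmetic. This sketch assumes 8-byte translation table entries and one granule-sized table per level, so a table holds granule/8 entries and a PMD-level block covers that many base pages. The function name `pmd_block_size` is hypothetical.

```shell
# Hypothetical helper: PMD-level block size in bytes for a given granule.
# Assumes 8-byte table entries, so one table holds granule/8 entries,
# each of which maps one granule-sized page.
pmd_block_size() {
    granule=$1
    echo $(( granule / 8 * granule ))
}

# 4KiB, 16KiB, and 64KiB granules, printed as "granule -> block size" in bytes.
for g in 4096 16384 65536; do
    echo "$g -> $(pmd_block_size "$g")"
done
```

For a 4KiB granule this gives 2097152 (2MiB), and for a 16KiB granule 33554432 (32MiB), which agrees with the `hpage_pmd_size` value reported above.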

The settings I chose had the 4KiB, 2MiB, and 1GiB page sizes in mind. Other granules are likely to benefit less from this optimization since their page sizes are larger, giving the TLB more coverage. That should mean fewer iTLB misses without the need to mess around with Linux "hugepages", a good thing.

That said, for things like the heap, a 32MiB page should be beneficial. That is a separate optimization I added and it is controlled by the +MMlp on|off flag. A 32MiB page should also be beneficial for the JIT cache, but my patches to enable that were never accepted by asmjit.

If you are seeing a lot of iTLB misses from the .text segment, measurable with perf(1), there is a feature in newer kernels called multi-size THP which can simulate a large page size using multiple PTEs. That might still be a win for smaller regions of memory like the .text segment of Erlang. See

https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html

Alas, I am not sure multi-size THP is relevant for the mapping of the .text segment using the current strategy. Some additional research is needed. However, even if it isn't, there is a trick that can be done at startup where

  1. the text segment is saved away
  2. the address space of the .text segment is unmapped and remapped using whatever options you want
  3. the text segment is copied back into place

Not so pretty but it can be worth a lot of performance and open-source code is already available to do this.

> which I take it means the optimization didn't kick in.

Doesn't look like it. Try using large pages with the heap?


{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the Transparent Huge Pages interface is available" >&5
printf %s "checking whether the Transparent Huge Pages interface is available... " >&6; }
4 changes: 2 additions & 2 deletions erts/configure.ac
@@ -3036,9 +3036,9 @@ case $host_os in
esac

dnl Checks for the Transparent Huge pages (THP) availability on Linux
-AS_CASE([$OPSYS],
-[linux*],
[
+AS_CASE([$ARCH-$OPSYS],
+[amd64-linux*],
AC_CACHE_CHECK(
[whether the Transparent Huge Pages interface is available],
erts_cv_linux_thp,