Enable LTO in supported compilers #5874

Open
wants to merge 2 commits into
base: main
Conversation


@In-line In-line commented May 4, 2024

Describe your PR, what does it fix/add?

Enabled LTO, because why not?

Is there anything you want to mention? (unchecked code, possible bugs, found problems, breaking compatibility, etc.)

No

Is it ready for merging, or does it need work?

It's ready

@Agent00Ming
Contributor

Benefits of LTO

LTO can give double digit performance boosts for many programs.
Can lower RAM usage per program making it very useful for limited memory systems.

Downsides of LTO

Can increase compile time by 2 to 3 times.
Uses more RAM during compiling.
Not all programs become faster or smaller.
There is an increased chance of finding build-time or runtime bugs while using it.
Always be prepared to try without it if something is acting odd.

Source: Gentoo wiki

@In-line
Author

In-line commented May 4, 2024

Some stats on my machine (test before the patch)
cmake -G Ninja -B build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON/OFF
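
The configure line above stands for two separate runs, one with `ON` and one with `OFF`. A minimal sketch of how the two variants could be driven and timed from a script follows. The helper names and the separate build directories are hypothetical, and it assumes `cmake` and `ninja` are on `PATH` and the script runs from the root of the source checkout.

```python
import subprocess
import time


def configure_cmd(build_dir: str, lto: bool) -> list[str]:
    """Return the CMake configure command for a Release build with IPO/LTO on or off."""
    return [
        "cmake", "-G", "Ninja", "-B", build_dir,
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DCMAKE_INTERPROCEDURAL_OPTIMIZATION={'ON' if lto else 'OFF'}",
    ]


def build_cmd(build_dir: str) -> list[str]:
    """Return the command for a clean rebuild of an already configured tree."""
    return ["cmake", "--build", build_dir, "--clean-first"]


def timed_build(build_dir: str, lto: bool) -> float:
    """Configure and build once; return wall-clock seconds spent in the build step."""
    subprocess.run(configure_cmd(build_dir, lto), check=True)
    start = time.perf_counter()
    subprocess.run(build_cmd(build_dir), check=True)
    return time.perf_counter() - start
```

Calling `timed_build("build-lto-on", True)` and `timed_build("build-lto-off", False)` would give one wall-clock figure per variant. Using two build directories (an assumption here, not what the commands above do) avoids reconfiguring the same tree between runs.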

GCC (-flto=auto):
LTO ON: cmake --build build/ --clean-first 298.82s user 32.47s system 1880% cpu 17.613 total
LTO OFF: cmake --build build --clean-first 507.19s user 31.09s system 2270% cpu 23.704 total

Clang (-flto=thin):
LTO ON: cmake --build build/ --clean-first 276.76s user 10.66s system 1997% cpu 14.391 total
LTO OFF: cmake --build build --clean-first 308.75s user 10.49s system 2278% cpu 14.012 total
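
To make the four timing lines easier to compare, here is a small sketch that parses the zsh `time` output and reports the relative change in wall time and in total CPU time (user + system) when LTO is turned on. The function names are illustrative only. The input strings are copied from the lines above.

```python
import re

# Timing lines as reported above (zsh `time` format).
RUNS = {
    ("gcc", "on"): "298.82s user 32.47s system 1880% cpu 17.613 total",
    ("gcc", "off"): "507.19s user 31.09s system 2270% cpu 23.704 total",
    ("clang", "on"): "276.76s user 10.66s system 1997% cpu 14.391 total",
    ("clang", "off"): "308.75s user 10.49s system 2278% cpu 14.012 total",
}

TIME_RE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system "
    r"(?P<cpu>\d+)% cpu (?P<total>[\d.]+) total"
)


def parse_time(line: str) -> dict[str, float]:
    """Extract user, system, cpu (percent) and total (wall) from one timing line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised timing line: {line!r}")
    return {name: float(value) for name, value in match.groupdict().items()}


def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def summarize(compiler: str) -> dict[str, float]:
    """Compare the LTO ON run against the LTO OFF run for one compiler."""
    on = parse_time(RUNS[(compiler, "on")])
    off = parse_time(RUNS[(compiler, "off")])
    return {
        "wall": pct_change(on["total"], off["total"]),
        "cpu": pct_change(on["user"] + on["system"], off["user"] + off["system"]),
    }


for compiler in ("gcc", "clang"):
    s = summarize(compiler)
    print(f"{compiler}: wall {s['wall']:+.1f}%, cpu {s['cpu']:+.1f}%")
```

On these single runs, the GCC build with LTO takes roughly a quarter less wall time, while the Clang build is within a few percent either way. One run per configuration is too few to say much about noise.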

❯ clang --version
clang version 17.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
❯ gcc --version                     
gcc (GCC) 13.2.1 20240417
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE
❯ neofetch                                
                   -`                    codemonkey@workstation-01 
                  .o+`                   ------------------------- 
                 `ooo/                   OS: Arch Linux x86_64 
                `+oooo:                  Kernel: 6.8.9-1-cachyos-bore 
               `+oooooo:                 Uptime: 4 hours, 50 mins 
               -+oooooo+:                Packages: 2471 (pacman), 6 (flatpak) 
             `/:-:++oooo+:               Shell: zsh 5.9 
            `/++++/+++++++:              Resolution: 2560x1440 
           `/++++++++++++++:             DE: Hyprland 
          `/+++ooooooooooooo/`           WM: sway 
         ./ooosssso++osssssso+`          Theme: Adwaita [GTK2], Adwaita-dark [GTK3] 
        .oossssso-````/ossssss+`         Icons: Adwaita [GTK2/3] 
       -osssssso.      :ssssssso.        Terminal: vscode 
      :osssssss/        osssso+++.       CPU: AMD Ryzen 9 7950X (32) @ 5.881GHz 
     /ossssssss/        +ssssooo/-       GPU: AMD ATI Radeon RX 7900 XT/7900 XTX/7900M 
   `/ossssso+/:-        -:/+osssso+-     Memory: 19065MiB / 63999MiB 
  `+sso+:-`                 `.-/+oso:
 `++:.                           `-/+/                           
 .`                                 `/                           

@JohnRTitor
Contributor

Obviously this support is still experimental, but it's a nice addition to have. Thoughts? @vaxerski

@vaxerski
Member

vaxerski commented May 5, 2024

looking at the drawbacks, I'm not convinced this is a good idea.

@JohnRTitor
Contributor

I do agree that this should not be enabled by default :)
But if the user is adventurous enough to try :)

@In-line
Author

In-line commented May 5, 2024

Well, I'm not sure where it's mentioned that LTO is experimental. Both GCC and Clang claim it's mature. Maybe it was experimental a few years ago, but it isn't now. Chromium, Firefox, and many other much bigger and more complex projects use it.

@vaxerski Is there a good CPU bottleneck benchmark I can use to compare LTO and non-LTO builds?

@vaxerski
Member

vaxerski commented May 5, 2024

no clue, I've never used lto

@JohnRTitor
Contributor

@In-line can you provide a "patch" for meson based building too? I'll try to build and test it on Nix.

@fufexan
Member

fufexan commented May 5, 2024

@JohnRTitor you can rebase this PR on top of #5667 to test. I'm going to merge that soon.

@JohnRTitor
Contributor

JohnRTitor commented May 5, 2024

I am not the PR author this time :)
@In-line well, you heard fufexan :)

@Agent00Ming
Contributor

Agent00Ming commented May 5, 2024

I still think this should be left as an "option"; the compile times will vary with hardware and feature sets.

Compilation time table for me:
| LTO | OFF | ON | Change |
| --- | --- | --- | --- |
| real | 0m55.257s | 0m38.746s | -30% |
| user | 12m28.347s | 7m28.746s | -40% |
| sys | 0m20.171s | 0m24.944s | +25% |
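
The percentages in the table can be recomputed from the raw timings. The sketch below converts the `NmS.SSSs` stamps to seconds and prints the relative change from LTO OFF to LTO ON. The helper names are illustrative, and the values are copied from the table.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '12m28.347s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def change_pct(off: str, on: str) -> float:
    """Relative change from the LTO OFF time to the LTO ON time, in percent."""
    off_s, on_s = to_seconds(off), to_seconds(on)
    return (on_s - off_s) / off_s * 100.0


# (LTO OFF, LTO ON) pairs as reported in the table.
ROWS = {
    "real": ("0m55.257s", "0m38.746s"),
    "user": ("12m28.347s", "7m28.746s"),
    "sys": ("0m20.171s", "0m24.944s"),
}

for name, (off, on) in ROWS.items():
    print(f"{name}: {change_pct(off, on):+.1f}%")
```

With these inputs the real and user rows come out close to the quoted -30% and -40%. The sys row computes to a slightly smaller increase than the rounded +25%.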

@fufexan
Member

fufexan commented May 5, 2024

> I am not the PR author this time :)
> @In-line well, you heard fufexan :)

I meant more as: clone repo, gh pr checkout 5874, checkout cmake, git rebase In-line:lto.

But the CMake PR is now merged, so a simple rebase should get you up and running.

@fufexan
Member

fufexan commented May 5, 2024

What starship reports in my case:
LTO on: 1m57s
LTO off: 2m44s

@JohnRTitor
Contributor

GCC LTO by itself does not do much. Clang LTO, especially ThinLTO, is much better.

@JohnRTitor
Contributor

> @vaxerski Is there a good CPU bottleneck benchmark I can use to compare LTO and non-LTO builds?

Maybe these are not what you are looking for, but can be helpful:

https://www.phoronix.com/review/clang-lto-kernel
https://www.phoronix.com/review/clang-12-opt
https://www.phoronix.com/review/gcc11-rocket-opts
They are pretty outdated though.

@JohnRTitor
Contributor

Clang LTO: Finished at 20:34:56 after 1m3s
GCC LTO: Finished at 20:28:26 after 1m16s

@In-line
Author

In-line commented May 5, 2024

Hyprland isn't big enough for compilation time to be a bottleneck on modern systems. I don't think compilation time is a metric where we'd see a noticeable regression.

I meant CPU bottleneck benchmarks for Hyprland, to see how much difference LTO makes on weak systems with iGPUs, where the bottleneck might be on the CPU side. As LTO is a performance optimization, it should decrease Hyprland's executable size and increase its execution speed.

I was asking for any benchmarks I can run on a slow GPU to test the improvements that come with LTO.

@In-line
Author

In-line commented May 5, 2024

@JohnRTitor Patches for Meson are ready

@nonetrix

nonetrix commented May 14, 2024

I don't know if this is a good idea either, even more so if we don't at least benchmark it and see whether there is a meaningful improvement. Has anyone tried getting Hyprland to lag and comparing with and without LTO? A stress test would be a neat idea if someone would like to work on one, if it doesn't already exist; it could also prove useful for improving performance in general, without compiler flags, if we can profile it.

I have compiled my whole Gentoo system with LTO in the past, and NodeJS was the only thing that caused issues, so it's somewhat stable, I guess, but likely still not a good idea. I imagine you might get bigger gains from -O3 or -march=native, though the latter of course wouldn't always be practical. Maybe this could be added as a build option for those who want it to be faster and don't mind possible bugs? But we would have to check whether it actually is faster, since sometimes it can make things slower.

@In-line
Author

In-line commented May 24, 2024

I think all these conversations about some abstract risk in enabling LTO are pointless, as Hyprland is already included in the ALHP project: https://status.alhp.dev/?pkgbase=hyprland

I don't understand what all the "risk" fuss is about to be fair.

@gnusenpai
Contributor

gnusenpai commented May 25, 2024

So are there any actual requirements for being included in ALHP, other than: "it builds, ship it"? I imagine getting this endorsed here officially will take a bit more than that.

@vaxerski vaxerski force-pushed the main branch 2 times, most recently from 358e59e to 3fd6c1b Compare June 3, 2024 16:46
Contributor

@JohnRTitor JohnRTitor left a comment

Aside from the merge conflict here, I have been using LTO compiled Hyprland with Clang for a week now, and it feels like a breeze.

Memory consumption has been reduced by 5%. And I could not spot any bugs related to this.

Perhaps @fufexan could test this on their system as well.

@CNR0706

CNR0706 commented Jun 13, 2024

I don't think anything speaks against using LTO on modern Linux.

openSUSE, for example, has been using Link Time Optimization across their entire repos since 2019, and there are no issues whatsoever. Some other distros like Arch Linux and CachyOS also enable LTO for all packages.

Personally, I've been running LTO'd Hyprland across openSUSE TW and Gentoo for about 4 months at this point, and I've never encountered anything strange.

To me personally enabling LTO is a no-brainer as long as it doesn't cause build issues (which it doesn't for Hyprland).

@fufexan
Member

fufexan commented Jun 13, 2024

I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.

@JohnRTitor
Contributor

Could you try rebooting and check it when the session is idle?

I am not sure how nixos-rebuild switch performs switch of display managers/DEs without restarting them.

@fufexan
Member

fufexan commented Jun 13, 2024

> Could you try rebooting and check it when the session is idle?

Will do later, as I'm working on something right now.

> I am not sure how nixos-rebuild switch performs switch of display managers/DEs without restarting them.

It doesn't, as the service is already running. I've simply rebased this PR onto master and built it in the Hyprland repo, then launched the binary from tty.

@JohnRTitor
Contributor

image

This is with Clang LTO on.

@ErrorNoInternet
Contributor

> I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.

Are you sure that's not #6459?

@fufexan
Member

fufexan commented Jun 13, 2024

> > I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.
>
> Are you sure that's not #6459?

No, it could be that.

@gnusenpai
Contributor

I decided to run some basic tests on this. The following numbers were taken after a clean reboot, spawning some windows, and resizing them a bit to simulate usage.

Test system is:

Kernel: 6.9.4
Distro: Gentoo
Hyprland: 0.41.1
GCC: 13.2.1
Clang: 17.0.6
libc: glibc 2.38
gcc:
    size: 7971152B
    mem:
        153.824MB
        154.352MB
        150.5MB
        147.105MB
        146.73MB
        ---------
        150.502MB

gcc+lto:
    size: 6504656B (-18.4%)
    mem:
        155.539MB
        147.594MB
        145.941MB
        161.391MB
        145.891MB
        ---------
        151.271MB (+0.51%)

clang:
    size: 7489688B
    mem:
        162.188MB
        154.961MB
        154.352MB
        147.148MB
        150.805MB
        ---------
        153.891MB

clang+lto:
    size: 6711216B (-10.4%)
    mem:
        156.711MB
        149.492MB
        154.074MB
        153.367MB
        156.953MB
        ---------
        154.119MB (+0.15%)

So, based on this, the binaries are a decent chunk smaller and there isn't any obvious memory usage regression. But these are just preliminary tests. At the time of measurement, the system was up for only ~1min. I would be interested in seeing what effect it has on CPU usage (if any), but I think that's a bit trickier to measure.
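
The averages and percentages above can be reproduced with a few lines. The sketch below recomputes the mean memory per build and the relative change against the non-LTO baseline of the same compiler, and also prints the sample standard deviation to give a rough sense of run-to-run spread. The spread figure is an addition for illustration and was not part of the original measurements. All sample values are copied from the listing above.

```python
from statistics import mean, stdev

# Memory samples in MB, as listed above.
SAMPLES_MB = {
    "gcc": [153.824, 154.352, 150.5, 147.105, 146.73],
    "gcc+lto": [155.539, 147.594, 145.941, 161.391, 145.891],
    "clang": [162.188, 154.961, 154.352, 147.148, 150.805],
    "clang+lto": [156.711, 149.492, 154.074, 153.367, 156.953],
}

# Binary sizes in bytes, as listed above.
SIZE_BYTES = {
    "gcc": 7971152,
    "gcc+lto": 6504656,
    "clang": 7489688,
    "clang+lto": 6711216,
}


def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def compare(base: str, variant: str) -> dict[str, float]:
    """Size and mean-memory change of `variant` relative to `base`, in percent."""
    return {
        "size": pct_change(SIZE_BYTES[variant], SIZE_BYTES[base]),
        "mem": pct_change(mean(SAMPLES_MB[variant]), mean(SAMPLES_MB[base])),
    }


for base in ("gcc", "clang"):
    variant = base + "+lto"
    result = compare(base, variant)
    print(
        f"{variant}: size {result['size']:+.1f}%, mem {result['mem']:+.2f}% "
        f"(stdev {stdev(SAMPLES_MB[base]):.1f} MB vs {stdev(SAMPLES_MB[variant]):.1f} MB)"
    )
```

The spread within each set of five samples is a few MB, which is larger than the sub-1 MB difference between the means. That is consistent with reading the memory numbers as showing no obvious regression.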

@vignesh1507 vignesh1507 left a comment

Works perfectly in my local env.

@JohnRTitor
Contributor

With my setup, when LTO is enabled with GCC, Hyprland crashes on startup, but if I compile with Clang+LTO, it's fine.

@nonetrix

Hm, might be a GCC bug or a Hyprland bug. It could probably be fixed; any logs, etc.?


10 participants