Profile-Guided Optimization (PGO) benchmark result #545

zamazan4ik · 2024-06-21T07:21:22Z

Hi!

I was interested in optimizing the library's performance even further (the library is already pretty fast). I have evaluated Profile-Guided Optimization (PGO) on many projects; all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since this compiler optimization works well in many places, especially in parsers, I decided to apply it to this project. Here are my benchmark results.

Test environment

Fedora 40
Linux kernel 6.9.4
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.79
html5ever version: the main branch at commit e69b05c849031d6a2837d7d86372ff14b3f4080e (the latest at the time of writing)
Turbo Boost disabled

Benchmark

For benchmarking, I use the project's built-in benchmarks. For PGO, I use the cargo-pgo tool. I got the Release results with the taskset -c 0 cargo bench --workspace --all-features command. The PGO training phase is done with taskset -c 0 cargo pgo bench -- --workspace --all-features, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench -- --workspace --all-features. taskset -c 0 pins the process to a single CPU core, which reduces the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
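For anyone who wants to repeat the runs, the three commands above can be collected into one script. This is only a sketch: the run helper and the RUN, PIN and BENCH_ARGS variables are names made up here, and the two setup commands come from the cargo-pgo documentation rather than from my notes above. By default the script only prints each command; it executes them when RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the three benchmark runs described above.
# RUN, PIN, BENCH_ARGS and run() are illustrative names, not part of cargo-pgo.
set -eu

PIN="taskset -c 0"                       # pin to CPU core 0 to reduce scheduler noise
BENCH_ARGS="--workspace --all-features"  # same flags for every run

# Print the command; execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# One-time setup (assumption: cargo-pgo is not installed yet).
run cargo install cargo-pgo
run rustup component add llvm-tools-preview

# 1. Release baseline.
run $PIN cargo bench $BENCH_ARGS

# 2. PGO training phase: instrumented build that collects profiles.
run $PIN cargo pgo bench -- $BENCH_ARGS

# 3. PGO optimization phase: rebuild with the collected profiles and benchmark.
run $PIN cargo pgo optimize bench -- $BENCH_ARGS
```

Saved as, say, pgo-bench.sh (a hypothetical file name) in the repository root, `sh pgo-bench.sh` lists the commands and `RUN=1 sh pgo-bench.sh` runs them in order.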

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/ae7f47e783e87f688eb7f29c78468237
PGO optimized compared to Release: https://gist.github.com/zamazan4ik/c5a8657218c22eeb4ac5edce0a6b3cf0
(just for reference) PGO instrumented compared to Release: https://gist.github.com/zamazan4ik/2c1caae4675577766b745c7436c62697

According to the results, PGO measurably improves the library's performance in many cases.

Further steps

I can suggest the following action points:

Perform more PGO benchmarks with other datasets (if you are interested enough in it). If they show improvements, add a note to the documentation (the README file, I guess) about the possible performance gains from building the library with PGO.
You could also try to get some insights into how the code can be optimized further, based on the changes that the compiler made with PGO. This can be done by comparing flamegraphs before and after applying PGO to understand the difference.

I would be happy to answer your questions about PGO.

P.S. Please do not treat the issue like a bug or something like that - it's just a benchmark report. Since the "Discussions" functionality is disabled in this repo, I created the Issue instead.
