-
Notifications
You must be signed in to change notification settings - Fork 938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression #6518
Comments
#6285 by @jimblandy is the PR in question. I remember you talking about this technique and I'm very surprised to see it have severe performance implications. Of note, the perf regression (and infinite loop fix that caused it) are both on Mac as far as I understand. Reproduction on other platforms would likely be useful. |
First of all, thank you very much @rudderbucky for tracking this down! Isolating it to a particular commit is very helpful. That commit shouldn't have any effect on any platform other than Metal. The only change is to the Metal Shading Language that Naga generates for loops. The We want to prefer safe defaults until we know what their runtime cost is. @rudderbucky, how much of a slowdown are you seeing? You say it's "drastic", and I believe you, but let's get some numbers into the issue. |
@jimblandy FWIW gating the feature to browsers would work perfectly for me because I am running native. On my MBP I am seeing in the example a drop from ~200FPS to 75FPS. Here's another set of numbers from a different tester, who seemed to have more problems on HiDPI display settings:
|
That impact is serious enough to merit a workaround. To confirm that the diff --git a/naga/src/back/msl/writer.rs b/naga/src/back/msl/writer.rs
index 83d937d48..d776e7c37 100644
--- a/naga/src/back/msl/writer.rs
+++ b/naga/src/back/msl/writer.rs
@@ -803,14 +803,7 @@ impl<W: Write> Writer<W> {
}
self.loop_reachable_macro_name = self.namer.call("LOOP_IS_REACHABLE");
- let loop_reachable_volatile_name = self.namer.call("unpredictable_jump_over_loop");
- writeln!(
- self.out,
- "#define {} if (volatile bool {} = true; {})",
- self.loop_reachable_macro_name,
- loop_reachable_volatile_name,
- loop_reachable_volatile_name,
- )?;
+ writeln!(self.out, "#define {}", self.loop_reachable_macro_name,)?;
Ok(())
} If that brings performance back up, then we can be confident that it's worth making this conditional. |
@jimblandy yep just confirmed that doing that fixes performance on my end back to ~200FPS. |
Okay, then I think we've caught the culprit. I think the fix is to have Naga give that macro an empty definition unless some sort of bounds checks are enabled. Then native apps should use |
@jimblandy err just read this... I took the naive route and just gated the code you mentioned to wasm32 in #6520. Let me know if you'd like it done the way you mentioned. |
Description
PR #6285 introduces performance issues for wgsl loops.
Repro steps
Compare the two bevy examples here: https://github.com/rudderbucky/bevy/tree/many_0.14 and https://github.com/rudderbucky/bevy/tree/many_0.15. In particular run
cargo run --features="bevy_dev_tools" --example fps_overlay_stress
in both, in fullscreen via the top left green button on macOS, and compare the fpsThe former uses bevy 0.14 which is on v20 and the latter is on v23. Unfortunately to prove this PR is the specific one that causes performance issues is a bit more involved - you will need to clone the wgpu repo to checkout the PR commit, clone the naga_oil and clone the bevy repo to point all wgpu/naga/naga-oil dependencies to their local versions (in both bevy and naga_oil). Sadly a breaking change in wgpu causes bevy to stop compiling, though the fix to test the above examples is as simple as commenting out the
set_bind_group
lines that are causing issues.Expected vs observed behavior
Performance decreases drastically between wgpu commits 3fda684 (PR 6285) and the one before.
Platform
Reproduced on M1 Pro 16 inch MBP and M1 Mac mini
The text was updated successfully, but these errors were encountered: