Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix perf degradation on web builds #11227

Merged
merged 4 commits into from
Jan 6, 2024

Conversation

nicopap
Copy link
Contributor

@nicopap nicopap commented Jan 5, 2024

Objective

Solution

  • Move the app update code into the Event::WindowEvent { event: WindowEvent::RedrawRequested } branch of the event loop.
  • Run window.request_redraw() When runner_state.redraw_requested
    • Instead of swapping ControlFlow between Poll and Wait, we always keep it at Wait, and use window.request_redraw() to schedule an immediate call to the event loop.
    • runner_state.redraw_requested is set to true when UpdateMode::Continuous and when a RequestRedraw event is received.
  • Extract the redraw code into a separate function, because otherwise I'd go crazy with the indentation level.
  • Fix Possible performance regression in WASM builds #11122.

Testing

I tested the WASM builds as follow:

cargo run -p build-wasm-example -- --api webgl2 bevymark
python -m http.server --directory examples/wasm/ 8080
# Open browser at http://localhost:8080

On main, even spawning a couple sprites is super choppy. Even if it says "300 FPS". While on this branch, it is smooth as butter.

I also found that it fixes all choppiness on window resize (tested on Linux/X11). This was another issue from #10702 IIRC.

So here is what I tested:

  • On wasm: many_foxes and bevymark, with argh::from_env() commented out, otherwise we get a cryptic error.
  • Both with PresentMode::AutoVsync and PresentMode::AutoNoVsync
    • On main, it is consistently choppy.
    • With this PR, the visible frame rate is consistent with the diagnostic numbers
  • On native (linux/x11) I ran similar tests, making sure that AutoVsync limits to monitor framerate, and AutoNoVsync doesn't.

Future work

Code could be improved, I wanted a quick solution easy to review, but we really need to make the code more accessible.

Review guide

Consider enable the non-whitespace diff to see the real change set.

@alice-i-cecile alice-i-cecile added C-Bug An unexpected or incorrect behavior A-Windowing Platform-agnostic interface layer to run your app in C-Performance A change motivated by improving speed, memory usage or compile times P-High This is particularly urgent, and deserves immediate attention labels Jan 5, 2024
@alice-i-cecile alice-i-cecile added this to the 0.13 milestone Jan 5, 2024
@alice-i-cecile
Copy link
Member

@daxpedda: does this look like the right fix to you?

Copy link
Contributor

@daxpedda daxpedda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the ping @alice-i-cecile!
LGTM!

I also noticed that Bevy doesn't take WindowEvent::Occluded into account when drawing. I will open a separate issue for that.

EDIT: #11229.

create_window_system_state.apply(&mut app.world);
let (_, winit_windows, _, _) = event_writer_system_state.get_mut(&mut app.world);
for window in winit_windows.windows.values() {
window.request_redraw();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferably also call Window::request_redraw() in WindowEvent::RedrawRequested.

I would generally recommend not to use Event::AboutToWait at all unless you have a very specific use-case in mind.

@daxpedda
Copy link
Contributor

daxpedda commented Jan 5, 2024

FYI, this has probably affected all platforms, it's just especially bad on Web because Event::AboutToWait is triggered much more often as a result of it's callback-based nature.

@Vrixyz Vrixyz self-requested a review January 5, 2024 20:48
@Vrixyz Vrixyz mentioned this pull request Jan 5, 2024
23 tasks
@Vrixyz
Copy link
Member

Vrixyz commented Jan 5, 2024

Tried the example "sprite" on firefox osx ;

On main: Firefox_mac_profile_main_425570aa752b32ef.json.gz
This pr: Firefox_mac_profile_pr_a1430d76c683562f7.json.gz

I'm not too sure how to read those traces so I'll refrain from drawing conclusions, I'll test on native too.

@daxpedda
Copy link
Contributor

daxpedda commented Jan 5, 2024

So I did profile this locally as well now and the performance is pretty bad compared to Winit v0.28 because it's still using ControlFlow::Poll.

Unfortunately I'm pretty unfamiliar with how Bevy uses Winit exactly, but ControlFlow::Poll can't be reasonably used on Web with Winit v0.29. The reason why it worked in v0.28 is because it was implemented incorrectly, basically ControlFlow::Poll was as fast as your frame rate, which differs to native, where ControlFlow::Poll means as fast as possible. This was fixed in v0.29 and now ControlFlow::Poll actually runs as fast as possible on Web.
Unfortunately running pretty heavy Wasm as fast as possible in a browser produces horrible lag, which is what we are seeing right now.

There are two ways to solve this:

  • Use ControlFlow::Wait, I've tried the following which seems to fix the issue:
            .insert_resource(WinitSettings {
                focused_mode: bevy::winit::UpdateMode::Reactive {
                    wait: Duration::from_millis(1),
                },
                unfocused_mode: bevy::winit::UpdateMode::ReactiveLowPower {
                      wait: Duration::from_millis(10),
                },
            })
  • Use PollStratey::IdleCallback, which is much better, but can still cause serious lagging when the browsers algorithm to determine idle time is not doing so well.

@nicopap
Copy link
Contributor Author

nicopap commented Jan 6, 2024

I'm doing an audit of how bevy handles the winit loop. I've noticed there is a few different knobs, so I want to have an overall view of it before I feel confident about this PR. It's also an opportunity to document the full behavior and fix any incidental weirdness.

@nicopap
Copy link
Contributor Author

nicopap commented Jan 6, 2024

Looking at the UpdateMode documentation:

/// **Note:** This setting is independent of VSync. VSync is controlled by a window's
/// [`PresentMode`](bevy_window::PresentMode) setting. If an app can update faster than the refresh
/// rate, but VSync is enabled, the update rate will be indirectly limited by the renderer.

Looks like bevy relies implicitly on VSync, with PresentMode::Fifo. So even if you set ControlFlow to Poll, bevy would block until vsync, effectively regulating the frame rate.

With a bit more exploration (thanks to the discussion around #11233 (discord link (not that relevant, but included for completeness sake))) I understand bevy uses the wgpu::SurfaceTexture::present() API as a way to block — and therefore stall updates — on Vsync framerate. app.update() will therefore block until VSync.

However, bevy uses pipelined rendering, so it's not as trivial as this. What really happen is a complex process:

  1. Bevy splits at startup the "render" and "main" Apps in two separate threads.
  2. In app.update(), which is called every WindowEvent::RedrawRequested in this PR, bevy does:
    a. Run the "main" App schedule
    b. Waits until the "render" App is ready to receive extraction
    c. Run the Extract stage, which are systems that move data from the main App to the render App
  3. Then, after the extraction is done, the following two things occur at the same time:
    • bevy returns from the app.update() freeing the main loop for winit to manage.
    • bevy runs the render App, which does all the rendering things and call surface_texture.present(). This will busy the render App, so that step 2.b waits until it ends.

Now, on the web, this changes, as bevy doesn't do any threading on WASM. I'm not exactly sure how it works though, but looking at the source for bevy_framepace, it assumes that when pipelined rendering is not enabled (ie: when there are no extract stages) the winit loop starts after the end of rendering.

The wgpu docs on PresentMode::Fifo doesn't mention a special behavior on WASM. But if I understand @daxpedda correctly, ControlFlow:Poll will trigger some update at the browser level independently of the bevy game loop. Which may cause massive lag.

@nicopap
Copy link
Contributor Author

nicopap commented Jan 6, 2024

The conclusion is that IdleCallback is probably what we want. Though from daxpedda's description, this might interfere with frame pacing plugins on WASM. I'll update this PR to include it.

@daxpedda
Copy link
Contributor

daxpedda commented Jan 6, 2024

With a bit more exploration (thanks to the discussion around #11233 (discord link (not that relevant, but included for completeness sake))) I understand bevy uses the wgpu::SurfaceTexture::present() API as a way to block — and therefore stall updates — on Vsync framerate. app.update() will therefore block until VSync.

This is not the case on Wasm unfortunately. Using PresentMode::Fifo will not wait until VSync on Wasm, it will just progress immediately.
So using ControlFlow::Poll on Wasm will render much faster then VSync (not really, because the browser will limit actual rendering, but the rendering work will still be done).

The conclusion is that IdleCallback is probably what we want.

If I understand correctly Bevy isn't calling app.update() outside of RedrawRequested, which is VSync. So I'm unsure why Bevy would want to use ControlFlow::Poll at all. What is the use-case for this?

@nicopap
Copy link
Contributor Author

nicopap commented Jan 6, 2024

Oops, sorry for saying erroneous things in #11227 (comment).

It looks like the blocking call is surface.get_current_texture() Surface::get_current_texture1 in crates/bevy_render/src/view/window/mod.rs. SurfaceTexture::present is actually non-blocking on most plateform. Regardless, get_current_texture runs in the render app, so it should not change the conclusion from that linked comment.

The winit docs mention using Poll and using the graphic API Vsync to control framerate rather than using WaitUntil. It might be out of date.

Hunting your comments on this repository. I understand you are suggesting using a combination of two things:

  • ControlFlow::WaitUntil (or just Wait)
  • Calling Window::request_redraw at the end each update runs.

Do I understand correctly that request_redraw immediately queues a next WindowEvent::RedrawRequested? If so, we could completely substitute Poll with just calling request_redraw. From my testing, this fixes a lot of issues related to resizing windows on native builds too, and keeps the same behavior, even with vsync disabled.

Though it probably mixes poorly with frame pacing plugins.

Footnotes

  1. Source: https://github.com/gfx-rs/wgpu/issues/3283

@nicopap nicopap marked this pull request as ready for review January 6, 2024 14:55
@nicopap nicopap marked this pull request as draft January 6, 2024 14:56
@daxpedda
Copy link
Contributor

daxpedda commented Jan 6, 2024

The winit docs mention using Poll and using the graphic API Vsync to control framerate rather than using WaitUntil. It might be out of date.

This is a mistake imo, unfortunately the Winit docs are in need of some very serious improvements.

Do I understand correctly that request_redraw immediately queues a next WindowEvent::RedrawRequested? If so, we could completely substitute Poll with just calling request_redraw.

Window::request_redraw() queues an event, but it won't fire until it's underlying API, which is supposed to be VSync, tells it to. If your intention is to draw in VSync, using RedrawRequested is all you need. If you intend to draw all the time, you have to not use RedrawRequested.

In either case you probably don't want to use Poll on Web, but throttle it to something reasonable, e.g. drawing every millisecond.

Though it probably mixes poorly with frame pacing plugins.

The thing is that frame pacing should (at least currently) not be done with the help of Winit. Rather some API is needed to query timings which should then control when rendering happens, this would require not drawing in RedrawRequested, as it is throttled by VSync, then WaitUntil can be used to make sure the event loop is woken up at the right time.

In reality of course you want to do this in a different thread and not in the Winit event loop to begin with.

Keep in mind that I'm unfamiliar with how Bevy frame pacing plugins currently function.


In any case, I recommend against using Poll at all, it will simply use 100% of your CPU.
I know this is very popular in FPS games, e.g. 600 FPS in Valorant. The idea here being that you never miss a frame, but that should be done with frame pacing instead of just consuming unnecessary CPU cycles.

Unless you have a very specific use-case in mind, this is usually not what you want. That said, on Web Poll will simply always be awful unless browsers improve their performance or you are rendering on a different thread.

@nicopap nicopap force-pushed the fix-about-to-wait-redraw branch from 61d801b to 6ec7ac0 Compare January 6, 2024 15:26
@nicopap nicopap marked this pull request as ready for review January 6, 2024 15:28
@nicopap
Copy link
Contributor Author

nicopap commented Jan 6, 2024

The multiple_windows crash also occurs on main, so not blocking. The PR is ready for review.

@daxpedda thank you for taking the time to walk me through all this.

Yeah frame pacing is an external crate right now https://github.com/aevyrie/bevy_framepace. It uses the most naive approach of measuring roughly frame time and then sleep just before the end of the app.update() based on inferred frame time. Strongly agree the best approach to reducing latency is pacing, and strongly agree this needs special measurement, which currently bevy_framepace doesn't really do, but that's for the future.

@miketwenty1
Copy link

miketwenty1 commented Jan 6, 2024

Fwiw. I Tested this branch on both desktop and physical mobile device.
My mobile device runs less hot with this change. I also found no regression bugs or issues while testing in my game.

Edit: Only web targets were tested

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems sensible, it's had expert feedback, and this fixes our performance regression. Once we get a final look-over from someone else I'll merge this in.

I really really want to make this part of our code base more clean, but fix comes first.

Copy link
Contributor

@daxpedda daxpedda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that I'm unfamiliar with Bevy, but LGTM!

Comment on lines +74 to +77
// TODO(clean): winit docs recommends calling pre_present_notify before this.
// though `present()` doesn't present the frame, it schedules it to be presented
// by wgpu.
// https://docs.rs/winit/0.29.9/wasm32-unknown-unknown/winit/window/struct.Window.html#method.pre_present_notify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Window::pre_present_notify() before SurfaceTexture::present() has worked fine for me so far.

Looking at the Vulkan and OpenGL implementations (because this is no-op on everything except Wayland), I can't see a queue, but I'm definitely no Wgpu expert. Would be nice to get clarification here from somebody knowledgeable about Wgpu.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 6, 2024
Merged via the queue into bevyengine:main with commit 79021c7 Jan 6, 2024
22 checks passed
@mockersf
Copy link
Member

mockersf commented Jan 7, 2024

This PR broke iOS and Android... Changes to the event loop should be tested on all supported platforms

@miketwenty1
Copy link

@mockersf native iOS and Android targets? or web targets?

@mockersf
Copy link
Member

mockersf commented Jan 7, 2024

Native! #11245 should fix it

github-merge-queue bot pushed a commit that referenced this pull request Jan 8, 2024
#11250)

# Objective

- After #11227, example showcase
timeouts
- `ReactiveLowPower` now can wait indefinitely depending on "platform
specifics"

## Solution

- Patch desktop mode in example showcase to use default mode which is
always `Continuous`
@nicopap nicopap mentioned this pull request Jan 9, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jan 9, 2024
…dow creation (#11245)

# Objective

- Since #11227, Bevy doesn't work on mobile anymore. Windows are not
created.

## Solution

- Create initial window on mobile after the initial `Resume` event.
macOS is included because it's excluded from the other initial window
creation and I didn't want it to feel alone. Also, it makes sense. this
is needed for Android

https://github.com/bevyengine/bevy/blob/cfcb6885e3b475a93ec0fe7e88023ac0f354bbbf/crates/bevy_winit/src/lib.rs#L152
- request redraw during plugin initialisation (needed for WebGPU)
- request redraw when receiving `AboutToWait` instead of at the end of
the event handler. request to redraw during a `RedrawRequested` event
are ignored on iOS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Windowing Platform-agnostic interface layer to run your app in C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-High This is particularly urgent, and deserves immediate attention
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible performance regression in WASM builds
6 participants