We should create the compute pipelines asynchronously, by using the CPU shaders whilst the GPU pipelines are being created (except perhaps for fine).
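As a rough illustration of the shape this could take, here is a minimal sketch using std threads and a channel. CpuRenderer, GpuRenderer, and their timings are placeholders standing in for Vello's actual renderer types, not its real API:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Placeholder types: a cheap-to-create CPU-shader path and an expensive GPU path.
enum Renderer {
    Cpu(CpuRenderer),
    Gpu(GpuRenderer),
}

struct CpuRenderer;
struct GpuRenderer;

impl CpuRenderer {
    // Cheap to construct (~140ms in the measurements above).
    fn new() -> Self {
        CpuRenderer
    }
    fn render_frame(&self) { /* ~30ms per Tiger frame */ }
}

impl GpuRenderer {
    // Expensive on a cold driver cache (~1.7s on a Pixel 6).
    fn new() -> Self {
        thread::sleep(Duration::from_millis(1700)); // stand-in for pipeline creation
        GpuRenderer
    }
    fn render_frame(&self) { /* <10ms per Tiger frame */ }
}

fn main() {
    // Kick off GPU pipeline creation off the main thread straight away.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(GpuRenderer::new());
    });

    // Render with the CPU shaders until the GPU pipelines arrive.
    let mut renderer = Renderer::Cpu(CpuRenderer::new());

    for _frame in 0..10 {
        // Swap to the GPU renderer as soon as it becomes available.
        if matches!(renderer, Renderer::Cpu(_)) {
            if let Ok(gpu) = rx.try_recv() {
                renderer = Renderer::Gpu(gpu);
            }
        }
        match &renderer {
            Renderer::Cpu(r) => r.render_frame(),
            Renderer::Gpu(r) => r.render_frame(),
        }
    }
}
```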
This is especially important for first-run performance, when the pipeline caches built into drivers won't have been filled yet. On my Google Pixel 6, creating pipelines with a cold cache takes approximately 1.7 seconds, which does not give a good user experience for the first run of the app [1].
This has been somewhat mitigated by #455; prior to that change, it took more than 4 seconds.
This 1.7 seconds currently blocks app startup, whereas using the CPU shaders means renderer creation takes 140ms.
Note that this does have an impact on frame latency - my measurements suggest that each frame of Tiger takes 30ms when using the CPU shaders vs <10ms with the GPU shaders. So overall, this approach should be expected to save ~ $1700-140-20=1540$ milliseconds, i.e. about 1.5 seconds on time to first frame on first run. This is the vast majority of the current time to first frame.
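To spell out the arithmetic (the 20 ms term is presumably the added latency of roughly one CPU-rendered frame, 30 ms vs. <10 ms on the GPU path):

$$
\underbrace{1700}_{\text{GPU pipeline creation}} - \underbrace{140}_{\text{CPU renderer creation}} - \underbrace{(30 - 10)}_{\text{added frame latency}} = 1540\ \text{ms}
$$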
This is also applicable to desktop use cases [2], but those are not the motivating example, because the startup time is shorter even with a cold cache.
This is part of my investigation into startup time on Android (#gpu > Android Startup Time Investigation).
Footnotes
1. Note that on current main there is no pipeline caching on Android. This is blocked on "Pipeline cache API and implementation for Vulkan" (gfx-rs/wgpu#5319).

2. MESA_SHADER_CACHE_DISABLE=1 cargo run -p with_winit --release can be used with Mesa to test the time without caches - it takes ~200ms with 14 threads, versus ~5ms with the cache on my machine.