
How to profile my super slow webNN implementation (sd-turbo img2img) #40

Closed
eyaler opened this issue Sep 23, 2024 · 4 comments

@eyaler

eyaler commented Sep 23, 2024

I am trying to make a webNN example for sd-turbo image-to-image: https://github.com/eyaler/webnn-developer-preview/blob/main/demos/sd-turbo/index.js
I used the vae encoder from: https://huggingface.co/schmuell/sd-turbo-ort-web/ without any changes to the model. I also tried other ones, but this worked for me where others did not.
I probably didn't do the latents sampling as intended, but it is working. You can try it here: https://eyaler.github.io/webnn-developer-preview/demos/sd-turbo/ (you need to tick the image-to-image checkbox; it uses a default input image I provided, sorry).

My main issue is that the encoder takes 100x-300x as long as the original flow without the encoder (40-120 sec compared to 400 ms). You can compare by unticking the image-to-image checkbox. This suggests to me that something is very wrong with the way I hooked up the model, or that I missed some basic adjustment step. I would be grateful for any insights into potentially obvious reasons for the large discrepancy, and how to approach debugging this.

@ibelem
Contributor

ibelem commented Sep 29, 2024

@eyaler Great work and idea for image-to-image usage!

The inputs of the vae encoder you are using:

name: sample
tensor: float16[batch_size, num_channels, height, width]

But the override in your code is freeDimensionOverrides: { batch: 1, channels: 3, height: 512, width: 512 },
in https://github.com/eyaler/webnn-developer-preview/blob/main/demos/sd-turbo/index.js#L316

Could you first try freeDimensionOverrides: { batch_size: 1, num_channels: 3, height: 512, width: 512 }, instead?

https://huggingface.co/schmuell/sd-turbo-ort-web/blob/main/vae_encoder/config.json
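A minimal sketch of the corrected session options, assuming the WebNN execution provider configuration used by the demo. The override keys must match the model's declared symbolic dimension names exactly; ONNX Runtime silently ignores keys it does not recognize:

```javascript
// Corrected session options: the override keys match the model's declared
// input, float16[batch_size, num_channels, height, width].
// (executionProviders shown here is an assumption modeled on the demo.)
const sessionOptions = {
  executionProviders: [{ name: 'webnn', deviceType: 'gpu' }],
  freeDimensionOverrides: { batch_size: 1, num_channels: 3, height: 512, width: 512 },
};

// Sanity check: every symbolic dimension is pinned to an integer.
const symbolicDims = ['batch_size', 'num_channels', 'height', 'width'];
const allFixed = symbolicDims.every(
  d => Number.isInteger(sessionOptions.freeDimensionOverrides[d])
);
console.log(allFixed); // true
```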

CC @Honry

@eyaler
Author

eyaler commented Sep 29, 2024

@ibelem Oh wow, fixing the argument names makes it 100x faster! Thanks!!

However, now that the optimization is kicking in I have saturation issues that I guess are related to float16 casting. My next steps:

  1. While the large performance hit makes it clear that optimizations were not working, I am not sure why the free dimensions would be connected with casting issues. Maybe everything stays on the CPU? I played with graphOptimizationLevel, and it seems that not completely fixing the free dimensions is equivalent to disabling all graph optimizations, which surprised me. There are legitimate use cases where dimensions are not fixed. Are these cases not ready for WebNN on GPU?
  2. I will try the instance normalization cast fix mentioned in another issue, and if it works I will try to make a script, as it seems to be a rather verbose and non-trivial process.
  3. I plan to open an issue on onnxruntime to suggest that wrong argument names in the free dimension overrides raise an error, or at least a warning. I never want my models to run 100x slower due to a typo or wrong name, and I could not find any warning.
  4. Also, initial tests show that the original demo pinned to onnxruntime 1.19.0-dev.20240804-ee2fe87e2d is significantly faster than both 1.19.2 stable and 1.20.0-dev.20240928-1bda91fc57. Is this expected? If not, I will follow up with an onnxruntime issue.
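Until such a warning exists in ONNX Runtime, a hypothetical pre-flight guard (sketched below, not part of any library) can catch the typo before session creation by comparing the override keys against the model's symbolic dimension names:

```javascript
// Hypothetical guard: fail loudly if the freeDimensionOverrides keys do not
// match the model's symbolic dimension names. ONNX Runtime itself silently
// ignores unknown keys, which is what made the batch/batch_size typo cost a
// silent 100x slowdown.
function checkDimensionOverrides(overrides, symbolicDims) {
  const unknown = Object.keys(overrides).filter(k => !symbolicDims.includes(k));
  const missing = symbolicDims.filter(d => !(d in overrides));
  if (unknown.length || missing.length) {
    throw new Error(
      `freeDimensionOverrides mismatch: unknown keys [${unknown}], missing dims [${missing}]`
    );
  }
}

// The typo from the demo now throws instead of silently degrading:
let caught = false;
try {
  checkDimensionOverrides(
    { batch: 1, channels: 3, height: 512, width: 512 },
    ['batch_size', 'num_channels', 'height', 'width']
  );
} catch (e) {
  caught = true;
}
console.log(caught); // true
```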

@ibelem
Contributor

ibelem commented Sep 30, 2024

@eyaler Great to know the perf improved!

The WebNN EP needs fixed integer values specified via freeDimensionOverrides for all symbolic dimensions (https://onnxruntime.ai/docs/tutorials/web/env-flags-and-session-options.html#freedimensionoverrides); otherwise the optimizations will not be applied.

> I plan to open an issue on onnx-runtime to suggest that wrong arguments names in free dimensions override will raise an error or at least a warning.

Please file a bug to microsoft/onnxruntime :)

> Initial tests show that the fixed 1.19.0-dev.20240804-ee2fe87e2d onnx-runtime version of the original demo is significantly faster than both 1.19.2stable and 1.20.0-dev.20240928-1bda91fc57.

Please use a single model to run the tests and provide detailed performance data across these ORT dists, so we can check what happened in the newer ORT versions. Thanks a lot!

@eyaler
Author

eyaler commented Oct 3, 2024

@ibelem Thanks!

For throwing an error on bad names and emphasizing in the docs the need to fix all dimensions, I opened microsoft/onnxruntime#22300.

My saturation issue has been solved by fixing the VAE encoder instance normalization as discussed, and I put the fixed model here: vae encoder. I made a helper script based on onnx2text here: https://github.com/eyaler/webnn-developer-preview/blob/main/demos/sd-turbo/fix_instance_norm.py. It is perhaps not generic enough, but it may help point people in the right direction.

I am actually seeing ~10x run-time inconsistencies even with repeated inference on the same library version. Specifically, I see the VAE encoder or UNET getting slower after hitting the generate button a few times in my web demo. I will investigate further.
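To quantify that variance, a minimal timing wrapper (a sketch with hypothetical names; in the demo, runFn would wrap something like session.run(feeds)) can log min/max across repeated runs, which helps distinguish a monotonic slowdown (e.g. memory pressure) from plain noise:

```javascript
// Hypothetical per-run timing helper. performance.now() is global in browsers
// and in Node >= 16; runFn is any async function wrapping the inference call.
async function timeRuns(label, runFn, repeats = 5) {
  const times = [];
  for (let i = 0; i < repeats; i++) {
    const t0 = performance.now();
    await runFn();
    times.push(performance.now() - t0);
  }
  const min = Math.min(...times).toFixed(1);
  const max = Math.max(...times).toFixed(1);
  console.log(`${label}: min ${min} ms, max ${max} ms over ${repeats} runs`);
  return times;
}
```

A large and growing max relative to min across generate clicks would point at state accumulating between runs rather than at the ORT version itself.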

eyaler closed this as completed Oct 3, 2024