How to profile my super slow webNN implementation (sd-turbo img2img) #40
Comments
@eyaler Great work, and a nice idea for image-to-image usage! These are the inputs of the vae encoder you are using:
But the overrides in your code are: Please try https://huggingface.co/schmuell/sd-turbo-ort-web/blob/main/vae_encoder/config.json CC @Honry
@ibelem Oh wow! Fixing the argument names makes it 100x faster! Thanks!! However, now that the optimization is kicking in, I have saturation issues that I guess are related to float16 casting. My next steps:
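The saturation symptom described above is a common float16 casting issue: float16's largest finite value is 65504, so any intermediate activation beyond that overflows to infinity when a model is converted to fp16. A minimal sketch of a helper to flag such values (the function name and threshold check are illustrative, not part of the demo code):

```javascript
// Largest finite float16 value; magnitudes above this saturate to +/-Infinity
// when a float32 tensor is cast down to float16.
const FP16_MAX = 65504;

// Count how many values in a (float32) tensor would overflow in float16.
// Useful for probing intermediate outputs when hunting fp16 saturation bugs.
function fp16OverflowCount(values) {
  let count = 0;
  for (const v of values) {
    if (Math.abs(v) > FP16_MAX) count++;
  }
  return count;
}
```

Running this on the encoder's intermediate outputs (e.g. before and after instance normalization) can help localize which layer produces out-of-range activations.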
@eyaler Great to know the perf improved! The WebNN EP needs fixed integer values specified via freeDimensionOverrides for all the symbolic dimensions (https://onnxruntime.ai/docs/tutorials/web/env-flags-and-session-options.html#freedimensionoverrides); otherwise the optimizations will not be applied.
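Per the onnxruntime-web docs linked above, freeDimensionOverrides is passed in the session options when creating the inference session. A minimal sketch; the dimension names here ("batch", "channels", "height", "width") and the 512x512 shape are assumptions for illustration, so check your model's actual symbolic dimension names (e.g. in Netron) before copying:

```javascript
// Session options for onnxruntime-web with the WebNN EP.
// Every symbolic dimension in the model must be pinned to a fixed integer,
// and the keys must exactly match the names used in the ONNX graph.
const sessionOptions = {
  executionProviders: [{ name: "webnn", deviceType: "gpu" }],
  freeDimensionOverrides: {
    batch: 1,      // assumed dimension name
    channels: 3,   // assumed dimension name
    height: 512,   // assumed dimension name
    width: 512,    // assumed dimension name
  },
};

// Usage (in the browser, with ort loaded):
// const session = await ort.InferenceSession.create("vae_encoder.onnx", sessionOptions);
```

Note that a typo in one of these keys silently leaves that dimension symbolic, which is exactly the failure mode that caused the 100x slowdown discussed above.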
Please file a bug to microsoft/onnxruntime :)
Please use a single model to run the tests and provide detailed performance data across these ORT dists, so we can check what happened in the newer ORT versions. Thanks a lot!
@ibelem Thanks! I opened microsoft/onnxruntime#22300 about throwing an error on bad names and emphasizing in the docs the need to fix all dimensions. My saturation issue has been solved by fixing the VAE encoder's instance normalization as discussed, and I put the fixed model here: vae encoder. I made a helper script based on onnx2text here: https://github.com/eyaler/webnn-developer-preview/blob/main/demos/sd-turbo/fix_instance_norm.py - it is perhaps not generic enough, but it may point people in the right direction. I am actually seeing ~10x run-time inconsistencies even with repeated inference on the same library version. Specifically, I see the VAE encoder or UNET getting slower after hitting the generate button a few times in my web demo. I will investigate further.
I am trying to make a webNN example for sd-turbo image-to-image: https://github.com/eyaler/webnn-developer-preview/blob/main/demos/sd-turbo/index.js
I used the vae encoder from: https://huggingface.co/schmuell/sd-turbo-ort-web/ without any changes to the model. I also tried other ones, but this worked for me where others did not.
I probably didn't do the latents sampling as intended, but it is working. You can try it here: https://eyaler.github.io/webnn-developer-preview/demos/sd-turbo/ - you need to tick the image-to-image checkbox, and it uses a default input image I provided (sorry). My main issue is that the encoder takes 100x-300x the time of the original flow without the encoder (40-120 sec compared to 400 ms). You can compare by unticking the image-to-image checkbox. This suggests to me that something is very wrong with the way I hooked up the model, or that I missed some basic adjustment steps. I would be grateful for any insights into potentially obvious reasons for the large discrepancy and how to approach debugging it.
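For context on how the image feeds into the encoder: a Stable Diffusion VAE encoder typically expects a planar NCHW float tensor with pixel values scaled to [-1, 1]. A minimal sketch of that preprocessing step, assuming RGBA canvas pixel data as input (the function name is hypothetical, and the layout/scaling are the common SD conventions rather than something verified against this exact model):

```javascript
// Convert interleaved RGBA canvas pixels (Uint8ClampedArray from
// getImageData) into a planar CHW Float32Array scaled to [-1, 1],
// the layout a Stable Diffusion VAE encoder conventionally expects.
function imageToTensorData(rgbaPixels, width, height) {
  const plane = width * height;
  const data = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    data[i] = (rgbaPixels[i * 4] / 255) * 2 - 1;                 // R plane
    data[plane + i] = (rgbaPixels[i * 4 + 1] / 255) * 2 - 1;     // G plane
    data[2 * plane + i] = (rgbaPixels[i * 4 + 2] / 255) * 2 - 1; // B plane
  }
  return data; // wrap with: new ort.Tensor("float32", data, [1, 3, height, width])
}
```

Getting this scaling or layout wrong is a cheap thing to rule out before digging into EP-level profiling, since a wrong input range produces plausible-but-degraded outputs rather than an error.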