[Web] Stable Diffusion Inpainting FP16 UNET outputs NANs #22983

Open
jdp8 opened this issue Dec 2, 2024 · 6 comments · Fixed by #23085
Labels
api:Javascript issues related to the Javascript API ep:WebGPU ort-web webgpu provider platform:web issues related to ONNX Runtime web; typically submitted using template

Comments

@jdp8

jdp8 commented Dec 2, 2024

Describe the issue

I converted stable-diffusion-inpainting and stable-diffusion-2-inpainting to FP16 ONNX format using both the optimum-cli export command and this script. The models work fine in Python ONNX Runtime, but in ONNX Runtime Web the UNET outputs NaNs for an unknown reason, as shown below:

Image

The code that runs the models in the browser was translated to JavaScript from the pipeline script and the ONNX pipeline script. I'm fairly sure my code is correct, but I could be wrong. The shapes appear to be as expected, since ORT Web does not complain about them.

Does anybody have any idea what could be causing these NaNs in the UNET? Could this be an issue with the model conversion or with my code? Any assistance would be greatly appreciated, as I have tried pretty much everything I can think of.

Additional Context

  • There is a nearest-neighbor interpolation resize in this pipeline, which I implemented with OpenCV.js like so:
// cv.matFromArray takes (rows, cols, type, data), so height comes first.
const maskCV = cv.matFromArray(height, width, cv.CV_32FC1, maskCondition);
const interpolatedMask = new cv.Mat();
// cv.Size takes (width, height).
const dSize = new cv.Size(width / this.vaeScaleFactor, height / this.vaeScaleFactor);
cv.resize(maskCV, interpolatedMask, dSize, 0, 0, cv.INTER_NEAREST);
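
For comparison, the nearest-neighbor resize above can be sketched without OpenCV.js. This is a minimal version, assuming a single-channel, row-major Float32Array mask; `resizeNearest` is a hypothetical helper name, and its rounding (floor of the scaled index) may not match OpenCV or the Python pipeline in every edge case.

```javascript
// Hypothetical helper (not from the issue): nearest-neighbor resize of a
// single-channel, row-major mask with srcHeight rows and srcWidth columns.
function resizeNearest(src, srcHeight, srcWidth, dstHeight, dstWidth) {
  const dst = new Float32Array(dstHeight * dstWidth);
  const scaleY = srcHeight / dstHeight;
  const scaleX = srcWidth / dstWidth;
  for (let y = 0; y < dstHeight; y++) {
    // Map the destination row back to a source row, clamped to the last row.
    const srcY = Math.min(srcHeight - 1, Math.floor(y * scaleY));
    for (let x = 0; x < dstWidth; x++) {
      const srcX = Math.min(srcWidth - 1, Math.floor(x * scaleX));
      dst[y * dstWidth + x] = src[srcY * srcWidth + srcX];
    }
  }
  return dst;
}
```

Comparing its output against `interpolatedMask.data32F` for the same input could help rule the resize step in or out as a source of bad values.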

To reproduce

To quickly reproduce the issue, it should be enough to load the UNET of any of my converted models in an Inference Session and pass it an object with three random inputs. The object consists of the following entries:

{
sample: shape [2, 9, 64, 64] | type float32,
timestep: shape [1] | type float32,
encoder_hidden_states: shape [2, 77, 1024] | type float32
}
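
The input object above could be built roughly as follows. This is a sketch only: the shapes come from the list above, while `INPUT_SHAPES`, `randomData`, and `countNaNs` are hypothetical names, and the random value range is an arbitrary choice.

```javascript
// Shapes taken from the issue; everything else here is illustrative.
const INPUT_SHAPES = {
  sample: [2, 9, 64, 64],
  timestep: [1],
  encoder_hidden_states: [2, 77, 1024],
};

// Fill a Float32Array of the given shape with values in [-1, 1).
function randomData(shape) {
  const size = shape.reduce((a, b) => a * b, 1);
  const data = new Float32Array(size);
  for (let i = 0; i < size; i++) data[i] = Math.random() * 2 - 1;
  return data;
}

// Count NaN entries in an input or output buffer.
function countNaNs(data) {
  let count = 0;
  for (let i = 0; i < data.length; i++) {
    if (Number.isNaN(data[i])) count++;
  }
  return count;
}

// Plain { dims, data } pairs, one per model input.
const feeds = {};
for (const [name, shape] of Object.entries(INPUT_SHAPES)) {
  feeds[name] = { dims: shape, data: randomData(shape) };
}
```

With onnxruntime-web, each entry would be wrapped as `new ort.Tensor('float32', data, dims)` before being passed to `session.run()`, and `countNaNs` could then be applied to the data of the returned output tensor.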

Some of the models I have converted are:

Urgency

Somewhat urgent.

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.0

Execution Provider

'webgpu' (WebGPU)

@jdp8 jdp8 added the platform:web issues related to ONNX Runtime web; typically submitted using template label Dec 2, 2024
@github-actions github-actions bot added api:Javascript issues related to the Javascript API ep:WebGPU ort-web webgpu provider labels Dec 2, 2024
@jdp8 jdp8 changed the title from "Stable Diffusion Inpainting FP16 UNET outputs NANs [Web]" to "[Web] Stable Diffusion Inpainting FP16 UNET outputs NANs" Dec 2, 2024
@fs-eire
Contributor

fs-eire commented Dec 2, 2024

@jdp8 Thank you for reporting the issue. Could you please share the repro steps (including the JavaScript code)? A jsfiddle link would also be helpful.

@jdp8
Author

jdp8 commented Dec 6, 2024

@fs-eire Sorry for the delay. I made a simple jsfiddle that runs the Stable Diffusion Inpainting model for 1 step and prints the UNET output, which is a Tensor filled with NaNs. I stopped at that point so as not to complicate the code further. The code was heavily inspired by the SD Turbo ORT Web example code.

Repro Steps

  • Convert a Stable Diffusion Inpainting model using either of the conversion scripts that I mentioned in the initial post. I have many converted inpainting models which can be found in my HuggingFace Repo in case you want to use them.
  • Translate the Stable Diffusion Inpainting code from Python to JavaScript (this is in the jsfiddle). The code can be taken either from here or here but I took more inspiration from the ONNX pipeline.

Other Info

  • There were challenges due to JavaScript not having certain features such as array broadcasting and a Nearest Neighbor Interpolation. The array broadcasting was implemented in the getMaskedImage() function and OpenCV.js was used for the Nearest Neighbor Interpolation.
  • The base image and mask image are already included in the code as base64 strings. The images and the example that I'm trying to run in the browser were taken from here (the first one).
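
The broadcasting step mentioned above can be sketched like this. It is not the actual getMaskedImage() from the jsfiddle; `applyMask` is a hypothetical stand-in, and it assumes the masked image is computed as image * (mask < 0.5), with a [1, height, width] mask broadcast over a [channels, height, width] image stored row-major.

```javascript
// Hypothetical stand-in for the broadcasting done in getMaskedImage():
// image is [channels, height, width], mask is [1, height, width].
function applyMask(image, mask, channels, height, width) {
  const plane = height * width;
  const out = new Float32Array(channels * plane);
  for (let c = 0; c < channels; c++) {
    for (let i = 0; i < plane; i++) {
      // The same mask value is reused for every channel (the broadcast).
      // Pixels with mask < 0.5 are kept; the region to inpaint is zeroed.
      out[c * plane + i] = mask[i] < 0.5 ? image[c * plane + i] : 0;
    }
  }
  return out;
}
```

For example, a two-channel 1x2 image `[1, 2, 3, 4]` with mask `[0, 1]` keeps the first pixel of each channel and zeroes the second.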

Let me know if you have any questions or if I left something out. Thank you!

@fs-eire
Contributor

fs-eire commented Dec 10, 2024

I am investigating this issue.

@fs-eire
Contributor

fs-eire commented Dec 12, 2024

#23085 should have fixed the NaN issue, but I'm not sure whether there are other issues that still block SD from running.

@jdp8 Please allow one or two days for the pipeline to publish a new dev package and try it again.

@jdp8
Author

jdp8 commented Dec 13, 2024

@fs-eire Thank you! I'll try it tomorrow and let you know.

@jdp8
Author

jdp8 commented Dec 17, 2024

@fs-eire Sorry for the late response. I just tried it and the UNET is no longer outputting NaNs. Thank you so much!
I still see some NaNs appearing after the UNET is called (roughly after 8 steps), but that is probably an error in my code, and I'll look into it.

Thank you once again!

guschmue pushed a commit that referenced this issue Dec 20, 2024
### Description

Fix a bug caused by potential out-of-bound reads of `W` in the
Conv2DMatMul shader.

### Motivation and Context

Fixes #22983
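
As a loose CPU-side analogy for how an out-of-bound read can turn into NaNs, consider the sketch below. It is not the Conv2DMatMul shader or the patch from the PR; `tiledMatMul` and its `guardReads` flag are hypothetical. In JavaScript, reading past the end of a typed array yields `undefined`, which becomes NaN in arithmetic. What a GPU returns for such a read is implementation-dependent, so this only illustrates the class of bug.

```javascript
// Illustration only; not the actual shader or fix.
// Computes C = A (m x k) * W (k x n), walking the k dimension in fixed tiles.
function tiledMatMul(a, w, m, k, n, tile, guardReads) {
  const c = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let acc = 0;
      for (let k0 = 0; k0 < k; k0 += tile) {
        for (let t = 0; t < tile; t++) {
          const kk = k0 + t;
          // Guarded: positions past k in the last tile contribute nothing.
          if (guardReads && kk >= k) continue;
          // Unguarded: kk >= k can index past the end of a and w.
          acc += a[row * k + kk] * w[kk * n + col];
        }
      }
      c[row * n + col] = acc;
    }
  }
  return c;
}
```

With k = 3 and a tile size of 2, the last tile covers one position past the end of the data, so the unguarded version produces NaN while the guarded version gives the expected dot product.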