Fixed the use of undenoised latent from the DPM++ scheduler #7
base: main
Conversation
czkoko commented Mar 29, 2024
- The latent returned by the step function of the DPM++ scheduler is not denoised; it appears to be the prediction for the next timestep.
- I don't know much about the scheduler internals, and I'm not sure this fix is the right approach, but it does solve the picture quality problem.
- The following is a picture quality comparison of DPM++ 2M Karras before and after the fix; a sketch of the change follows the comparison below.
[Image comparison: DPM++ 2M Karras before/after - solved the picture quality problem]
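For readers skimming, here is a minimal sketch of the pipeline-level change being proposed, assuming a GuernikaKit-style API in which the scheduler keeps a history of denoised model outputs; `predictNoise`, `decode`, and the loop shape are illustrative, not the exact PR diff:

```swift
// Denoising loop (sketch). For DPM++, the value `step` returns on the final
// iteration is still a prediction for the *next* timestep, not a fully
// denoised sample.
for t in scheduler.timeSteps {
    let noisePrediction = unet.predictNoise(latent: latent, timeStep: t) // illustrative
    latent = scheduler.step(
        output: noisePrediction,
        timeStep: Double(t),
        sample: latent,
        generator: generator
    )
}

// The fix: decode the scheduler's last *denoised* model output rather than
// the raw latent returned by the final `step` call.
let latentToDecode = scheduler.modelOutputs.last ?? latent
let image = try decoder.decode(latentToDecode) // illustrative
```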
LGTM. Now I see why I don't have the issue you described a while ago in my app: I do exactly the same thing in my own implementation.
Yes, this problem seems to have existed since Guernika came out. I took a lot of detours before I finally tracked it down to this spot.
LGTM. Great bug fix! I just checked the Hugging Face diffusion application, and the denoised latent is derived the same way. Again, great find.
@czkoko thanks for the PR. I'm not able to reproduce this with either SD1.5 or SDXL; I don't see any real difference in the outputs. Maybe you could clarify how you generated those results. Also, if this is a fix for one specific scheduler, I think it would make more sense to implement it in Schedulers; it feels really hacky having it here. @SpiraMira, could you explain what you meant by "the hugging face diffusion application does this"? Can you share where? I rechecked DPMSolverMultistepScheduler and I don't see it.
Hi @GuiyeC - by that I mean the ml-stable-diffusion demo app at https://github.com/huggingface/swift-coreml-diffusers (written mostly by folks at Hugging Face). The issue is at the pipeline level, not in the scheduler itself. Below is the shape of their pipeline denoising loop (in StableDiffusionPipeline.generateImages()) and the source of their final decoded image(s). The last time I checked, the DPMSolverMultistep scheduler implementations were the same. Do you think there's an issue there? Note: all that said, I too am straining to see a marked difference between the two approaches. @czkoko, more details on the model, pipeline, scheduler, and prompts used would be helpful.
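A sketch of the pattern in question, assuming Apple's ml-stable-diffusion-style pipeline where each prompt image gets its own scheduler instance tracking its denoised model outputs (a paraphrase, not the verbatim excerpt; `decodeToImages` is approximate):

```swift
// After the denoising loop, images are decoded from each scheduler's last
// stored denoised model output, falling back to the raw latent.
let denoisedLatents = zip(latents, schedulers).map { latent, scheduler in
    scheduler.modelOutputs.last ?? latent
}
let images = try decodeToImages(denoisedLatents) // name approximate
```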
@SpiraMira I still don't think this is an issue in the pipeline. They are using Apple's ml-stable-diffusion, and they may just be using the same hack to fix this, but I think the scheduler should be returning the correct output, especially since this doesn't happen with other schedulers.
The following are 15 images from a simple command-line program I built to decode the latent at each step, using GuernikaKit 1.6.1 without this PR applied. You can see that the final image picks up a lot more residual noise than step 14. With DPM++ 2M it is even more obvious.
@czkoko I'm not saying there is no bug, but I still think this is a bug in the scheduler, not in the pipeline. The scheduler should return the correct output; if it's the last step, it should return the latent ready for decoding. Are you seeing this with any other schedulers?
I don't know much about it, but compared with other schedulers, it seems that before the …
This seems to be the same issue.
@GuiyeC

```swift
public func step(
    // ......
    if lowerOrderStepped < solverOrder {
        lowerOrderStepped += 1
    }
    // On the last step, return the denoised output instead of the next-step prediction.
    if stepIndex == timeSteps.count - 1 {
        return convertModelOutput(modelOutput: output, timestep: t, sample: prevSample)
    }
    return prevSample
}
```
You could return

```swift
public func step(
    output: MLShapedArray<Float32>,
    timeStep t: Double,
    sample: MLShapedArray<Float32>,
    generator: RandomGenerator
) -> MLShapedArray<Float32> {
    ...
    // Derive the denoised sample first so the final step can return it directly.
    let modelOutput = convertModelOutput(modelOutput: output, timestep: t, sample: sample)
    if modelOutputs.count == solverOrder { modelOutputs.removeFirst() }
    modelOutputs.append(modelOutput)
    if stepIndex == timeSteps.count - 1 {
        return modelOutput
    }
    let prevSample: MLShapedArray<Float32>
    ...
    return prevSample
}
```

I still would like to see how they've solved it in other implementations, but maybe this is good enough as a temporary fix.
@GuiyeC and @czkoko - all the non-solver schedulers return the same denoised latent, so modelOutput[0] == latent; I just double-checked in the debugger (in my own Guernika-based app), so the proposed fix will have no effect on them (bug or not). I dug deeper into the GuernikaKit Schedulers DPMSolverMultistepScheduler code (vs. Apple's version) and noticed a difference in the secondOrderUpdate function: the last weightedSum parameters are different. The test case uses 15 steps, so (I believe) it triggers the secondOrderUpdate call. Could this be an issue? Also, which version of DPMSolverMultistepScheduler is the correct one?
Correction: my bad, it looks like the GuernikaKit Schedulers version was just refactored a bit, so it is functionally the same.
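For reference, the second-order update both versions implement should be the DPM-Solver++(2M) step from the paper (notation from there, stated as an assumption about what the code computes: $x_\theta$ is the data prediction, $\lambda_t = \log(\alpha_t/\sigma_t)$, $h_i = \lambda_{t_i} - \lambda_{t_{i-1}}$, $r_i = h_{i-1}/h_i$):

$$
\tilde{x}_{t_i} = \frac{\sigma_{t_i}}{\sigma_{t_{i-1}}}\,\tilde{x}_{t_{i-1}} - \alpha_{t_i}\left(e^{-h_i}-1\right)\left[\left(1+\frac{1}{2r_i}\right) x_\theta\big(\tilde{x}_{t_{i-1}}, t_{i-1}\big) - \frac{1}{2r_i}\, x_\theta\big(\tilde{x}_{t_{i-2}}, t_{i-2}\big)\right]
$$

So the final weightedSum should combine the two stored model outputs with coefficients $1 + \frac{1}{2r_i}$ and $-\frac{1}{2r_i}$; a refactoring that preserves those coefficients is functionally the same.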
Liuliu's Draw Things has just been open-sourced. Maybe you can find some solutions in his code; his DPM++ SDE has no problem with SDXL, but it takes double the time.
@czkoko - is DPM++ 2M or SDE the problem child? I thought it was 2M Karras. Also, which XL model are you testing with? Would love to be able to reproduce your images on my side...
@czkoko - So I played around with updating our current DPMSolverMultistepScheduler along the lines of this Hugging Face patch for a similar problem: https://github.com/huggingface/diffusers/pull/6477/files#diff-517cce3913a4b16e1d17a0b945a920e400aa5553471df6cd85f71fc8f079b4b4 and had some qualitative success with the 2M scheduler. The most significant piece of their patch affects the step function's lowerOrderFinal evaluation:
my version:
By setting my useFinalTraingScheduleSigma flag off, I achieve the same effect. Here are some shots with the 2M scheduler:
The rest of the patch adds either the final training schedule sigma or 0 during initialization. I've implemented it, but I'm still not sure of its overall effect on our schedulers. The results look similar to the proposed hack, but this may be backed by more solid science and research (assuming, of course, Hugging Face didn't just hack this too, since it made it into their baseline). I'm not an expert. Would love to hear from @GuiyeC about this. (I can submit some code for review if you like.) A rough sketch of the idea follows.
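A rough Swift sketch of the idea, mirroring the diffusers patch (its `final_sigmas_type` option) rather than quoting either codebase; `trainScheduleFinalSigma` is a hypothetical name, and the flag spelling follows the useFinalTraingScheduleSigma flag above:

```swift
// Sketch only - mirrors huggingface/diffusers#6477, not the actual GuernikaKit diff.
// At init, the sigma appended after the last timestep is either the final
// training-schedule sigma or exactly 0.
let finalSigma: Float = useFinalTraingScheduleSigma
    ? trainScheduleFinalSigma   // hypothetical: sigma of the schedule's last training step
    : 0                         // denoise all the way down to zero noise
sigmas.append(finalSigma)

// Inside step(): a final sigma of 0 forces a lower-order (Euler-like) update on
// the last step, so the second-order extrapolation can't reintroduce noise there.
let isFinalStep = stepIndex == timeSteps.count - 1
let useLowerOrder = (isFinalStep && finalSigma == 0) || lowerOrderStepped < solverOrder
```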