Initial Delay in Image Generation with Flux Schnell on H100 #24

Open
uayodev opened this issue Oct 8, 2024 · 6 comments

@uayodev

uayodev commented Oct 8, 2024

Thank you very much for your incredible work @aredden!

I wanted to ask about something I've noticed when using Flux Schnell on an H100 with compile_extras and compile_blocks. After the three Flux Schnell warmup runs, the first image I generate takes about 45 seconds to start its first iteration, but subsequent images generate quickly. Is this normal? Is there any way to avoid this initial delay?

I appreciate your help in advance.

@aredden
Owner

aredden commented Oct 9, 2024

The slowdown comes from torch.compile compilation. Generation should speed up after that, but the first generation may take a while, and so may the first generation at each newly requested image shape. The initial slowdown is much more reasonable with torch nightly, or any torch > 2.4.x, since I believe compilation was made quite a bit faster there, or at least it is faster on my machine. I barely notice compilation time anymore, though I do have a beefy computer.
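
Since each new image shape can trigger its own compilation, one way to keep that cost out of real requests is to run every shape you expect to serve once at startup. The following is only a minimal sketch of that idea, not part of this repository: `warm_up_shapes` is a hypothetical helper, and `generate` stands for whatever callable you already use to produce one image at a given width and height. It assumes the callable blocks until the image is ready, otherwise the timings will under-report.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Shape = Tuple[int, int]  # (width, height)


def warm_up_shapes(
    generate: Callable[[int, int], object],
    shapes: Iterable[Shape],
    runs_per_shape: int = 2,
) -> Dict[Shape, List[float]]:
    """Call `generate` a few times per shape and record seconds per call.

    `generate` is a hypothetical stand-in for your own image-generation
    function; it is assumed to return only once the image is finished.
    """
    timings: Dict[Shape, List[float]] = {}
    for width, height in shapes:
        per_call: List[float] = []
        for _ in range(runs_per_shape):
            start = time.perf_counter()
            generate(width, height)
            per_call.append(time.perf_counter() - start)
        timings[(width, height)] = per_call
    return timings
```

If compilation is the cause, the first timing recorded for each shape should be much larger than the later ones for that same shape.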

@uayodev
Author

uayodev commented Oct 11, 2024

Thanks so much for your reply! I really appreciate it.

@Muawizodux

I have tested this slowdown on an H100 and an RTX 4090. The slowdown is around 1 minute with regular (non-nightly) torch, and around 3-7 seconds with torch nightly.

@aredden
Owner

aredden commented Oct 17, 2024

Yeah, so essentially using nightly is significantly better.

@lenvoMaster

I'm still experiencing a slowdown with the initial compilation on an H100 with a torch nightly build (2.6.0.dev20240918+cu124).
Based on the previous comments here, that should not happen, right?
Any thoughts on why this can happen?

@aredden
Owner

aredden commented Dec 3, 2024

I think it depends. Compilation can be more or less costly depending on the torch version. I think nightly at the time was 2.5.0 or 2.5.1, I'm not sure, so you may need one of those two for the fastest compile time.
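
Because the answer hinges on which torch build is installed, it can help to check the version string when comparing timings. Below is a rough sketch using only the standard library; `release_tuple`, `is_newer_than_2_4`, and `is_dev_build` are hypothetical helper names, and it assumes "torch > 2.4.x" means release 2.5 or later. In practice the string would come from `torch.__version__`.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from a torch-style version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_newer_than_2_4(version: str) -> bool:
    # Assumption: "torch > 2.4.x" is read as release 2.5 or later.
    return release_tuple(version)[:2] >= (2, 5)


def is_dev_build(version: str) -> bool:
    # Nightly builds carry a ".devYYYYMMDD" segment in the version string.
    return ".dev" in version


# The nightly build string reported earlier in this thread:
reported = "2.6.0.dev20240918+cu124"
print(release_tuple(reported), is_newer_than_2_4(reported), is_dev_build(reported))
```

This only identifies the build; it says nothing about how fast compilation will be on a given version.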
