Sfast optimization for T2I I2I and upscale pipelines #133
Implementation was straightforward given the already-written code, but testing took most of the time. Maybe it's because my A5000 was slow, but loading and compiling a model for sfast would take forever.
For T2I, most of the models worked.
Best was ByteDance/SDXL_Lightning, with a total iteration time of 441s: the first warmup took 386s and the second only 55s. Inference took 5.10 s/it for the first image and sped up to 4.65 it/s for subsequent images.
Worst was SG161222/RealVisXL_V4.0_Lightning: the first warmup took 1600s and the second was still incomplete after 55 minutes, so the total iteration time and the inference times for the first and subsequent images are unknown.
For I2I, I could not get SDXL or SD_turbo to work, and neither did timbrooks/instruct-pix2pix.
For upscaling, the only model available was stabilityai/stable-diffusion-x4-upscaler, but it didn't compile with sfast; even a single compile iteration would take forever. I left it for an hour and it only advanced a couple of steps, so I'll need more tests to determine whether the issue is with the models or the hardware.
Conclusion: some models took insanely long to pre-trace, so I don't think that would be acceptable for anyone. It might help if there were a way to precompile a model once and keep it in memory, so you could switch between precompiled models instantly instead of compiling each time a model is loaded.