Use OpenVINO to increase speed #2
It's possible to adapt the pipeline and convert the weights to OpenVINO format with a little hacking around.
The current missing feature is the timestep_cond input of the compiled UNet, which breaks guidance, making images dim and messy. It can be bypassed by implementing classic cond/uncond, but that lowers inference speed by 33% (I didn't use it in the benchmark because of that).
For example, on a Xeon Gold with 48C/96T the speed increases a lot, making it possible to generate a 512x512 image every 4 seconds, or a batch of 4 in 12 seconds.
I will post the weights, the OV pipeline, and a comparable benchmark soon.
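As a rough illustration of that cond/uncond bypass, here is a minimal sketch (`compiled_unet`, `cond_emb`, `uncond_emb` and the input order are hypothetical names, not the actual pipeline code):

```python
import numpy as np

def guided_noise_pred(compiled_unet, latents, t, cond_emb, uncond_emb, guidance_scale=8.0):
    # Two UNet passes per step instead of one: this is why the bypass
    # costs throughput compared to a native timestep_cond input.
    noise_cond = compiled_unet([latents, t, cond_emb])[0]
    noise_uncond = compiled_unet([latents, t, uncond_emb])[0]
    # Classic classifier-free guidance mix of the two predictions.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```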
Comments
@deinferno sounds cool
CPU performance is basically double that of standard PyTorch. SDNext has OpenVINO support out of the box; SDNext's OpenVINO support is based on the official OpenVINO script's torch.compile backend. Using native OpenVINO instead of the torch.compile backend will be better for this app, though.
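For reference, a minimal sketch of the torch.compile route described above, based on OpenVINO's PyTorch integration (the model id is illustrative, not from this thread):

```python
import torch
import openvino.torch  # noqa: F401 -- registers the "openvino" torch.compile backend
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
# Compile the UNet (the hot loop) through the OpenVINO backend; the first
# call captures and compiles the graph, later calls reuse the result.
pipe.unet = torch.compile(pipe.unet, backend="openvino")
```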
Very cool indeed! Could you please tell us how and when it will be ready to use? Can the model be converted to FP16 when it is ready, for a smaller size?
Models will be cached as FP32 and converted for your hardware on the first run. Native OpenVINO will be better for this app since you skip the PyTorch-to-OpenVINO conversion at load time.
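A minimal sketch of what "converted for your hardware on the first run" can look like with native OpenVINO, assuming an FP32 IR on disk (paths are illustrative): enabling the model cache makes the first compile_model() call write a device-specific blob that later runs reuse.

```python
import openvino as ov

core = ov.Core()
core.set_property({"CACHE_DIR": "./ov_cache"})  # device-specific blobs land here
model = core.read_model("unet/openvino_model.xml")  # FP32 IR cached on disk
# First run compiles for the local CPU and fills the cache; later runs load it.
compiled_unet = core.compile_model(model, "CPU")
```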
Thanks for the answer! If OpenVINO can give the same or better speed than ONNX on my Intel CPU, I can't wait for it! Thanks!
Yesterday I tried converting the model to OpenVINO; image generation is a bit blurry (using the LMS sampler worked). Full LCM pipeline conversion is not yet done. @deinferno Any updates?
@deinferno @rupeshs I asked this in the LCM repo yesterday, but I'm asking here too since I really like this GUI, so we can have everything here as well. If they can be run with OpenVINO too, that would be amazing!
It's done, I uploaded the weights and inference code: https://huggingface.co/deinferno/LCM_Dreamshaper_v7-openvino
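A hedged sketch of pulling those weights (not the repo's actual inference code; the file layout shown is an assumption, check the repo for the real pipeline):

```python
import openvino as ov
from huggingface_hub import snapshot_download

model_dir = snapshot_download("deinferno/LCM_Dreamshaper_v7-openvino")
core = ov.Core()
# Assumed IR layout; the repo ships its own pipeline code to drive this.
unet = core.compile_model(f"{model_dir}/unet/openvino_model.xml", "CPU")
```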
@deinferno Thanks, the 21 seconds is now reduced to 9-12 seconds for a 512x512 image on a Core i7 (4 steps).
@deinferno Seems like memory usage is high compared to PyTorch inference.
@deinferno Added OpenVINO support |
Works fine on Linux too. Took 8.6 seconds at 512x512, 4 steps with my R7 5800X3D CPU and 3200 MHz CL18 RAM. Also replaced …
@Disty0 Wow, thanks for testing it.
The closest I could get without LCM on the GPU is 1 FPS. LCM boosts the speed to 3 FPS. Here is a video of it running on a GPU:
@Disty0 That's pretty fast |
It was around 11 sec/it at 512x512 with my CPU; now it is around 5 secs. Pretty good speedup.
I tried converting without timestep_cond and disabling it in the pipeline too, but it seems not to be the cause of the huge RAM usage. I found out that if a compiled-shape .blob model file exists in the locally downloaded model folder, memory usage goes from 7 to 11 GB. Can someone test that with the official Stable Diffusion OpenVINO pipeline, with reshape and compile_model like in my example?
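For anyone willing to run that test, here is a minimal sketch of the reshape-then-compile step in question (input order and shapes are hypothetical; the official pipeline does the equivalent):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("unet/openvino_model.xml")
# Pin static shapes so OpenVINO can specialize the model; with a cached
# .blob present, watch process RSS before and after compile_model().
model.reshape({0: [1, 4, 64, 64], 1: [1], 2: [1, 77, 768]})
compiled = core.compile_model(model, "CPU")
```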
For some reason the OpenVINO-converted LCM model is no longer deterministic for me; I just can't get even close to the previous output with the same random seed. Also, it seems that guidance_scale itself doesn't do much, even in the official Latent Consistency Model Space.
@deinferno I tried with the NumPy random seed as per the OpenVINO docs; it produces similar but not identical images. Also added it in the master branch.
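The NumPy-side seeding looks roughly like this (a sketch; with an OpenVINO pipeline the initial latents come from NumPy rather than a torch.Generator, which is one reason images come out similar but not identical to the PyTorch ones):

```python
import numpy as np

seed = 123456
np.random.seed(seed)
# Illustrative 512x512 SD latent shape: (batch, channels, h/8, w/8).
latents = np.random.randn(1, 4, 64, 64).astype(np.float32)
```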
I updated the inference code. It now produces the same results for the same seeds.
@deinferno great, if possible could you please create a PR? |
@deinferno Thanks, that is cool.
@deinferno Added a comment in the PR, could you please check?
@deinferno NVM, just merged. Thanks for this PR!
@deinferno Seems like we have a problem with the latest OpenVINO change: garbage output #36 (comment)
@deinferno Could you please work on an ONNX version? ONNX is as fast as OpenVINO on CPU, and I don't think it has these RAM usage problems.
@Amin456789 You may want to watch this PR in optimum for the ONNX version. OpenVINO should only use 7.1 GB instead of 14.1 GB now that #40 has been merged.
Nice! Thanks for the answer, can't wait for these updates.
@deinferno I tried the Tiny AutoEncoder for SD (TAESD) and got some speed improvement in the diffusers workflow (25% speed boost). If we use it with OpenVINO we can probably increase speed further.
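On the plain diffusers side the TAESD swap is a one-liner; a minimal sketch (AutoencoderTiny and the madebyollin/taesd weights are real, the surrounding setup is illustrative):

```python
import torch
from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
# Swap the full VAE for the tiny autoencoder; decoding latents gets much cheaper.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float32)
```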
@rupeshs It's amazing, can you please implement this in the normal model too?
@Amin456789 Yes, implemented in the normal model. I will create a branch tomorrow.
Nice, thank you!
Could you please also add a dark mode for the windows in the future? @rupeshs
@Amin456789 yes |
WIP: Added tiny autoencoder for normal pipeline, @deinferno can you check the OpenVINO part? |
@rupeshs Big speedup from TAESD indeed; a 4-image pipeline run now only takes 8.1 seconds instead of 12.5 with the OpenVINO-converted TAESD. I will push the converted weights and a PR soon.
@deinferno That's great, cheers.
Interesting, it is actually slower with a Ryzen 2200G: normally 13 seconds for 4 steps, or about 3.25 sec/it, with OpenVINO, but if I enable the tiny autoencoder it is now 14 seconds, or about 3.5 sec/it. :) Maybe on faster CPUs it would be faster; I don't know what is happening here.
Another model is out.
@deinferno I have added LCM-LoRA support, but I'm not sure whether it is possible with OpenVINO.
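A sketch of the PyTorch-side LCM-LoRA load (load_lora_weights/fuse_lora are real diffusers APIs; whether the fused UNet then converts cleanly to OpenVINO is exactly the open question here):

```python
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipe.fuse_lora()  # bake the LoRA in, so a later OpenVINO export sees one static UNet
```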
Where exactly did you change that, if I may ask? |
Hi, I have a tech question. I'm already using FastSD CPU, but now I want to bypass the CPU and use Intel Arc A310 and A380 graphics cards with the same OpenVINO config that you've done so well in FastSD CPU. Is it possible to make the same app with an option to use GPU-1 (Arc) and share half of the processing on the CPU and half on a small 4 GB/8 GB Arc GPU? Thanks in advance if you can make it!
What line is safe to change? In which file?