
Use OpenVino to increase speed #2

Closed
deinferno opened this issue Oct 21, 2023 · 47 comments
Labels
enhancement New feature or request

Comments

@deinferno
Contributor

deinferno commented Oct 21, 2023

It's possible to adapt the pipeline and convert the weights to OpenVINO format with a little hacking around.

The currently missing piece is the timestep_cond input in the compiled UNet; without it, guidance breaks and images come out dim and messy. It can be bypassed by implementing classic cond/uncond, but that lowers inference speed by 33%. (I didn't use that in the benchmark because of it.)

For example, on a Xeon Gold with 48C/96T the speed increases a lot, making it possible to generate a 512x512 image every 4 seconds, or a batch of 4 in 12 seconds.

I will post the weights, the OpenVINO pipeline, and a comparable benchmark soon.

@rupeshs
Owner

rupeshs commented Oct 21, 2023

@deinferno sounds cool

@Disty0

Disty0 commented Oct 21, 2023

CPU performance is basically double that of standard PyTorch.

SDNext has OpenVINO support out of the box:
https://github.com/vladmandic/automatic/wiki/OpenVINO

SDNext's OpenVINO support is based on the official OpenVINO Script's torch.compile backend:
https://github.com/openvinotoolkit/stable-diffusion-webui/blob/master/scripts/openvino_accelerate.py#L117C10-L117C10

Using native OpenVINO instead of the torch.compile backend would be better for this app, though.

@deinferno deinferno reopened this Oct 22, 2023
@Amin456789

Amin456789 commented Oct 22, 2023

Very cool indeed! Could you please tell us how and when it will be ready to use? Can the model be converted to FP16 when it is ready, for a smaller size?

@Disty0

Disty0 commented Oct 22, 2023

Very cool indeed! Could you please tell us how and when it will be ready to use? Can the model be converted to FP16 when it is ready, for a smaller size?

Models will be cached as FP32 and converted for your hardware on the first run.
The torch.compile backend will handle everything for you; you just need to trigger a recompile in the code if a parameter changes.

Using native OpenVINO would be better for this app, since then you don't have to convert from PyTorch to OpenVINO at runtime.
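For context, a minimal sketch of what the torch.compile route looks like (this is not the exact SDNext code; it assumes a recent openvino package that ships the "openvino" backend, and the checkpoint name is only an example):

```python
import torch
import openvino.torch  # noqa: F401  # registers the "openvino" backend for torch.compile
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")

# Compile the heaviest submodule; the first run triggers (and caches) the conversion.
pipe.unet = torch.compile(pipe.unet, backend="openvino")

# If a parameter such as the resolution changes, force a recompile by resetting the cache:
# torch._dynamo.reset()
```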

@Amin456789

Thanks for the answer! If OpenVINO can give the same or better speed as ONNX on my Intel CPU, I can't wait for it!
I have no idea how to use those parameters, but I will ask you guys when it is ready to use.

Thanks!

@rupeshs
Owner

rupeshs commented Oct 22, 2023

Yesterday I tried to convert the model to OpenVINO; image generation is a bit blurry (using the LMS sampler worked). The full LCM pipeline conversion is not done yet. @deinferno any updates?

@Amin456789

@deinferno @rupeshs I asked this in the LCM repo yesterday, but I'm asking it here too since I really like this GUI as well, so we can have everything here.
Will you guys try to implement other SD features in the future, such as img2img, inpainting, and AnimateDiff?

If they can be run with OpenVINO too, that would be amazing!

@deinferno
Contributor Author

deinferno commented Oct 22, 2023

It's done; I uploaded the weights and inference code: https://huggingface.co/deinferno/LCM_Dreamshaper_v7-openvino
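For anyone wanting to try the converted weights outside this app, a rough usage sketch (the repo ships its own pipeline code, so treat the optimum-intel class name and arguments here as illustrative rather than the exact inference code from the model card):

```python
# Rough sketch, not the exact inference code shipped with the model card.
from optimum.intel import OVLatentConsistencyModelPipeline

pipe = OVLatentConsistencyModelPipeline.from_pretrained(
    "deinferno/LCM_Dreamshaper_v7-openvino"  # OpenVINO-converted weights from this thread
)
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,
    guidance_scale=8.0,
    height=512,
    width=512,
).images[0]
image.save("out.png")
```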

@rupeshs
Owner

rupeshs commented Oct 22, 2023

@deinferno Thanks, 21 seconds is now reduced to 9 to 12 seconds for a 512x512 image on a Core i7 (4 steps).

@rupeshs
Owner

rupeshs commented Oct 22, 2023

@deinferno Seems like memory usage is high compared to PyTorch inference:
512 x 512 - 9 GB
768 x 768 - 12 GB

@rupeshs
Owner

rupeshs commented Oct 22, 2023

@deinferno Added OpenVINO support
https://github.com/rupeshs/fastsdcpu/releases/tag/v1.0.0-beta.3

@Disty0

Disty0 commented Oct 22, 2023

@deinferno Added OpenVINO support https://github.com/rupeshs/fastsdcpu/releases/tag/v1.0.0-beta.3

Works fine on Linux too. Took 8.6 seconds at 512x512, 4 steps with my R7 5800X3D CPU & 3200 MHz CL18 RAM.

Also, I replaced the device: str = "CPU", line with device: str = "GPU", and an image with the same settings took 0.36 seconds on my Intel Arc A770.
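For reference, the device string is simply what gets handed to OpenVINO when the model is compiled, roughly like this (a minimal sketch, not the exact fastsdcpu code; the model path is a placeholder):

```python
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU'] when an Intel GPU driver is installed

model = core.read_model("unet/openvino_model.xml")  # placeholder path to a converted model
compiled = core.compile_model(model, "GPU")  # what the device: str = "GPU" change ends up selecting
```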

@rupeshs
Owner

rupeshs commented Oct 22, 2023

@Disty0 Wow, thanks for testing it.

@Disty0

Disty0 commented Oct 22, 2023

The closest I could get without LCM on the GPU is 1 FPS. LCM boosts the speed to 3 FPS.

Here is a video of it running on a GPU:

https://www.youtube.com/watch?v=-zso94H10hA

@rupeshs
Owner

rupeshs commented Oct 22, 2023

@Disty0 That's pretty fast

@patientx

It was around 11 sec/it at 512x512 with my CPU; now it is around 5 secs. Pretty good speedup.

@deinferno
Contributor Author

@deinferno Seems like memory usage is high compared to PyTorch inference: 512 x 512 - 9 GB, 768 x 768 - 12 GB

I tried converting without timestep_cond and disabling it in the pipeline too, but it doesn't seem to be the cause of the huge RAM usage. I found that if a compiled-shape .blob model file exists in the locally downloaded model folder, memory usage goes from 7 to 11 GB. Can someone test that with the official Stable Diffusion OpenVINO pipeline, using reshape and compile_model like in my example?
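A minimal sketch of the reshape-then-compile flow being asked about, using optimum-intel's OVStableDiffusionPipeline as the example (the model id is only an example, and the memory behaviour may differ from the custom LCM pipeline):

```python
from optimum.intel import OVStableDiffusionPipeline

pipe = OVStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model id
    export=True,    # convert from PyTorch to OpenVINO on load
    compile=False,  # hold off compiling until the shapes are fixed
)

# Fix the input shapes so OpenVINO can compile a static-shape model, then compile.
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

image = pipe("a cat", num_inference_steps=20).images[0]
```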

@deinferno
Contributor Author

For some reason the OpenVINO-converted LCM model is no longer deterministic for me; I just can't get anywhere close to the previous output with the same random seed.

Also, it seems that guidance_scale itself doesn't do much, even in the official Latent Consistency Model Space.

@rupeshs
Owner

rupeshs commented Oct 25, 2023

@deinferno I tried with the NumPy random seed as per the OpenVINO docs, but it produces similar rather than identical images; I also added it in the master branch:
np.random.seed(seed)
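For reference, the kind of seeding being discussed (a minimal sketch; whether it reproduces images bit-exactly depends on how the pipeline draws its initial latents):

```python
import numpy as np

seed = 123456
np.random.seed(seed)  # seed NumPy's global RNG before running the pipeline

# The OpenVINO pipeline samples its initial latents with NumPy rather than torch.Generator,
# e.g. something along these lines inside the pipeline (512x512 -> a 64x64 latent grid):
latents = np.random.randn(1, 4, 64, 64).astype(np.float32)
```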

@deinferno
Contributor Author

I updated the inference code. It now produces the same results for the same seeds.

@rupeshs
Owner

rupeshs commented Oct 26, 2023

@deinferno great, if possible could you please create a PR?

@deinferno
Contributor Author

@rupeshs I opened the PR.

Also, the HF Space is up. It runs on cpu-basic and generates one image every ~22.5 seconds, which is pretty impressive for a 2-core vCPU.

@rupeshs
Owner

rupeshs commented Oct 26, 2023

@deinferno Thanks, that is cool.

@rupeshs
Owner

rupeshs commented Oct 26, 2023

@deinferno Added a comment in the PR, could you please check?

@deinferno
Contributor Author

@rupeshs That's odd, I can't find any comment or code review in #35, and I didn't receive any notifications either 🤔

@rupeshs
Owner

rupeshs commented Oct 26, 2023

@deinferno NVM, just merged. Thanks for this PR!

@rupeshs
Owner

rupeshs commented Oct 27, 2023

@deinferno Seems like we have a problem with the latest OpenVINO change, garbage output: #36 (comment)

@Amin456789

@deinferno Could you please work on an ONNX version? ONNX is as fast as OpenVINO on CPU, and I don't think it has these RAM usage problems.

@deinferno
Contributor Author

deinferno commented Oct 28, 2023

@Amin456789 You may want to watch this PR in optimum for the ONNX version.

OpenVINO should only use 7.1 GB instead of 14.1 GB after #40 was merged.
And from your problem in #36, it looks like your system is swapping horribly; that's why the smaller ONNX int8 model was a lot faster for you.
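For reference, the ONNX route via optimum's ONNX Runtime pipelines looks roughly like this (a generic sketch with an example model id; the PR mentioned above is about LCM-specific support, which this does not show):

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

pipe = ORTStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model id
    export=True,  # export the PyTorch weights to ONNX on load
)
image = pipe("a cat", num_inference_steps=20).images[0]
```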

@Amin456789

Nice! Thanks for the answer, can't wait for these updates.

@rupeshs
Owner

rupeshs commented Oct 29, 2023

@rupeshs
Owner

rupeshs commented Nov 2, 2023

@deinferno I tried the Tiny AutoEncoder for SD and got some speed improvement in the diffusers workflow (a 25% speed boost); if we use it with OpenVINO we can probably increase the speed further.
https://huggingface.co/docs/diffusers/main/en/api/models/autoencoder_tiny
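For context, swapping in the Tiny AutoEncoder in the diffusers workflow looks roughly like this (a minimal sketch; madebyollin/taesd is the usual checkpoint for SD 1.x, and the pipeline checkpoint is only an example):

```python
from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd")  # swap in the tiny VAE

image = pipe("a cat", num_inference_steps=4, guidance_scale=8.0).images[0]
```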

@Amin456789

@rupeshs It's amazing, can you please implement this for the normal model too?

@rupeshs
Owner

rupeshs commented Nov 2, 2023

@Amin456789 Yes, implemented for the normal model. I will create a branch tomorrow.

@Amin456789

Nice, thank you!

@Amin456789

Could you please also add a dark mode for Windows in the future? @rupeshs

@rupeshs
Owner

rupeshs commented Nov 3, 2023

@Amin456789 yes

@rupeshs
Owner

rupeshs commented Nov 3, 2023

WIP: Added the tiny autoencoder for the normal pipeline; @deinferno, can you check the OpenVINO part?
https://github.com/rupeshs/fastsdcpu/tree/add-tae-sd-support

@deinferno
Contributor Author

deinferno commented Nov 4, 2023

@rupeshs Big speedup from TAESD indeed; a 4-image pipeline run now only takes 8.1 seconds instead of 12.5 with the OpenVINO-converted TAESD. I will push the converted weights and a PR soon.
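A rough sketch of how the TAESD decoder can be converted to OpenVINO IR (illustrative only; the actual conversion script, input shapes, and output filename may differ):

```python
import torch
import openvino as ov
from diffusers import AutoencoderTiny

taesd = AutoencoderTiny.from_pretrained("madebyollin/taesd")
example_latents = torch.randn(1, 4, 64, 64)  # example latent input for a 512x512 image

# Trace the decoder submodule with an example input and save it as OpenVINO IR.
ov_decoder = ov.convert_model(taesd.decoder, example_input=example_latents)
ov.save_model(ov_decoder, "taesd_decoder.xml")
```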

@rupeshs
Owner

rupeshs commented Nov 4, 2023

@deinferno That's great, cheers.

@rupeshs
Owner

rupeshs commented Nov 5, 2023

@patientx

patientx commented Nov 5, 2023

Interesting, it is actually slower with a Ryzen 2200G: normally 13 seconds for 4 steps, or about 3.25 sec/it, with OpenVINO, but if I enable the tiny autoencoder it is now 14 seconds, or about 3.5 sec/it. :) Maybe on faster CPUs it would be faster; I don't know what is happening here.

@Amin456789

Another model is out:
https://huggingface.co/furusu/LCM-Acertainty

@rupeshs
Owner

rupeshs commented Nov 11, 2023

@deinferno I have added LCM-LoRA support, but I'm not sure whether it is possible with OpenVINO:
https://github.com/rupeshs/fastsdcpu/releases/tag/v1.0.0-beta.12
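For context, LCM-LoRA in the plain diffusers workflow looks roughly like this (a minimal sketch; the base model id is only an example, and an OpenVINO version would additionally need the LoRA fused or exported, which is the open question above):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained("Lykon/dreamshaper-7", torch_dtype=torch.float32)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")  # the LCM-LoRA adapter

image = pipe("a cat", num_inference_steps=4, guidance_scale=1.0).images[0]
```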

@rupeshs rupeshs added the enhancement New feature or request label Nov 12, 2023
@rupeshs rupeshs closed this as completed Nov 12, 2023
@onlyreportingissues

@deinferno Added OpenVINO support https://github.com/rupeshs/fastsdcpu/releases/tag/v1.0.0-beta.3

Works fine on Linux too. Took 8.6 seconds at 512x512, 4 steps with my R7 5800X3D CPU & 3200 MHz CL18 RAM.

Also, I replaced the device: str = "CPU", line with device: str = "GPU", and an image with the same settings took 0.36 seconds on my Intel Arc A770.

Where exactly did you change that, if I may ask?

@ExperimentDiffusion

Hi, I have a tech question. I'm already using FastSD CPU, but now I want to bypass the CPU and use Intel Arc A310 and A380 graphics cards with the same OpenVINO config that you've done so well in FastSD CPU. Is it possible to make the same app with an option to use GPU-1 (Arc) and share half of the processing on the CPU and half on a small 4 GB/8 GB Arc GPU? Thanks in advance if you can make it!

@ExperimentDiffusion

device: str = "GPU",

What line is safe to change, and in which file?
Is it possible to share the CPU/GPU processing, with a slider for the user's desired CPU/GPU usage percentage?
