
Add VAE to txt-to-image Inference #32

Open
digiphd opened this issue Feb 3, 2023 · 5 comments

digiphd commented Feb 3, 2023

Hey hey!

So I am using some models that either have a VAE baked in or require a separate VAE to be defined during inference, like this:

from diffusers import AutoencoderKL, StableDiffusionPipeline

model = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)

When I either manually added the VAE or used a model with a VAE baked in as the MODEL_ID, I received the following error, for example with the model dreamlike-art/dreamlike-photoreal-2.0:

'name': 'RuntimeError', 'message': 'Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same'

Traceback (most recent call last):
  File "/api/app.py", line 382, in inference
    images = pipeline(**model_inputs).images
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/api/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 606, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=prompt_embeds).sample
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/api/diffusers/src/diffusers/models/unet_2d_condition.py", line 475, in forward
    sample = self.conv_in(sample)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Line 382 is in the inference function and looks like this:

images = pipeline(**model_inputs).images

Perhaps we need to add a .half() to the input somewhere, though I'm not sure where.
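
For example, maybe something along these lines would keep everything in one dtype (just a guess on my part, untested; the model names are the ones from above):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Guess: load the external VAE in fp16 so it matches the fp16 UNet weights,
# instead of the fp32 it loads in by default.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")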

Any help would be greatly appreciated!

It's the last hurdle I am facing to be generating images.

IDEA:
It would be awesome if we could define an optional VAE when making an API call, like this:

model_inputs["callInputs"] = {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": self.scheduler,
    "VAE": "stabilityai/sd-vae-ft-mse",
}
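
Internally I imagine it could be handled with something roughly like this (purely hypothetical sketch on my part; the variable names are made up and I don't know the actual docker-diffusers-api internals):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Hypothetical: call_inputs stands in for however the API parses callInputs.
call_inputs = {"MODEL_ID": "runwayml/stable-diffusion-v1-5", "VAE": "stabilityai/sd-vae-ft-mse"}

extra = {}
if "VAE" in call_inputs:
    # Only override the VAE when the caller asks for one.
    extra["vae"] = AutoencoderKL.from_pretrained(call_inputs["VAE"], torch_dtype=torch.float16)

pipeline = StableDiffusionPipeline.from_pretrained(
    call_inputs["MODEL_ID"], torch_dtype=torch.float16, **extra
)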

gadicc (Collaborator) commented Feb 4, 2023

Hey, @digiphd! Thanks for getting this on my radar. I'll have a chance to take a look during this coming week.

As a preliminary comment, I like the idea of being able to switch the VAE at runtime, although there will be a lot of work involved in adapting how we currently cache models.

P.S. If you're impatient, in the meantime, I think you could probably:

  1. Clone https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/fp16
  2. Replace the vae directory with the contents from https://huggingface.co/stabilityai/sd-vae-ft-mse/tree/main
  3. Upload that "new" model back to HuggingFace and build docker-diffusers-api with it (it's possible without uploading back to HuggingFace, but a bit more complicated). A rough Python equivalent is sketched below.
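
If you'd rather stay in Python, something along these lines should be roughly equivalent (a sketch, untested; the local directory name is just an example):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Rough equivalent of steps 1-3: load the fp16 model with the replacement VAE,
# then save the merged result as a "new" model.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
    vae=vae,
)
pipe.save_pretrained("./sd-v1-5-mse-vae")
# ...then upload ./sd-v1-5-mse-vae to your own HuggingFace repo and use it as MODEL_ID.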

Alternatively, with your current setup, it's possible that setting MODEL_PRECISION="" and MODEL_REVISION="" will get you past that error by running at full precision (inference will be slower, but it may be a useful stopgap in the interim).

Anyways, have a great weekend and we'll be in touch next week 😀

digiphd (Author) commented Feb 4, 2023

Hey @gadicc, thanks for your suggestions, I'll give them a go! You're a legend!

Another thing I was wondering: does docker-diffusers-api text-to-image support negative prompts?

I did pass negative_prompt as an argument and it seemed to have the intended effect on the output images.

gadicc (Collaborator) commented Feb 4, 2023

Yup! It's the negative_prompt modelInput, as it seems you worked out.

The modelInputs are passed directly to the relevant diffusers pipeline, so you can use whatever arguments that pipeline supports. I made this a little clearer in the README a few days ago with links to the common diffusers pipelines, as I admit it wasn't so obvious before 😅
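
For example (hypothetical values, but the field names match what you used above):

payload = {
    "modelInputs": {
        "prompt": "analog style portrait photo, golden hour",
        "negative_prompt": "blurry, low quality, watermark",  # forwarded straight to the pipeline
        "num_inference_steps": 30,
    },
    "callInputs": {
        "MODEL_ID": "runwayml/stable-diffusion-v1-5",
        "PIPELINE": "StableDiffusionPipeline",
    },
}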

There's also a note there now about using the lpw_stable_diffusion pipeline, which supports longer prompts and prompt weights.

Thanks for all the kind words! 🙌

gadicc (Collaborator) commented Feb 4, 2023

Hey @digiphd, I had a quick moment to try dreamlike-art/dreamlike-photoreal-2.0 and it works out of the box for me, in both full and half precision. What version of docker-diffusers-api are you using?

These worked for me:

$ python test.py txt2img --call-arg MODEL_ID="dreamlike-art/dreamlike-photoreal-2.0" --call-arg MODEL_PRECISION=""
$ python test.py txt2img --call-arg MODEL_ID="dreamlike-art/dreamlike-photoreal-2.0" --call-arg MODEL_PRECISION="fp16"

I just tried it in the default "runtime" config. If you have this issue specifically in the -build-download variant, let me know.

gadicc (Collaborator) commented Feb 4, 2023

Related: #26
