
Add VAE to txt-to-image Inference #32

Open
digiphd opened this issue Feb 3, 2023 · 5 comments

digiphd commented Feb 3, 2023

Hey hey!

So I am using some models that either have a VAE baked in or require a separate VAE to be defined during inference, like this:

from diffusers import AutoencoderKL, StableDiffusionPipeline

model = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)

When I either manually added the VAE or used a model with a VAE baked in as the MODEL_ID, I received the following error, for example with the model dreamlike-art/dreamlike-photoreal-2.0:

'name': 'RuntimeError', 'message': 'Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same'

Traceback (most recent call last):
  File "/api/app.py", line 382, in inference
    images = pipeline(**model_inputs).images
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/api/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 606, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=prompt_embeds).sample
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/api/diffusers/src/diffusers/models/unet_2d_condition.py", line 475, in forward
    sample = self.conv_in(sample)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Line 382 is in the inference function and looks like this:

images = pipeline(**model_inputs).images

Perhaps we need to add a .half() to the input somewhere, though I'm not sure where.
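
For example, maybe something along these lines would keep everything in one dtype (just a guess on my part, untested; the model names are the ones from above):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Guess: load the external VAE in fp16 so it matches the fp16 UNet weights,
# instead of the fp32 it loads in by default.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")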

Any help would be greatly appreciated!

It's the last hurdle I am facing to be generating images.

IDEA:
It would be awesome if we could define an optional VAE when making an API call, like this:

model_inputs["callInputs"] = {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": self.scheduler,
    "VAE": "stabilityai/sd-vae-ft-mse",
}
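
Internally I imagine it could be handled with something roughly like this (purely hypothetical sketch on my part; the variable names are made up and I don't know the actual docker-diffusers-api internals):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Hypothetical: call_inputs stands in for however the API parses callInputs.
call_inputs = {"MODEL_ID": "runwayml/stable-diffusion-v1-5", "VAE": "stabilityai/sd-vae-ft-mse"}

extra = {}
if "VAE" in call_inputs:
    # Only override the VAE when the caller asks for one.
    extra["vae"] = AutoencoderKL.from_pretrained(call_inputs["VAE"], torch_dtype=torch.float16)

pipeline = StableDiffusionPipeline.from_pretrained(
    call_inputs["MODEL_ID"], torch_dtype=torch.float16, **extra
)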

gadicc (Collaborator) commented Feb 4, 2023

Hey, @digiphd! Thanks for getting this on my radar. I'll have a chance to take a look during this coming week.

As a preliminary comment, I like the idea of being able to switch the VAE at runtime, although there will be a lot of work involved in adapting how we currently cache models.

P.S. If you're impatient, in the meantime, I think you could probably:

  1. Clone https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/fp16
  2. Replace the vae directory with the contents from https://huggingface.co/stabilityai/sd-vae-ft-mse/tree/main
  3. Upload that "new" model back to HuggingFace and build docker-diffusers-api with it (it's possible without uploading back to HuggingFace, but a bit more complicated). A rough Python equivalent is sketched below.
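
If you'd rather stay in Python, something along these lines should be roughly equivalent (a sketch, untested; the local directory name is just an example):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Rough equivalent of steps 1-3: load the fp16 model with the replacement VAE,
# then save the merged result as a "new" model.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
    vae=vae,
)
pipe.save_pretrained("./sd-v1-5-mse-vae")
# ...then upload ./sd-v1-5-mse-vae to your own HuggingFace repo and use it as MODEL_ID.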

Alternatively, with your current setup, it's possible that setting MODEL_PRECISION="" and MODEL_REVISION="" will get you past that error by running at full precision (inference will be slower, but it may be a useful stopgap in the interim).

Anyways, have a great weekend and we'll be in touch next week 😀

digiphd (Author) commented Feb 4, 2023

Hey @gadicc, thanks for your suggestions, I'll give them a go! You're a legend!

Another thing I was wondering: does docker-diffusers-api text-to-image support negative prompts?

I did pass negative_prompt as an argument and it seemed to have the intended effect on the output images.

gadicc (Collaborator) commented Feb 4, 2023

Yup! It's the negative_prompt modelInput, as it seems you worked out.

The modelInputs are passed directly to the relevant diffusers pipeline, so you can use whatever arguments that pipeline supports. I made this a little clearer in the README a few days ago with links to the common diffusers pipelines, as I admit it wasn't so obvious before 😅
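
For example (hypothetical values, but the field names match what you used above):

payload = {
    "modelInputs": {
        "prompt": "analog style portrait photo, golden hour",
        "negative_prompt": "blurry, low quality, watermark",  # forwarded straight to the pipeline
        "num_inference_steps": 30,
    },
    "callInputs": {
        "MODEL_ID": "runwayml/stable-diffusion-v1-5",
        "PIPELINE": "StableDiffusionPipeline",
    },
}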

There's also a note there now about using the lpw_stable_diffusion pipeline, which supports longer prompts and prompt weights.

Thanks for all the kind words! 🙌

gadicc (Collaborator) commented Feb 4, 2023

Hey @digiphd, I had a quick moment to try dreamlike-art/dreamlike-photoreal-2.0 and it works out of the box for me, in both full and half precision. What version of docker-diffusers-api are you using?

These worked for me:

$ python test.py txt2img --call-arg MODEL_ID="dreamlike-art/dreamlike-photoreal-2.0" --call-arg MODEL_PRECISION=""
$ python test.py txt2img --call-arg MODEL_ID="dreamlike-art/dreamlike-photoreal-2.0" --call-arg MODEL_PRECISION="fp16"

I just tried it in the default "runtime" config. If you have this issue specifically in the -build-download variant, let me know.

gadicc (Collaborator) commented Feb 4, 2023

Related: #26
