issues about applying vae encode to channel-wise concat image condition #38

Open
Dawn-LX opened this issue Dec 8, 2024 · 0 comments

Dawn-LX commented Dec 8, 2024

from einops import rearrange

def tensor_to_vae_latent(t, vae):
    # t: video tensor of shape (batch, frames, channels, height, width)
    video_length = t.shape[1]

    # Fold the frame dimension into the batch so the image VAE can encode every frame at once.
    t = rearrange(t, "b f c h w -> (b f) c h w")

    latents = vae.encode(t).latent_dist.sample()
    latents = rearrange(latents, "(b f) c h w -> b f c h w", f=video_length)
    latents = latents * vae.config.scaling_factor

    return latents

# Encode the conditioning pixels, keep only the first frame's latent,
# then divide the scaling_factor back out so the channel-concat condition stays unscaled.
conditional_latents = tensor_to_vae_latent(conditional_pixel_values, vae)[:, 0, :, :, :]

conditional_latents = conditional_latents / vae.config.scaling_factor
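For context, a minimal sketch (with assumed names such as noisy_latents, and torch already imported) of how such an unscaled condition latent is typically broadcast over frames and channel-wise concatenated with the noisy video latents before the UNet:

# Sketch only: noisy_latents of shape (b, f, c, h, w) is an assumption for illustration.
num_frames = noisy_latents.shape[1]
repeated_condition = conditional_latents.unsqueeze(1).repeat(1, num_frames, 1, 1, 1)
unet_input = torch.cat([noisy_latents, repeated_condition], dim=2)  # concat along the channel dim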

NOTE: here SVD rescales the latent back, i.e., following InstructPix2Pix, the channel-concat image condition does not have the VAE's scaling_factor applied.
However, unlike InstructPix2Pix, SVD uses vae.encode(x).latent_dist.sample() for the channel-concat image, whereas InstructPix2Pix uses vae.encode(x).latent_dist.mode().
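For comparison, a minimal sketch of the InstructPix2Pix-style conditioning encode, assuming the same diffusers-style vae object as above: the condition latent is the deterministic mode of the posterior and no scaling_factor is applied.

# Hedged sketch of the InstructPix2Pix-style image-condition encoding (for comparison only).
def encode_image_condition_ip2p_style(image, vae):
    # image: (batch, channels, height, width) conditioning image
    # .mode() takes the posterior mean deterministically; no scaling_factor is applied.
    return vae.encode(image).latent_dist.mode()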
