
Add hpcai OpenSora v1.2 - 3D VAE inference #560

Merged
merged 45 commits into mindspore-lab:master on Jul 11, 2024

Conversation

SamitHuang
Collaborator

@SamitHuang SamitHuang commented Jun 20, 2024

What does this PR do?

Fixes # (issue)

Adds # (feature)
OpenSora v1.2 VAE:

  • Temporal VAE
  • 3D VAE consisting of a spatial VAE and a temporal VAE

Tests passed on 910*:

  • Video reconstruction, PSNR 31
  • Inference integrated with STDiT v2
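The 3D VAE described above chains a spatial VAE with a temporal VAE. A minimal shape walk-through, assuming the usual OpenSora v1.2 defaults (8x spatial downsampling into 4 latent channels, 4x temporal compression) — these factors are assumptions, not values quoted from this PR:

```python
# Hypothetical shape arithmetic for the 3D VAE: the spatial VAE compresses
# H and W by 8x into 4 latent channels, then the temporal VAE compresses
# the frame axis by 4x. Factors are assumed OpenSora v1.2 defaults.
SPATIAL_DOWN = 8
TEMPORAL_DOWN = 4

def latent_shape(t: int, h: int, w: int, z_channels: int = 4):
    """Return (t', c, h', w') of the 3D VAE latent for a (t, 3, h, w) clip."""
    return (
        (t + TEMPORAL_DOWN - 1) // TEMPORAL_DOWN,  # ceil division over time
        z_channels,
        h // SPATIAL_DOWN,
        w // SPATIAL_DOWN,
    )

print(latent_shape(32, 256, 256))  # → (8, 4, 32, 32)
```

For a 32-frame 256x256 clip this gives an 8x4x32x32 latent, which is the tensor the STDiT backbone would consume.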

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

image_latents = self.vae.encode(x)
image_latents = image_latents * self.scale_factor
# image_latents = ops.stop_gradient(self.vae.encode(x))
image_latents = ops.stop_gradient(self.vae.module.encode(x) * self.vae.scale_factor)
Collaborator

The scale factor is independent of the type of VAE used, so I think it is better to keep the parameter in the diffusion pipeline scope instead of the VAE scope.

Collaborator Author

Once it is passed to the VAE, it becomes a member. It should be configurable, but hpcai hard-coded it and trained with this fixed value... which is not ideal for different VAE training data.

Collaborator

I get this: AttributeError: The 'VideoAutoencoderPipeline' object has no attribute 'module'.

Collaborator

And this: AttributeError: The 'VideoAutoencoderPipeline' object has no attribute 'scale_factor'.
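Both AttributeErrors come from looking up `module` and `scale_factor` on a VAE class that does not define them. A minimal sketch of the reviewer's alternative — keep `scale_factor` on the pipeline so any VAE with only an `encode` method works; the class and parameter names here are illustrative assumptions, not the PR's final code:

```python
import numpy as np

class DummyVAE:
    """Stand-in encoder; a real VideoAutoencoderPipeline would return latents."""
    def encode(self, x):
        return x * 0.5  # placeholder transform

class DiffusionPipeline:
    # scale_factor lives in the pipeline, so a VAE without a
    # `scale_factor` (or `module`) attribute still works unchanged.
    def __init__(self, vae, scale_factor=0.18215):
        self.vae = vae
        self.scale_factor = scale_factor

    def get_latents(self, x):
        # scaling applied in pipeline scope, independent of the VAE type
        return self.vae.encode(x) * self.scale_factor

pipe = DiffusionPipeline(DummyVAE())
latents = pipe.get_latents(np.ones((1, 4)))
```

In MindSpore, `ops.stop_gradient` would wrap the `self.vae.encode(x)` call during training; it is omitted here to keep the sketch framework-free.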

@@ -71,40 +71,26 @@ def vae_decode(self, x: Tensor) -> Tensor:
Return:
y: (b H W 3), batch of images, normalized to [0, 1]
"""
b, c, h, w = x.shape

if self.micro_batch_size is None:
Collaborator

Is micro_batch_size removed?

Collaborator Author

No, it's wrapped in the VAE class:

vae = VideoAutoencoderKL(
config=SD_CONFIG, ckpt_path=args.vae_checkpoint, micro_batch_size=args.vae_micro_batch_size
)
elif args.vae_dtype == 'OpenSoraVAE_V1_2"':
Collaborator

Suggested change
elif args.vae_dtype == 'OpenSoraVAE_V1_2"':
elif args.vae_type == "OpenSoraVAE_V1_2":
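The `micro_batch_size` the VAE class wraps is a memory-bounding device: latents are encoded/decoded in small chunks instead of all at once. A framework-free sketch of that pattern, assuming a chunked split along the batch axis (the real class may chunk along a different axis):

```python
import numpy as np

def micro_batched_decode(decode_fn, x, micro_batch_size=None):
    """Apply decode_fn to x in chunks of micro_batch_size along axis 0
    to bound peak memory; None processes everything in one call."""
    if micro_batch_size is None:
        return decode_fn(x)
    chunks = [
        decode_fn(x[i : i + micro_batch_size])
        for i in range(0, x.shape[0], micro_batch_size)
    ]
    return np.concatenate(chunks, axis=0)

x = np.arange(15, dtype=float).reshape(5, 3)
out = micro_batched_decode(lambda b: b * 2, x, micro_batch_size=2)
```

The result is identical to a single full-batch call; only the peak activation memory changes, which is why moving it inside the VAE class is transparent to callers.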

ckpt_path: "models/opensora_v1.2_stage3.ckpt"
t5_model_dir: "models/t5-v1_1-xxl/"

vae_model_type: "OpenSoraVAE_V1_2"
Collaborator

Suggested change
vae_model_type: "OpenSoraVAE_V1_2"
vae_type: OpenSoraVAE_V1_2

Collaborator
@hadipash hadipash Jun 21, 2024

Please set temporal compression VAE_T_COMPRESS and input_size.

Collaborator Author

fixed

return latent_size


class VideoAutoencoderPipelineConfig(PretrainedConfig):
Collaborator

I don't think it's good to use PretrainedConfig in our project. It's better to move these parameters under __init__ in VideoAutoencoderPipeline.
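A minimal sketch of that suggestion — constructor arguments taken directly instead of through a PretrainedConfig object. The parameter names are illustrative assumptions, not the PR's final signature:

```python
class VideoAutoencoderPipeline:
    """Explicit __init__ parameters instead of a PretrainedConfig wrapper;
    names here are hypothetical, chosen to mirror the spatial + temporal
    composition discussed in this PR."""
    def __init__(self, spatial_vae, temporal_vae,
                 freeze_vae_2d=False, scale_factor=1.0):
        self.spatial_vae = spatial_vae
        self.temporal_vae = temporal_vae
        self.freeze_vae_2d = freeze_vae_2d
        self.scale_factor = scale_factor

pipeline = VideoAutoencoderPipeline("vae2d", "vae3d", scale_factor=0.7)
```

This keeps construction explicit and greppable, at the cost of losing the config-serialization helpers PretrainedConfig provides.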

@SamitHuang SamitHuang added this pull request to the merge queue Jul 11, 2024
Merged via the queue into mindspore-lab:master with commit 9c5e789 Jul 11, 2024
1 of 3 checks passed
@SamitHuang SamitHuang removed the request for review from zhanghuiyao July 11, 2024 09:30
yuedongli1 pushed a commit to yuedongli1/mindone that referenced this pull request Jul 25, 2024
* add vae 3d enc-dec

* update test

* dev save

* testing

* spatial vae test pass

* fix

* add vae param list

* fix name order

* add shape

* add shape

* order pnames

* ordered temporal pnames

* vae 3d recons ok

* update docs

* add test scripts

* add convert script

* adapt to 910b

* support ms2.3 5d GN

* rm test files

* fix format

* debug infer

* add sample t2v yaml

* fix i2v

* update comment

* fix format

* rm tmp test

* fix docs

* fix var name

* fix latent shape compute

* add info

* fix image enc/dec

* fix format

* adapt new vae in training

* fix dtype

* pad bf16 fixed by cast to fp16

* fix ops.pad bf16 with fp32 cast

* replace pad with concat

* replace pad_at_dim with concat for bf16
yuedongli1 pushed a commit to yuedongli1/mindone that referenced this pull request Aug 15, 2024
5 participants