
Add hpcai OpenSora v1.2 - 3D VAE inference #560

Merged
merged 45 commits into mindspore-lab:master on Jul 11, 2024

Conversation

SamitHuang
Collaborator

@SamitHuang SamitHuang commented Jun 20, 2024

What does this PR do?

Fixes # (issue)

Adds # (feature)
OpenSora v1.2 VAE:

  • Temporal VAE
  • 3D VAE consisting of a spatial VAE and a temporal VAE

Tests passed on 910*:

  • Video reconstruction, PSNR 31
  • Inference integrated with STDiT v2
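The 3D VAE described above chains a spatial VAE with a temporal VAE. A minimal shape walk-through, assuming the usual OpenSora v1.2 defaults (8x spatial downsampling into 4 latent channels, 4x temporal compression) — these factors are assumptions, not values quoted from this PR:

```python
# Hypothetical shape arithmetic for the 3D VAE: the spatial VAE compresses
# H and W by 8x into 4 latent channels, then the temporal VAE compresses
# the frame axis by 4x. Factors are assumed OpenSora v1.2 defaults.
SPATIAL_DOWN = 8
TEMPORAL_DOWN = 4

def latent_shape(t: int, h: int, w: int, z_channels: int = 4):
    """Return (t', c, h', w') of the 3D VAE latent for a (t, 3, h, w) clip."""
    return (
        (t + TEMPORAL_DOWN - 1) // TEMPORAL_DOWN,  # ceil division over time
        z_channels,
        h // SPATIAL_DOWN,
        w // SPATIAL_DOWN,
    )

print(latent_shape(32, 256, 256))  # → (8, 4, 32, 32)
```

For a 32-frame 256x256 clip this gives an 8x4x32x32 latent, which is the tensor the STDiT backbone would consume.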

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

image_latents = self.vae.encode(x)
image_latents = image_latents * self.scale_factor
# image_latents = ops.stop_gradient(self.vae.encode(x))
image_latents = ops.stop_gradient(self.vae.module.encode(x) * self.vae.scale_factor)
Collaborator

The scale factor is independent of the type of VAE used, so I think it is better to keep the parameter in the diffusion pipeline scope instead of the VAE scope.

Collaborator Author

Once it is passed to the VAE, it becomes a member. It should be configurable, but hpcai hard-coded it and trained with this fixed value... which is not ideal for different VAE training data.

Collaborator

I get this: AttributeError: The 'VideoAutoencoderPipeline' object has no attribute 'module'.

Collaborator

And this: AttributeError: The 'VideoAutoencoderPipeline' object has no attribute 'scale_factor'.
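Both AttributeErrors come from looking up `module` and `scale_factor` on a VAE class that does not define them. A minimal sketch of the reviewer's alternative — keep `scale_factor` on the pipeline so any VAE with only an `encode` method works; the class and parameter names here are illustrative assumptions, not the PR's final code:

```python
import numpy as np

class DummyVAE:
    """Stand-in encoder; a real VideoAutoencoderPipeline would return latents."""
    def encode(self, x):
        return x * 0.5  # placeholder transform

class DiffusionPipeline:
    # scale_factor lives in the pipeline, so a VAE without a
    # `scale_factor` (or `module`) attribute still works unchanged.
    def __init__(self, vae, scale_factor=0.18215):
        self.vae = vae
        self.scale_factor = scale_factor

    def get_latents(self, x):
        # scaling applied in pipeline scope, independent of the VAE type
        return self.vae.encode(x) * self.scale_factor

pipe = DiffusionPipeline(DummyVAE())
latents = pipe.get_latents(np.ones((1, 4)))
```

In MindSpore, `ops.stop_gradient` would wrap the `self.vae.encode(x)` call during training; it is omitted here to keep the sketch framework-free.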

@@ -71,40 +71,26 @@ def vae_decode(self, x: Tensor) -> Tensor:
Return:
y: (b H W 3), batch of images, normalized to [0, 1]
"""
b, c, h, w = x.shape

if self.micro_batch_size is None:
Collaborator

Is micro_batch_size removed?

Collaborator Author

No, it's wrapped in the VAE class:

vae = VideoAutoencoderKL(
config=SD_CONFIG, ckpt_path=args.vae_checkpoint, micro_batch_size=args.vae_micro_batch_size
)
elif args.vae_dtype == 'OpenSoraVAE_V1_2"':
Collaborator

Suggested change
elif args.vae_dtype == 'OpenSoraVAE_V1_2"':
elif args.vae_type == "OpenSoraVAE_V1_2":
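The `micro_batch_size` the VAE class wraps is a memory-bounding device: latents are encoded/decoded in small chunks instead of all at once. A framework-free sketch of that pattern, assuming a chunked split along the batch axis (the real class may chunk along a different axis):

```python
import numpy as np

def micro_batched_decode(decode_fn, x, micro_batch_size=None):
    """Apply decode_fn to x in chunks of micro_batch_size along axis 0
    to bound peak memory; None processes everything in one call."""
    if micro_batch_size is None:
        return decode_fn(x)
    chunks = [
        decode_fn(x[i : i + micro_batch_size])
        for i in range(0, x.shape[0], micro_batch_size)
    ]
    return np.concatenate(chunks, axis=0)

x = np.arange(15, dtype=float).reshape(5, 3)
out = micro_batched_decode(lambda b: b * 2, x, micro_batch_size=2)
```

The result is identical to a single full-batch call; only the peak activation memory changes, which is why moving it inside the VAE class is transparent to callers.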

ckpt_path: "models/opensora_v1.2_stage3.ckpt"
t5_model_dir: "models/t5-v1_1-xxl/"

vae_model_type: "OpenSoraVAE_V1_2"
Collaborator

Suggested change
vae_model_type: "OpenSoraVAE_V1_2"
vae_type: OpenSoraVAE_V1_2

Collaborator
@hadipash hadipash Jun 21, 2024

Please set temporal compression VAE_T_COMPRESS and input_size.

Collaborator Author

fixed

return latent_size


class VideoAutoencoderPipelineConfig(PretrainedConfig):
Collaborator

I don't think it's good to use PretrainedConfig in our project. It's better to move these parameters under __init__ in VideoAutoencoderPipeline.
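A minimal sketch of that suggestion — constructor arguments taken directly instead of through a PretrainedConfig object. The parameter names are illustrative assumptions, not the PR's final signature:

```python
class VideoAutoencoderPipeline:
    """Explicit __init__ parameters instead of a PretrainedConfig wrapper;
    names here are hypothetical, chosen to mirror the spatial + temporal
    composition discussed in this PR."""
    def __init__(self, spatial_vae, temporal_vae,
                 freeze_vae_2d=False, scale_factor=1.0):
        self.spatial_vae = spatial_vae
        self.temporal_vae = temporal_vae
        self.freeze_vae_2d = freeze_vae_2d
        self.scale_factor = scale_factor

pipeline = VideoAutoencoderPipeline("vae2d", "vae3d", scale_factor=0.7)
```

This keeps construction explicit and greppable, at the cost of losing the config-serialization helpers PretrainedConfig provides.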

@SamitHuang SamitHuang added this pull request to the merge queue Jul 11, 2024
Merged via the queue into mindspore-lab:master with commit 9c5e789 Jul 11, 2024
1 of 3 checks passed
@SamitHuang SamitHuang removed the request for review from zhanghuiyao July 11, 2024 09:30
yuedongli1 pushed a commit to yuedongli1/mindone that referenced this pull request Jul 25, 2024
* add vae 3d enc-dec

* update test

* dev save

* testing

* spatial vae test pass

* fix

* add vae param list

* fix name order

* add shape

* add shape

* order pnames

* ordered temporal pnames

* vae 3d recons ok

* update docs

* add test scripts

* add convert script

* adapt to 910b

* support ms2.3 5d GN

* rm test files

* fix format

* debug infer

* add sample t2v yaml

* fix i2v

* update comment

* fix format

* rm tmp test

* fix docs

* fix var name

* fix latent shape compute

* add info

* fix image enc/dec

* fix format

* adapt new vae in training

* fix dtype

* pad bf16 fixed by cast to fp16

* fix ops.pad bf16 with fp32 cast

* replace pad with concat

* replace pad_at_dim with concat for bf16
yuedongli1 pushed a commit to yuedongli1/mindone that referenced this pull request Aug 15, 2024
5 participants