
feat(diffusers/models): add models such as autoencoders, transformers, UNets, etc. #523

Merged
merged 1 commit into from
Jun 28, 2024

Conversation

townwish4git
Contributor

@townwish4git townwish4git commented Jun 3, 2024

What does this PR do?

Adds # (feature)

Implements the models in mindone.diffusers.models, including:

Models

AutoEncoders

  • AutoencoderKL
  • AsymmetricAutoencoderKL
  • AutoencoderKLTemporalDecoder
  • ConsistencyDecoderVAE
  • AutoencoderTiny
  • VQModel

UNets

  • UNet1DModel
  • UNet2DModel
  • UNet2DConditionModel
  • UNet3DConditionModel
  • I2VGenXLUNet
  • Kandinsky3UNet
  • UNetSpatioTemporalConditionModel
  • UNetMotionModel
  • StableCascadeUNet
  • UViT2DModel

Transformers

  • Transformer2DModel
  • TransformerTemporalModel
  • T5FilmDecoder
  • PriorTransformer
  • DualTransformer2DModel

supplement: https://gist.github.com/Cui-yshoho/7ff86a76323c37f1c197d7fef67702ec

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@geniuspatrick left a comment

After yushi's and huiyu's changes are merged, please rebase and resolve conflicts.

@townwish4git
Contributor Author

After yushi's and huiyu's changes are merged, please rebase and resolve conflicts.

done.

mindone/diffusers/README.md
@@ -134,6 +150,39 @@ Most base, utility and mixin class are available.
Unlike upstream, where the output `posterior = DiagonalGaussianDistribution(latent)` supports sampling via `posterior.sample()`,
we can only output the `latent` and then sample through `AutoencoderKL.diag_gauss_dist.sample(latent)`.
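For context, a minimal numpy sketch of what sampling from such a diagonal Gaussian involves. Splitting the latent into mean and log-variance halves along the channel axis mirrors diffusers' `DiagonalGaussianDistribution`; the helper name here is illustrative, not the mindone API:

```python
import numpy as np

def diag_gauss_sample(latent: np.ndarray, rng=None) -> np.ndarray:
    """Sample from a diagonal Gaussian parameterized by `latent`.

    The latent is split along the channel axis (axis 1) into mean and
    log-variance halves, as in diffusers' DiagonalGaussianDistribution.
    """
    rng = rng or np.random.default_rng(0)
    mean, logvar = np.split(latent, 2, axis=1)
    logvar = np.clip(logvar, -30.0, 20.0)  # clamp log-variance for numerical stability
    std = np.exp(0.5 * logvar)
    return mean + std * rng.standard_normal(mean.shape).astype(latent.dtype)

latent = np.zeros((1, 8, 4, 4), dtype=np.float32)  # 2 * 4 latent channels
sample = diag_gauss_sample(latent)
print(sample.shape)  # (1, 4, 4, 4)
```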

### `nn.Conv3d`
Collaborator

Delete this section; it does not need to be listed.

Collaborator

Put the support matrix for different precisions and modes, mentioned earlier, in the PR description for now.

Collaborator

TODO: later, consider merging this with yushi's list and the limitation section of the README, extracting a standalone Limitation.md.

Contributor Author

That section has been deleted. I had also posted the nn.Conv3d impact notes below yushi's list; I have pasted them here.

@@ -430,7 +430,7 @@ def get_attention_scores(self, query: ms.Tensor, key: ms.Tensor, attention_mask:
)
else:
attention_scores = ops.baddbmm(
attention_mask,
attention_mask.to(query.dtype),
Collaborator

Why is this cast needed?

Contributor Author

ops.baddbmm internally checks that attention_mask, query, and key all share the same dtype. With upcast_attention, the mask's dtype differs from the other two, which raises an error.
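A small numpy sketch of the failure mode. The `baddbmm` stand-in below is illustrative, not the MindSpore kernel; it only mimics the dtype-consistency check described above (baddbmm computes `beta * input + alpha * bmm(batch1, batch2)`):

```python
import numpy as np

def baddbmm(inp, batch1, batch2, *, beta=1.0, alpha=1.0):
    # Toy stand-in for ops.baddbmm that enforces the dtype check
    # the real kernel performs: all three tensors share one dtype.
    if not (inp.dtype == batch1.dtype == batch2.dtype):
        raise TypeError(f"dtype mismatch: {inp.dtype}, {batch1.dtype}, {batch2.dtype}")
    return beta * inp + alpha * (batch1 @ batch2)

query = np.ones((2, 3, 4), dtype=np.float32)   # upcast_attention: fp32
key_t = np.ones((2, 4, 3), dtype=np.float32)
mask = np.zeros((2, 3, 3), dtype=np.float16)   # mask still in fp16

try:
    baddbmm(mask, query, key_t)                # rejected: mixed dtypes
except TypeError:
    pass
scores = baddbmm(mask.astype(query.dtype), query, key_t)  # cast fixes it
```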

@@ -475,7 +475,9 @@ def prepare_attention_mask(
# we want to instead pad by (0, remaining_length), where remaining_length is:
# remaining_length: int = target_length - current_length
# TODO: re-enable tests/models/test_models_unet_2d_condition.py#test_model_xattn_padding
attention_mask = ops.pad(attention_mask, (0, target_length), value=0.0)
attention_mask = ops.Pad(paddings=((0, 0),) * (attention_mask.ndim - 1) + ((0, target_length),))(
Collaborator

Is the manual `* (attention_mask.ndim - 1)` necessary?
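For context, `ops.Pad` takes one explicit `(before, after)` pair per axis, whereas `ops.pad` with a short tuple pads only the trailing axes. The expression in the diff expands a last-axis pad to every axis; it can be checked in plain Python:

```python
def build_paddings(ndim: int, target_length: int):
    # One (pad_before, pad_after) pair per axis; only the last axis is padded.
    return ((0, 0),) * (ndim - 1) + ((0, target_length),)

print(build_paddings(3, 5))  # ((0, 0), (0, 0), (0, 5))
print(build_paddings(1, 2))  # ((0, 2),)
```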

Contributor Author

def construct(self, x: ms.Tensor, mask=None) -> ms.Tensor:
r"""The forward method of the `MaskConditionEncoder` class."""
out = {}
for l in range(len(self.layers)): # noqa: E741
Collaborator

why noqa?

Contributor Author

A standalone single-letter variable `l` violates the lint rule; the noqa keeps the variable name consistent with diffusers.

self.legacy = legacy

self.embedding = nn.Embedding(self.n_e, self.vq_embed_dim)
self.embedding.embedding_table.set_data(
Collaborator

TODO: init method refactor

Contributor Author

Done, please review.
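For reference, upstream diffusers initializes the VQ codebook uniformly in [-1/n_e, 1/n_e]. A numpy sketch of that init — an assumption about what the refactored `set_data` call receives, not the merged code:

```python
import numpy as np

def init_codebook(n_e: int, vq_embed_dim: int, seed: int = 0) -> np.ndarray:
    """Uniform codebook init in [-1/n_e, 1/n_e], as in upstream diffusers VQModel."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0 / n_e, 1.0 / n_e, size=(n_e, vq_embed_dim)).astype(np.float32)

codebook = init_codebook(256, 4)
print(codebook.shape)  # (256, 4)
```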


def construct(self, input: ms.Tensor) -> ms.Tensor:
return (
0.5 * input * (1.0 + ops.tanh(float(math.sqrt(2.0 / math.pi)) * (input + 0.044715 * ops.pow(input, 3.0))))
Collaborator

Could using math inside construct cause problems? Consider hoisting the constant outside.

Contributor Author

In graph mode, type inference on the return value of math.xxx() can fail, so it is wrapped in float() to make it recognizable as a Python built-in type. An issue on the Gitee MindSpore tracker mentioned this problem and this workaround, but I can no longer find the link.
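A minimal illustration of the workaround in plain Python (whether graph mode requires it is the MindSpore behavior described above):

```python
import math

# math.sqrt already returns a Python float, but wrapping it in float()
# makes the type explicit so graph-mode inference treats it as a
# built-in scalar rather than an opaque call result.
SQRT_2_OVER_PI = float(math.sqrt(2.0 / math.pi))
print(f"{SQRT_2_OVER_PI:.6f}")  # 0.797885
```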

Collaborator

Why are generic and transformers separated?

Contributor Author

# In cases where models have unique initialization procedures or require testing with specialized output formats,
# it is necessary to develop distinct, dedicated test cases.

The input/output format of t5_film_decoder does not fit the generic test-case structure, so it was split out. 🤔 We could consider a clearer name for the T5 test file, e.g. test_t5_film_decoder.py.

@townwish4git townwish4git force-pushed the 0529models branch 2 times, most recently from 2a6bbde to 60b2ad2 Compare June 20, 2024 06:42
multiplications and allows for post-hoc remapping of indices.
"""

# NOTE: due to a bug the beta term was applied to the wrong term. for
Collaborator

what bug?

Contributor Author

This comment comes from the original diffusers, not the MindSpore version 😄


def construct(self, input: ms.Tensor) -> ms.Tensor:
return (
0.5 * input * (1.0 + ops.tanh(float(math.sqrt(2.0 / math.pi)) * (input + 0.044715 * ops.pow(input, 3.0))))
Collaborator

What is the expected dtype here? The same as the input's? Type inference here may be problematic in PyNative mode.

Contributor Author

What is the expected dtype here? The same as the input's? Type inference here may be problematic in PyNative mode.

It is expected to match the input. Under what circumstances would there be a problem? The logic here looks fine so far:

# python-built-in-float * ms.Tensor(dtype=...) => ms.Tensor(dtype=same...)
float(math.sqrt(2.0 / math.pi)) * (input + 0.044715 * ops.pow(input, 3.0))

Contributor Author

Currently the unit tests for {PyNative, graph} x {fp16, fp32} all pass with precision matching torch; perhaps bf16 would be a problem? I have indeed run into cases where ms.Tensor(bf16) * python-built-in-float => ms.Tensor(fp32) 😂
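The scalar-times-tensor promotion can be sanity-checked in numpy (MindSpore's bf16 behavior may differ, as noted above, and numpy has no bfloat16 to test with):

```python
import math
import numpy as np

x = np.ones((2, 2), dtype=np.float16)
y = float(math.sqrt(2.0 / math.pi)) * x  # Python float * fp16 array
print(y.dtype)  # float16: the array dtype wins for Python scalars
```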

@townwish4git
Contributor Author

Used the magic number 0.797885 to replace math.sqrt(2.0 / math.pi); the unit tests pass on CPU in {fp16, fp32} x {pynative, graph} with results almost identical to PyTorch's.
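A quick numpy check that the magic number is a faithful stand-in for the exact constant in the tanh-GELU formula (illustrative only; the unit tests above are the real verification):

```python
import math
import numpy as np

MAGIC = 0.797885
EXACT = math.sqrt(2.0 / math.pi)  # 0.7978845608...

def gelu_tanh(x: np.ndarray, c: float) -> np.ndarray:
    # tanh approximation of GELU, matching the construct() above
    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x**3)))

x = np.linspace(-4, 4, 1001, dtype=np.float32)
diff = np.abs(gelu_tanh(x, MAGIC) - gelu_tanh(x, EXACT)).max()
print(float(diff) < 1e-5)  # True: far below fp16 resolution
```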

@CaitinZhao CaitinZhao added this pull request to the merge queue Jun 28, 2024
Merged via the queue into mindspore-lab:master with commit f4328d1 Jun 28, 2024
2 checks passed