PaddleMIX ppdiffusers Stable Diffusion 3 inference optimize #681
Open
chang-wenbin wants to merge 59 commits into PaddlePaddle:develop from chang-wenbin:SD3_PaddleMIX_819
Changes from 50 commits
Commits (59):
a6631e7 optimize SD3 (chang-wenbin)
b0ea9ef optimize SD3 transformer_SD3 (chang-wenbin)
f06a61a optimize SD3 transformer_SD3 (chang-wenbin)
dcff90c update SD3 (chang-wenbin)
15c5e44 uodate triton &sim_SD3 (chang-wenbin)
ab73a63 modify temb_silu && modify nvtx (chang-wenbin)
ed2b7b1 modify linear from fused_linear (chang-wenbin)
f4330d3 modify simplified_sd3 (chang-wenbin)
cc1af0f add split_concat triton kernel (chang-wenbin)
70e6b6e modify split_concat triton kernel (chang-wenbin)
9543b11 update (chang-wenbin)
357b75a update transformer_sd3 (chang-wenbin)
f54bf84 update transformer_sd3 (chang-wenbin)
3245b2f update triton & simplified_sd3 (chang-wenbin)
5516df6 update simplified_sd3 (chang-wenbin)
874d5d7 update simplified_sd3 (chang-wenbin)
111f4cd delete context_pre_only=False (chang-wenbin)
18777b6 modify triton_optimize (chang-wenbin)
7a288e4 modify triton_optimize (chang-wenbin)
840b153 modify triton_optimize (chang-wenbin)
95c9e47 modify triton_fuse & Modifying performance issues affected by CUDA sy… (chang-wenbin)
84a9e7a modify transformer_sd3 if optimize_prigin (chang-wenbin)
9dd918d update vae triton_split (chang-wenbin)
3a0b7e1 vae T5 d2s & transformer forward d2s (chang-wenbin)
6d02d79 update demo (chang-wenbin)
5d81b44 update five model d2s (chang-wenbin)
4bab118 update SD3 clip T5 vae (chang-wenbin)
5a14a0f update clip (chang-wenbin)
cd2ef01 uodate T5 (chang-wenbin)
624168c uodate T5 (chang-wenbin)
b009b9f update scheduling_flow_match_euler_discrete (chang-wenbin)
8caa10a update normalization (chang-wenbin)
377629a update normalization (chang-wenbin)
6863054 Merge remote-tracking branch 'upstream/develop' into SD3_PaddleMIX_819 (chang-wenbin)
15fda4e update SD3 (chang-wenbin)
cb993c5 merge develop (chang-wenbin)
0e90eaf update cutlass gemm&fast_gelu (chang-wenbin)
c5bb81f update per-mmdit (chang-wenbin)
2c8cc85 merge develop (chang-wenbin)
499752a update triton op split_concat (chang-wenbin)
1084f4a update embeddings (chang-wenbin)
e3a5d7c merge (chang-wenbin)
fa84559 recovery (chang-wenbin)
27c62f9 recovery (chang-wenbin)
951f7a6 merge (chang-wenbin)
9515323 update normalization (chang-wenbin)
d61e4cb update dtype (chang-wenbin)
d961a4a add SD3 doc (chang-wenbin)
ac1e139 merge develop (chang-wenbin)
48c66a6 update SD3 doc (chang-wenbin)
24c3c9e add 'del transformer_blocks' (chang-wenbin)
422f33b update SD3 (chang-wenbin)
c43d84f update SD3 (chang-wenbin)
9d03624 update Notes (chang-wenbin)
ded06bf add Notes (chang-wenbin)
d845da2 update demo (chang-wenbin)
db6aad1 update doc (chang-wenbin)
3527954 update SD3 (chang-wenbin)
e7848a3 merge zkk (chang-wenbin)
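@@ -0,0 +1,39 @@
# Stable Diffusion 3 High-Performance Inference

- Paddle Inference provides a high-performance inference implementation for the Stable Diffusion 3 model, improving inference performance by more than 70%.

Environment setup: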
```shell
# Install Triton and make it compatible with Paddle
python -m pip install triton
python -m pip install git+https://github.com/zhoutianzi666/UseTritonInPaddle.git
python -c "import use_triton_in_paddle; use_triton_in_paddle.make_triton_compatible_with_paddle()"

# Install the develop (nightly) build of Paddle
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/

# Add the TensorRT lib directory to the library path
export LD_LIBRARY_PATH=/your_TensorRT_dir/lib:$LD_LIBRARY_PATH

# Add the CUTLASS library directories to the library path
export LD_LIBRARY_PATH=/your_dir/Paddle/paddle/phi/kernels/fusion/cutlass/conv2d/build:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/your_dir/Paddle/paddle/phi/kernels/fusion/cutlass/gemm_epilogue/build:$LD_LIBRARY_PATH
```
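
Before running inference, it can help to confirm that the directories exported in the setup block are actually on `LD_LIBRARY_PATH`. The sketch below is a minimal check and is not part of this PR: `missing_lib_dirs` is a hypothetical helper, and `REQUIRED_DIRS` holds the placeholder paths from the setup block, which need to be replaced with real locations.

```python
import os


def missing_lib_dirs(ld_library_path, required_dirs):
    """Return the required directories that are absent from an LD_LIBRARY_PATH string."""
    # Normalize entries so "/a//lib" and "/a/lib/" both compare equal to "/a/lib".
    present = {os.path.normpath(p) for p in ld_library_path.split(os.pathsep) if p}
    return [d for d in required_dirs if os.path.normpath(d) not in present]


# Placeholder paths copied from the setup block (assumption: replace with real ones).
REQUIRED_DIRS = [
    "/your_TensorRT_dir/lib",
    "/your_dir/Paddle/paddle/phi/kernels/fusion/cutlass/conv2d/build",
    "/your_dir/Paddle/paddle/phi/kernels/fusion/cutlass/gemm_epilogue/build",
]

# Report anything that is missing from the current environment.
for d in missing_lib_dirs(os.environ.get("LD_LIBRARY_PATH", ""), REQUIRED_DIRS):
    print(f"not on LD_LIBRARY_PATH: {d}")
```

The check only compares path strings; it does not verify that the directories exist or that they contain the expected shared libraries.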

High-performance inference commands:
```shell
# Step 1: Generate the FP32 Paddle model, and build the FP16 TensorRT engine from the Paddle model.
python text_to_image_generation-stable_diffusion_3.py --dtype float32 --height 512 --width 512 \
--num-inference-steps 50 --inference_optimize 1 --inference_optimize_triton 1 \
--benchmark 1

# Step 2: Run FP16 inference
python text_to_image_generation-stable_diffusion_3.py --dtype float16 --height 512 --width 512 \
--num-inference-steps 50 --inference_optimize 1 --inference_optimize_triton 1 \
--benchmark 1
```
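
The two steps differ only in `--dtype`, so they can be driven from one small script. This is a sketch under that assumption: `build_command` and `run_both_steps` are hypothetical helpers, and the flags are copied from the commands above rather than taken from the script's argument parser.

```python
import subprocess
import sys

SCRIPT = "text_to_image_generation-stable_diffusion_3.py"


def build_command(dtype):
    """Build the argument list for one run; only --dtype differs between the steps."""
    return [
        sys.executable, SCRIPT,
        "--dtype", dtype,
        "--height", "512", "--width", "512",
        "--num-inference-steps", "50",
        "--inference_optimize", "1",
        "--inference_optimize_triton", "1",
        "--benchmark", "1",
    ]


def run_both_steps():
    """Run the float32 step, then the float16 step, in the order the README lists them."""
    for dtype in ("float32", "float16"):
        # check=True raises CalledProcessError, so a failed step 1 stops step 2.
        subprocess.run(build_command(dtype), check=True)
```

Calling `run_both_steps()` from the directory that contains the script runs both steps back to back and stops at the first failure.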

- Performance measured on an NVIDIA A100-SXM4-40GB:

| Paddle Inference | OneDiff | PyTorch | Paddle dynamic graph |
| ---------------- | ------- | ------- | -------------------- |
| 1.2 s            | 1.58 s  | 1.78 s  | 4.202 s              |
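
To put the table in relative terms, the snippet below derives speedup and latency reduction from the four measurements. The numbers are copied from the table. Reading the "70%+" claim as latency reduction relative to the Paddle dynamic graph is an assumption; the README does not say which baseline it refers to.

```python
# Latencies in seconds, copied from the benchmark table above.
LATENCY_S = {
    "Paddle Inference": 1.2,
    "OneDiff": 1.58,
    "PyTorch": 1.78,
    "Paddle dynamic graph": 4.202,
}


def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


def latency_reduction(baseline_s, optimized_s):
    """Fraction of the baseline latency that was eliminated."""
    return 1.0 - optimized_s / baseline_s


optimized = LATENCY_S["Paddle Inference"]
for name, baseline in LATENCY_S.items():
    if name == "Paddle Inference":
        continue
    print(f"vs {name}: {speedup(baseline, optimized):.2f}x faster, "
          f"{latency_reduction(baseline, optimized):.1%} lower latency")
```

Against the Paddle dynamic graph this works out to roughly a 3.5x speedup, or about 71% lower latency, which is consistent with the "70%+" figure under the stated assumption.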
Add a sentence here: please use a PaddleNLP build from after September 6, 2024, because we fixed a PaddleNLP bug on that day.
https://github.com/PaddlePaddle/PaddleNLP/pull/9016/files