StableDiffusion XL with TensorRT EP #17748
Conversation
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_models.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_models.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/engine_builder_tensorrt.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/engine_builder_tensorrt.py: Fixed
    self.torch_models = {}

    def teardown(self):
        for engine in self.engines.values():

Check failure: Code scanning / CodeQL: Suspicious unused loop iteration variable
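CodeQL raises this alert when a loop binds an iteration variable that the loop body never reads. A minimal sketch of the flagged pattern and the conventional fix (this is an illustrative snippet with made-up data, not the PR's actual teardown code):

```python
engines = {"clip": "engine-a", "unet": "engine-b"}

# Pattern CodeQL flags: `engine` is bound on each iteration
# but never referenced inside the loop body.
released = 0
for engine in engines.values():
    released += 1

# Conventional fix: name the intentionally unused variable `_`,
# which signals the intent and silences the alert.
released_fixed = 0
for _ in engines.values():
    released_fixed += 1
```

If the body was supposed to use the value (for example, to release each engine), the real fix is to reference the variable rather than rename it.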
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_schedulers.py: Fixed
    image = image.repeat(batch_size, 1, 1, 1)
    init_images.append(image)
    if self.nvtx_profile:
        nvtx.end_range(nvtx_image_preprocess)

Check failure: Code scanning / CodeQL: Potentially uninitialized local variable
    cudart.cudaEventRecord(self.events["clip-stop"], 0)
    if self.nvtx_profile:
        nvtx.end_range(nvtx_clip)

Check failure: Code scanning / CodeQL: Potentially uninitialized local variable
    init_latents = self.run_engine("vae_encoder", {"images": init_image})["latent"]
    cudart.cudaEventRecord(self.events["vae_encoder-stop"], 0)
    if self.nvtx_profile:
        nvtx.end_range(nvtx_vae)

Check failure: Code scanning / CodeQL: Potentially uninitialized local variable
    images = self.backend.vae_decode(latents)
    cudart.cudaEventRecord(self.events["vae-stop"], 0)
    if self.nvtx_profile:
        nvtx.end_range(nvtx_vae)

Check failure: Code scanning / CodeQL: Potentially uninitialized local variable
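All four of these alerts come from the same guarded-profiling pattern: an nvtx range handle is assigned only inside `if self.nvtx_profile:`, so the analyzer cannot prove the later `nvtx.end_range(...)` read is initialized. A minimal sketch of one way to satisfy the analyzer; `start_range`/`end_range` here are hypothetical stand-ins for the nvtx calls, not the library's real API:

```python
ended = []

def start_range(name):
    # Hypothetical stand-in for nvtx's range-start call.
    return name

def end_range(handle):
    # Hypothetical stand-in for nvtx's range-end call.
    ended.append(handle)

def preprocess(nvtx_profile):
    # Binding the handle unconditionally removes the code path
    # on which the local could be read before assignment.
    nvtx_image_preprocess = None
    if nvtx_profile:
        nvtx_image_preprocess = start_range("image preprocess")
    # ... image preprocessing work would run here ...
    if nvtx_profile:
        end_range(nvtx_image_preprocess)

preprocess(True)   # range started and ended
preprocess(False)  # handle stays None; end_range never called
```

The behavior is unchanged when profiling is off; the `None` default only exists to make every read provably initialized.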
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_models.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_models.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_models.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/engine_builder_tensorrt.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/engine_builder_tensorrt.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/engine_builder_tensorrt.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark.py: Outdated
- onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark.py: Outdated
- onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark.py: Resolved
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_models.py: Resolved
- onnxruntime/python/tools/transformers/models/stable_diffusion/diffusion_models.py: Resolved
- onnxruntime/python/tools/transformers/models/stable_diffusion/pipeline_img2img_xl.py: Resolved
- onnxruntime/python/tools/transformers/models/stable_diffusion/README.md: Outdated
- onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark.py: Outdated
- onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark.py: Outdated
- onnxruntime/python/tools/transformers/models/stable_diffusion/demo_txt2img.py: Fixed
- onnxruntime/python/tools/transformers/models/stable_diffusion/demo_txt2img.py: Fixed
    if not args.disable_cuda_graph:
        # inference once to get cuda graph
        _image, _latency = run_inference(warmup=True)

Check warning: Code scanning / CodeQL: Variable defined multiple times
    print("[I] Warming up ..")
    for _ in range(args.num_warmup_runs):
        _image, _latency = run_inference(warmup=True)

Check warning: Code scanning / CodeQL: Variable defined multiple times
    if not args.disable_cuda_graph:
        # inference once to get cuda graph
        _image, _latency = run_inference(warmup=True)

Check notice: Code scanning / CodeQL: Unused global variable
    print("[I] Warming up ..")
    for _ in range(args.num_warmup_runs):
        _image, _latency = run_inference(warmup=True)

Check notice: Code scanning / CodeQL: Unused global variable
    print("[I] Running StableDiffusion pipeline")
    if args.nvtx_profile:
        cudart.cudaProfilerStart()
    _image, _latency = run_inference(warmup=False)

Check notice: Code scanning / CodeQL: Unused global variable
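The "Unused global variable" notices arise because these assignments run at module top level, so `_image`/`_latency` become module globals that nothing reads. One common remedy, sketched here with a hypothetical `run_inference` stand-in rather than the demo's real code, is to move the script body into a `main()` function so the results become locals:

```python
def run_inference(warmup):
    # Hypothetical stand-in: returns (image, latency).
    return (None if warmup else "image", 5.0)

def main():
    # Inside main(), `image`/`_latency` are locals rather than
    # module globals, which resolves the CodeQL notice and keeps
    # the module namespace clean.
    for _ in range(2):  # warmup runs
        run_inference(warmup=True)
    image, _latency = run_inference(warmup=False)
    return image

if __name__ == "__main__":
    main()
```

The underscore prefix still documents that `_latency` is intentionally unused in this path.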
Description

Accelerate StableDiffusion XL with TensorRT EP. It is modified from TensorRT demo diffusion, and we updated the design to make the pipeline work with different backend engines.

Performance

The following results are from an A100 80GB GPU with 30 steps of Base, or 30 steps of Base and 30 steps of Refiner, generating 1024x1024 images. The engine is built with static input shapes, and CUDA graph is enabled.

| Pipeline | Batch Size | TRT Latency (ms) | ORT_TRT Latency (ms) | Diff |
| -- | -- | -- | -- | -- |
| Base | 1 | 2714 | 2679 | -1.3% |
| Base & Refiner | 1 | 3593 | 3530 | -1.8% |

Test environment: onnxruntime-gpu is built from source, and the following packages or libraries are used in this test:

* tensorrt==8.6.1.post1
* torch==2.2.0.dev20230920+cu121
* transformers==4.31.0
* diffusers==0.19.3
* onnx==1.14.1
* onnx-graphsurgeon==0.3.27
* polygraphy==0.47.1
* protobuf==3.20.2
* onnxruntime-gpu==1.17.0 (built from source of main branch)
* CUDA 12.2.2
* cuDNN 8.9.5.29
* python 3.10.13

Motivation and Context