Describe the feature request
How should the cfg_scale input be handled when exporting to ONNX, given that it can be either a tensor or None?
Describe scenario use case
vid: a torch tensor of shape [91200, 33].
timestep: a tensor holding a single float value, e.g. [988.] or [1000.], on device 'npu:1' with dtype torch.bfloat16.
cfg_scale: either a tensor such as [2.] or None.
import torch

vid = torch.randn(91200, 33)
timestep = torch.tensor([988.])
cfg_scale = torch.tensor([2.])  # may also be None

dynamic_axes = {
    'vid': {0: 'batch_size'},       # the first dimension of vid is dynamic
    'timestep': {0: 'batch_size'},  # the first dimension of timestep is dynamic
    'cfg_scale': ???,               # what goes here when cfg_scale may be None?
    'output': {0: 'batch_size'}     # the first dimension of the output is dynamic
}

torch.onnx.export(
    model,
    (vid, timestep, cfg_scale),
    "your_model.onnx",
    opset_version=17,
    do_constant_folding=True,
    input_names=['vid', 'timestep', 'cfg_scale'],
    output_names=['output'],
    dynamic_axes=dynamic_axes,
)
Given the above, how should cfg_scale be declared in dynamic_axes and passed to torch.onnx.export when it may be None?
Stable Diffusion applies classifier-free guidance only when scale > 1.0; the only change when it is enabled is the batch size of the text embeddings, and the UNet ONNX model itself stays the same. If you want to convert the guidance code (like https://github.com/microsoft/onnxruntime/blob/6d9636f07cccdb6e4ac453087ad54c3bc9854d50/onnxruntime/python/tools/transformers/models/stable_diffusion/pipeline_stable_diffusion.py#L458-L460) into the ONNX model, you can generate two models: one for cfg_scale > 1.0 and another for cfg_scale <= 1.0. Another approach is to use the ONNX If operator: https://onnx.ai/onnx/operators/onnx__If.html.
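A minimal sketch of the two-model route, reusing model, vid, and timestep from the question above. The wrapper classes and the chunk-based cond/uncond split are illustrative assumptions about where the guidance blend happens, not the asker's actual model:

import torch

class GuidedWrapper(torch.nn.Module):
    # Hypothetical wrapper: applies classifier-free guidance around a backbone
    # that is assumed to return stacked unconditional/conditional predictions.
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, vid, timestep, cfg_scale):
        uncond, cond = self.backbone(vid, timestep).chunk(2, dim=0)
        return uncond + cfg_scale * (cond - uncond)

class PlainWrapper(torch.nn.Module):
    # No-guidance path for cfg_scale <= 1.0: cfg_scale is not an input at all.
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, vid, timestep):
        return self.backbone(vid, timestep)

# Export each wrapper to its own file; the runtime picks one based on cfg_scale.
torch.onnx.export(GuidedWrapper(model), (vid, timestep, torch.tensor([2.])),
                  "model_cfg.onnx",
                  input_names=['vid', 'timestep', 'cfg_scale'],
                  output_names=['output'],
                  dynamic_axes={'vid': {0: 'batch_size'},
                                'output': {0: 'batch_size'}},
                  opset_version=17)
torch.onnx.export(PlainWrapper(model), (vid, timestep),
                  "model_nocfg.onnx",
                  input_names=['vid', 'timestep'],
                  output_names=['output'],
                  dynamic_axes={'vid': {0: 'batch_size'},
                                'output': {0: 'batch_size'}},
                  opset_version=17)

Note that cfg_scale in the guided model is always a one-element tensor, so it needs no dynamic_axes entry at all; the None case is covered by the second model.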
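And a self-contained sketch of the If-operator route, built with onnx.helper on a toy computation: Mul and Identity stand in for the real guided and unguided subgraphs, which in a real graph would be the two exported branches reading vid from the outer scope.

import onnx
from onnx import helper, TensorProto

vid_info = helper.make_tensor_value_info('vid', TensorProto.FLOAT, [None, 33])
out_info = helper.make_tensor_value_info('output', TensorProto.FLOAT, [None, 33])
use_cfg = helper.make_tensor_value_info('use_cfg', TensorProto.BOOL, [])

# then-branch: scale vid (placeholder for the guided subgraph).
# 'vid' is not a subgraph input; If branches capture outer-scope tensors.
scale = helper.make_tensor('scale', TensorProto.FLOAT, [1], [2.0])
then_out = helper.make_tensor_value_info('then_out', TensorProto.FLOAT, [None, 33])
then_branch = helper.make_graph(
    [helper.make_node('Mul', ['vid', 'scale'], ['then_out'])],
    'then_branch', [], [then_out], initializer=[scale])

# else-branch: pass vid through (placeholder for the unguided subgraph).
else_out = helper.make_tensor_value_info('else_out', TensorProto.FLOAT, [None, 33])
else_branch = helper.make_graph(
    [helper.make_node('Identity', ['vid'], ['else_out'])],
    'else_branch', [], [else_out])

if_node = helper.make_node('If', ['use_cfg'], ['output'],
                           then_branch=then_branch, else_branch=else_branch)

graph = helper.make_graph([if_node], 'cfg_if', [use_cfg, vid_info], [out_info])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid('', 17)])
onnx.checker.check_model(model)
onnx.save(model, 'cfg_if.onnx')

With this layout, the caller passes a scalar boolean use_cfg (e.g. computed as cfg_scale is not None and cfg_scale > 1.0) instead of making cfg_scale itself optional, so a single model file covers both cases.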