
[Training] On device training doesn't work with INT8 Models #19078

Open
IzanCatalan opened this issue Jan 10, 2024 · 3 comments
Labels
ep:CUDA · platform:mobile · training

Comments

@IzanCatalan

Describe the issue

I am re-training some ONNX models from the ONNX Model Zoo repo, specifically the quantised ResNet50 with the INT8 datatype. However, when creating the artifacts according to the onnx-runtime-training-examples repo, I get the following error:

Traceback (most recent call last):
  File "prepare_for_training.py", line 38, in <module>
    artifacts.generate_artifacts(
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/training/artifacts.py", line 152, in generate_artifacts
    _ = training_block(*[output.name for output in model.graph.output])
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/training/onnxblock/onnxblock.py", line 204, in __call__
    self._training_model, self._eval_model = _training_graph_utils.build_gradient_graph(
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/training/onnxblock/_training_graph_utils.py", line 127, in build_gradient_graph
    optimized_model = onnx.load_from_string(get_optimized_model(model.SerializeToString(), requires_grad, options))
RuntimeError: /home/onnxruntime/orttraining/orttraining/core/optimizer/qdq_fusion.cc:104 std::pair<bool, onnxruntime::Node*> onnxruntime::{anonymous}::CheckForQDQPatternMatch(onnxruntime::Graph&, onnxruntime::Node&, const onnxruntime::InlinedHashSet<std::basic_string_view<char> >&) graph_utils::IsSupportedOptypeVersionAndDomain(*dequantize_node_ptr, "DequantizeLinear", {10, 13}) && graph_utils::IsSupportedProvider(*dequantize_node_ptr, compatible_execution_providers) was false. Expected that every QuantizeLinear node be followed by a unique DequantizeLinear node. Actual: QuantizeLinear (data_QuantizeLinear) is followed by QLinearConv(fused resnetv17_conv0_fwd_quant).

I would like to know how to solve this. Is there any way to retrain or do transfer learning with ORT?

For reference, my code looks like this:

import sys

import onnx
from onnxruntime.training import artifacts

# Load the model to re-train (the model path is assumed to be the first CLI argument).
onnx_model = onnx.load(sys.argv[1])

frozen_params = []
requires_grad = []
for init in onnx_model.graph.initializer:
    if init.name.endswith("running_mean") or init.name.endswith("running_var"):
        frozen_params.append(init.name)
    elif init.name not in frozen_params:
        requires_grad.append(init.name)

print(len(requires_grad), len(frozen_params))
print(frozen_params)

# Generate the training artifacts.
artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.CrossEntropyLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory=sys.argv[2],
)

eval_model = onnx.load(f"{sys.argv[2]}/eval_model.onnx")
eval_model.graph.output.append(onnx_model.graph.output[0])
onnx.save(eval_model, f"{sys.argv[2]}/eval_model2.onnx")

To reproduce

I am running an onnxruntime build from source for CUDA 11.2, GCC 9.5, CMake 3.27, and Python 3.8 on Ubuntu 20.04.

Urgency

As soon as possible

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-training 1.17.0+cu112

PyTorch Version

None

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.2

@baijumeswani
Contributor

Training will not work with a quantized model. How do you expect training to work with an INT8 model? Backpropagation can only happen with floats.

The error you're hitting is a result of onnxruntime trying to convert your graph to a QAT graph. QAT with onnxruntime is still in an experimental phase and we do not have complete support for it.
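For reference, you can confirm that your model is in the QOperator format (fused nodes such as QLinearConv) rather than QDQ pairs, which is exactly the pattern the QDQ fusion pass rejects. A minimal sketch, assuming only the onnx package and the model path as the first script argument (the counting logic is illustrative, not part of the artifacts API):

import sys
from collections import Counter

import onnx

# Count operator types to see whether the graph uses QDQ pairs
# (QuantizeLinear/DequantizeLinear) or fused QOperator nodes (QLinearConv, ...).
model = onnx.load(sys.argv[1])
op_counts = Counter(node.op_type for node in model.graph.node)

for op in ("QuantizeLinear", "DequantizeLinear", "QLinearConv", "QLinearMatMul"):
    print(f"{op}: {op_counts.get(op, 0)}")

# The QDQ fusion in orttraining expects every QuantizeLinear to feed a unique
# DequantizeLinear; a graph full of QLinearConv nodes fails that check.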

@IzanCatalan
Author

IzanCatalan commented Jan 11, 2024

Thanks for the reply @baijumeswani. Yes, you are totally right, backpropagation cannot be done. I just hoped, and asked, whether there is a way of re-training a model with ORT using, as you said, QAT or post-training quantization. Will the support you mentioned be available soon, or is it a long-term plan?

Anyway, if I must re-train some models to INT8, which as you said is currently impossible with ORT, do you have any thoughts on how I could do it (using QAT, for instance), even with a different framework or AI engine? Any help to clarify things would be highly appreciated.

Thank you.

@baijumeswani
Contributor

Yes, we will add some support for training a (fake) quantized model in some sense in the near to mid term. Maybe you can benefit from that. This is expected to be out in ONNX Runtime 1.18. I will keep you posted on that.

I am not aware of any framework that offers training of quantized models on the device. Sorry about that.
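If it helps to picture what "fake" quantization means: outside of ORT, frameworks such as PyTorch do offer offline QAT, where the weights stay in float during fine-tuning and fake-quant nodes only simulate INT8 rounding, which is what keeps backpropagation possible. A rough sketch using PyTorch's FX graph-mode QAT API (the model, backend, and input shape here are assumptions, not something verified in this thread):

import torch
import torchvision
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_fx

# Assumed starting point: a float ResNet50 from torchvision.
model = torchvision.models.resnet50(weights="DEFAULT").train()

example_inputs = (torch.randn(1, 3, 224, 224),)
qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")

# Insert observers/fake-quant nodes; weights remain float, so gradients flow.
prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)

# ... fine-tune `prepared` with a normal training loop here ...

# Only after fine-tuning is the model converted to a real INT8 model.
quantized = convert_fx(prepared.eval())

Whether the converted model can then be exported to ONNX and consumed by ORT depends on the opset and operator support in your versions, so treat this only as a sketch of the QAT workflow rather than a tested pipeline.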
