Olive workflow for mistral model optimization does not work #1075
What's the full log? It seems the cache folder contains the converted model.
That is the full log. I figured out there was some issue in optimizing the converted model, so I made the following changes to the
When I ran the script again, it seemed to produce a quantized model, but that is only 1.45 GB on disk. I tried running the model using the
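A point worth checking here: large ONNX exports keep weights in external data files, so the meaningful on-disk size is the whole output folder, not just `model.onnx`. A minimal sketch to total a folder's size (the directory path is a placeholder, not one from this workflow):

```python
from pathlib import Path

def dir_size_gb(path: str) -> float:
    """Sum the sizes of all regular files under `path`, in GiB."""
    p = Path(path)
    return sum(f.stat().st_size for f in p.rglob("*") if f.is_file()) / 1024**3

# Example usage with a placeholder model folder:
# print(f"{dir_size_gb('models'):.2f} GiB")
```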
If the above is the full log, I am guessing you hit out-of-memory while optimizing the converted ONNX model; on OOM, the OS will kill the Python process. Could you try a machine with more memory and retry?
Yes, I am on it. The quantization took around 3.5 hours on my Intel i9-13980HX CPU, so it is time-consuming to test.
Would you try changing the accelerators as in https://github.com/microsoft/Olive/blob/main/examples/mistral/mistral_fp16_optimize.json#L15-L21?
For more info, please refer to https://microsoft.github.io/Olive/tutorials/configure_systems.html.
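For illustration, retargeting the accelerator in the config's systems section is a small edit. The structure below mirrors the linked example, but the exact schema depends on your Olive version, so treat the key names as assumptions rather than a definitive layout:

```python
import json

# Hypothetical in-memory fragment of an Olive config's "systems" section;
# key names follow the linked mistral example but may differ between versions.
config = {
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "config": {
                "accelerators": [
                    {"device": "cpu", "execution_providers": ["CPUExecutionProvider"]}
                ]
            },
        }
    }
}

# Retarget the workflow at the DirectML execution provider on GPU.
accelerator = config["systems"]["local_system"]["config"]["accelerators"][0]
accelerator["device"] = "gpu"
accelerator["execution_providers"] = ["DmlExecutionProvider"]

print(json.dumps(config["systems"]["local_system"]["config"], indent=2))
```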
Thank you @guotuofeng. I can confirm that the example works now. If anyone faces similar issues, make sure that you have sufficient disk space (around 100 GB or more). Disk space seemed to be the bottleneck for me, not RAM. I tested it on two computers with 64 GB RAM and it worked well. Here are some details for the
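As a quick pre-flight check, free space on the working drive can be verified with the standard library. The 100 GB threshold below reflects the experience reported above, not an official requirement, and the path is illustrative:

```python
import shutil

# Check free space on the drive holding the Olive cache (path is illustrative).
total, used, free = shutil.disk_usage(".")
free_gb = free / 1024**3
print(f"Free disk space: {free_gb:.1f} GB")
if free_gb < 100:
    print("Warning: less than 100 GB free; conversion intermediates may not fit.")
```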
I faced some other issues, such as the resulting quantized model's responses being very poor, and the
Using mistral.py, we can carry out inference. The onnxruntime-genai package seemed to be an option, but it does not yet have support for
Could you try https://microsoft.github.io/Olive/api/passes.html#cmdoption-arg-115 with the backend onnxrt_dml_ep? I am not sure whether the int4 quantization works against DML or not.
I suppose you meant onnxrt_dml_ep and not onnxrt_dnnl_ep. Anyway, I tried both.
TRIAL 1: I updated the
Here is the full log:
TRIAL 2: I changed
Here is the full log:
Yes, I mean the DML EP. As for the error, we might need to ask the DML EP team. @PatriceVignola, do you have any insight into this error?
@guotuofeng The following code snippet works like a charm with the INT4 model created using the scripts in examples/mistral
I simply want to use
Do you know if I can fix this error? Or is it not possible to use
I am not sure; I haven't tried DML before, since we don't have a DML GPU.
@guotuofeng Thank you for the responses. I am now trying out LLM Optimization with DirectML, which was updated yesterday.
Actually, some ops are still pending merge in that example.
I would recommend updating the solutions in the main codebase. However, please let me know if anything else is a better option to run. Thanks!
DirectML is now supported in the Generate API for ONNX Runtime: https://onnxruntime.ai/docs/genai/howto/install.html#directml
You can create models for DML using the model builder option in Olive's Automatic Optimizer:
!olive auto-opt \
  --model_name_or_path MODEL \
  --output_path models/MODEL \
  --trust_remote_code \
  --device gpu \
  --provider DmlExecutionProvider \
  --use_model_builder \
  --use_ort_genai \
  --precision int4 \
  --log_level 1
where
Describe the bug
Following the instructions in examples/mistral does not result in a quantized ONNX model. After running the workflow, my output_model folder within the cache directory contains an ONNX model that is 27 GB on disk, and the models folder does not contain a quantized model.

To Reproduce
Follow the instructions in examples/mistral to run the optimization on CPU using:
python mistral.py --optimize --config mistral_int4_optimize.json

Expected behavior
Expected to obtain an output model that is around 3.5 GB in the models directory.

Olive config
Available here
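As an aside, the 27 GB converted model and the ~3.5 GB expected INT4 output are both consistent with back-of-the-envelope arithmetic for a roughly 7.2-billion-parameter model (the parameter count here is an approximation, and the int4 estimate ignores quantization scales/zero-points):

```python
# Rough on-disk size estimates for a ~7.2e9-parameter model (Mistral-7B-v0.1).
params = 7.2e9
fp32_gb = params * 4 / 1024**3    # 4 bytes per fp32 weight
int4_gb = params * 0.5 / 1024**3  # 4 bits per int4 weight, scales ignored
print(f"fp32 ~{fp32_gb:.1f} GB, int4 ~{int4_gb:.1f} GB")
```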
Olive logs
C:\Olive\examples\mistral>python mistral.py --optimize --config mistral_int4_optimize.json
Optimizing mistralai/Mistral-7B-v0.1
[2024-04-11 15:14:42,927] [INFO] [run.py:243:run] Loading Olive module configuration from: C:\Olive\olive\olive_config.json
[2024-04-11 15:14:42,933] [INFO] [accelerator.py:324:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-04-11 15:14:42,934] [INFO] [run.py:196:run_engine] Importing pass module OptimumConversion
[2024-04-11 15:14:42,934] [INFO] [run.py:196:run_engine] Importing pass module OrtTransformersOptimization
[2024-04-11 15:14:42,935] [INFO] [run.py:196:run_engine] Importing pass module IncStaticQuantization
[2024-04-11 15:14:42,936] [INFO] [engine.py:106:initialize] Using cache directory: cache
[2024-04-11 15:14:42,937] [INFO] [engine.py:262:run] Running Olive on accelerator: cpu-cpu
[2024-04-11 15:14:43,817] [INFO] [engine.py:864:_run_pass] Running pass convert:OptimumConversion
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.22s/it]
Automatic task detection to text-generation-with-past (possible synonyms are: causal-lm-with-past).
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
C:\MiniConda3\envs\myonnxrt\lib\site-packages\transformers\modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
C:\MiniConda3\envs\myonnxrt\lib\site-packages\optimum\exporters\onnx\model_patcher.py:301: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
C:\MiniConda3\envs\myonnxrt\lib\site-packages\transformers\models\mistral\modeling_mistral.py:120: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
C:\MiniConda3\envs\myonnxrt\lib\site-packages\transformers\models\mistral\modeling_mistral.py:676: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
(the above warning is repeated 32 times in the log)
Saving external data to one file...
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating ONNX model cache/models/0_OptimumConversion-d3eae021dc4ad3d4cdbc16eba52ef561-ad904e90276e2793a36f3373323e91e1/output_model/model.onnx...
-[✓] ONNX model output names match reference model (present.31.key, present.18.key, present.13.value, present.0.value, present.7.key, present.20.value, present.15.key, present.3.key, present.18.value, present.29.value, present.14.value, present.4.value, present.9.value, present.26.key, present.24.value, present.27.key, present.23.value, present.10.value, present.6.value, present.28.key, present.4.key, present.8.key, present.17.key, present.1.key, present.27.value, present.16.value, present.11.key, present.15.value, present.23.key, present.21.key, present.5.key, present.7.value, present.21.value, present.26.value, present.30.key, present.0.key, present.2.value, present.11.value, present.9.key, present.16.key, present.17.value, present.19.value, present.10.key, present.20.key, present.25.value, present.31.value, present.29.key, present.2.key, present.25.key, present.28.value, present.8.value, present.24.key, present.30.value, present.12.value, present.13.key, present.22.key, present.22.value, present.12.key, present.19.key, present.14.key, present.1.value, present.6.key, logits, present.3.value, present.5.value)
- Validating ONNX Model output "logits":
-[✓] (2, 16, 32000) matches (2, 16, 32000)
-[x] values not close enough, max diff: 3.62396240234375e-05 (atol: 1e-05)
- Validating ONNX Model output "present.0.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.0.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.1.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.1.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.2.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.2.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.3.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.3.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.4.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.4.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.5.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.5.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.6.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.6.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.7.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.7.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.8.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.8.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.9.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.9.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.10.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.10.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.11.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.11.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.12.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.12.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.13.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.5854835510253906e-05 (atol: 1e-05)
- Validating ONNX Model output "present.13.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.14.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.14.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.15.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.7523765563964844e-05 (atol: 1e-05)
- Validating ONNX Model output "present.15.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.16.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 2.0742416381835938e-05 (atol: 1e-05)
- Validating ONNX Model output "present.16.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.17.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 2.6702880859375e-05 (atol: 1e-05)
- Validating ONNX Model output "present.17.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.18.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 3.0279159545898438e-05 (atol: 1e-05)
- Validating ONNX Model output "present.18.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.19.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 4.1961669921875e-05 (atol: 1e-05)
- Validating ONNX Model output "present.19.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.20.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 4.935264587402344e-05 (atol: 1e-05)
- Validating ONNX Model output "present.20.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.21.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 5.6743621826171875e-05 (atol: 1e-05)
- Validating ONNX Model output "present.21.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.22.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 5.91278076171875e-05 (atol: 1e-05)
- Validating ONNX Model output "present.22.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.23.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 5.5789947509765625e-05 (atol: 1e-05)
- Validating ONNX Model output "present.23.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.24.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 4.0531158447265625e-05 (atol: 1e-05)
- Validating ONNX Model output "present.24.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.25.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 3.4809112548828125e-05 (atol: 1e-05)
- Validating ONNX Model output "present.25.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.26.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 3.814697265625e-05 (atol: 1e-05)
- Validating ONNX Model output "present.26.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.1026859283447266e-05 (atol: 1e-05)
- Validating ONNX Model output "present.27.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 2.956390380859375e-05 (atol: 1e-05)
- Validating ONNX Model output "present.27.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.28.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 3.0040740966796875e-05 (atol: 1e-05)
- Validating ONNX Model output "present.28.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.2159347534179688e-05 (atol: 1e-05)
- Validating ONNX Model output "present.29.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.7642974853515625e-05 (atol: 1e-05)
- Validating ONNX Model output "present.29.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.9088387489318848e-05 (atol: 1e-05)
- Validating ONNX Model output "present.30.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.9550323486328125e-05 (atol: 1e-05)
- Validating ONNX Model output "present.30.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.519918441772461e-05 (atol: 1e-05)
- Validating ONNX Model output "present.31.key":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.31.value":
-[✓] (2, 8, 32, 128) matches (2, 8, 32, 128)
-[x] values not close enough, max diff: 1.52587890625e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
The exported model was saved at: cache/models/0_OptimumConversion-d3eae021dc4ad3d4cdbc16eba52ef561-ad904e90276e2793a36f3373323e91e1/output_model
[2024-04-11 15:23:26,254] [INFO] [engine.py:951:_run_pass] Pass convert:OptimumConversion finished in 522.433565 seconds
[2024-04-11 15:23:26,296] [INFO] [engine.py:864:_run_pass] Running pass optimize:OrtTransformersOptimization
Other information
0.6.0
onnxruntime-gpu 1.17.1
Additional context
It appears that the quantization is not being performed at all, so I am checking what the issue is.