
Quantized SeaLLM v2 Model Outputs Same as Input #21636

Open

sabre-code opened this issue Aug 6, 2024 · 1 comment

Labels: model:transformer · performance · quantization

Comments

@sabre-code

Describe the issue

We encountered an issue while using SeaLLM v2, a 7B model, in ONNX format with int8 quantization for translation purposes. Here are the steps we followed and the problem we're facing:

  1. Model Conversion to ONNX:

    • We used the Optimum CLI to convert SeaLLM v2 into ONNX format.
    • The conversion produced a full-precision (fp32) ONNX model.
  2. Model Quantization (a sketch of steps 1 and 2 follows this list):

    • We applied the quantize_dynamic() function to convert the fp32 model to int8.
    • The quantization process completed without errors.
  3. Issue:

    • When we use the quantized model for translation, the output is identical to the input.
    • The issue is not isolated to SeaLLM v2; we have seen the same behavior with other quantized models, such as TinyLlama.
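
A minimal sketch of the conversion and quantization steps (the model ID, file names, and output directory below are illustrative placeholders, not necessarily the exact ones we used):

```python
# Step 1 (shell): export the fp32 ONNX model with the Optimum CLI, e.g.
#   optimum-cli export onnx --model SeaLLMs/SeaLLM-7B-v2 seallm_onnx/
from onnxruntime.quantization import quantize_dynamic, QuantType

# Step 2: dynamic int8 quantization of the exported fp32 graph.
# Weights are quantized offline; activations are quantized at runtime.
quantize_dynamic(
    model_input="seallm_onnx/model.onnx",
    model_output="seallm_onnx/model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```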

To reproduce

Steps to Reproduce:

  1. Convert SeaLLM v2 to ONNX using the Optimum CLI.
  2. Quantize the ONNX model from fp32 to int8 using quantize_dynamic().
  3. Use the quantized model for a translation task (a minimal check is sketched below).
  4. Observe that the output is the same as the input.
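
A minimal sketch of steps 3 and 4, loading the quantized graph through Optimum's ORTModelForCausalLM (the directory, file name, and prompt are placeholders):

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Load the tokenizer from the export directory and point the ORT session
# at the int8 graph produced by quantize_dynamic().
tokenizer = AutoTokenizer.from_pretrained("seallm_onnx")
model = ORTModelForCausalLM.from_pretrained("seallm_onnx", file_name="model_int8.onnx")

prompt = "Translate to English: Xin chào"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

# Step 4: with the int8 model, the decoded text contains nothing beyond the prompt.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```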

Urgency

No response

Platform

Linux

OS Version

Ubuntu 20.04.6

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

@sabre-code added the performance label on Aug 6, 2024
@github-actions bot added the quantization label on Aug 6, 2024
@sophies927 added the model:transformer label on Aug 8, 2024
@yufenglee (Member)

@sabre-code, could you please try running this model with onnxruntime-genai?
Here is an example showing how to create and run a similar model:
https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/README.md#get-the-model
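
A rough sketch of that flow, based on the README above (the model ID, output path, and precision flag are placeholders, and the generation loop follows the 0.x Python examples, so the exact calls may differ in newer releases):

```python
# Build an ORT GenAI model directly from the Hugging Face checkpoint (shell), e.g.:
#   python -m onnxruntime_genai.models.builder \
#       -m SeaLLMs/SeaLLM-7B-v2 -o seallm_genai -p int4 -e cpu
import onnxruntime_genai as og

model = og.Model("seallm_genai")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode("Translate to English: Xin chào")

# Greedy decode, printing tokens as they are generated.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```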
