Quantized SeaLLM v2 Model Outputs Same as Input #21636
Labels
model:transformer - issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.
performance - issues related to performance regressions
quantization - issues related to quantization
Describe the issue
We encountered an issue while using SeaLLM v2, a 7B model, in ONNX format with int8 quantization for translation purposes. Here are the steps we followed and the problem we're facing:
Model Conversion to ONNX:
We converted the SeaLLM v2 model to ONNX format in fp32.
Model Quantization:
We used the quantize_dynamic() function to convert the fp32 model to int8 (hedged sketches of both steps follow this section).
Issue:
After quantization, the int8 model's output is identical to the input text instead of a translation.
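The report does not record the exact commands used, so the following are minimal sketches under stated assumptions, not the reporter's actual code. For the export step, one common route for a 7B causal LM is Hugging Face Optimum; the model id SeaLLMs/SeaLLM-7B-v2 and all file paths below are placeholders.

```python
# Export sketch (assumption: Hugging Face Optimum; the report does not
# name the exporter that produced the fp32 ONNX model).
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "SeaLLMs/SeaLLM-7B-v2",  # hypothetical model id, not confirmed by the report
    export=True,             # export the PyTorch checkpoint to ONNX (fp32)
)
model.save_pretrained("seallm_v2_onnx_fp32")  # placeholder output directory
```

The quantization call is the one API the report does name; a minimal quantize_dynamic() invocation for a model this size looks like this:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="seallm_v2_onnx_fp32/model.onnx",   # placeholder fp32 path
    model_output="seallm_v2_onnx_int8/model.onnx",  # placeholder int8 path
    weight_type=QuantType.QInt8,                    # dynamic int8 weight quantization
    use_external_data_format=True,                  # 7B fp32 weights exceed the 2 GB protobuf limit
)
```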
To reproduce
Steps to Reproduce:
1. Convert the SeaLLM v2 (7B) model to ONNX format (fp32).
2. Quantize the fp32 model to int8 with quantize_dynamic().
3. Run the int8 model on a translation prompt: the generated output is identical to the input instead of a translation (a repro sketch follows these steps).
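A repro sketch under the same assumptions (placeholder model directory and prompt; the final comment restates the symptom from the title, not an output we generated):

```python
# Load the int8 ONNX model and ask for a translation.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM-7B-v2")  # hypothetical model id
model = ORTModelForCausalLM.from_pretrained("seallm_v2_onnx_int8")  # placeholder directory

prompt = "Translate to English: Xin chào, bạn khỏe không?"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Reported symptom: the printed text repeats the input prompt
# instead of translating it.
```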
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04.6
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes