I followed the steps for Llama smooth quantization in this repo. I can successfully convert the meta-llama/Llama-2-7b-chat-hf model to ONNX, but I get the following error during quantization: it reports a missing file, '/tmp/ort.quant.o8y23ivu/9954a718-bf52-11ee-911b-49517355336e'. Any idea what's going on?
(.venv) igaspard@DESKTOP-0OQ6DQP:~/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant$ time bash run_quant.sh --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx --model_output=/home/igaspard/quan_model --batch_size=1 --dataset NeelNanda/pile-10k --alpha 0.75 --quant_format="QOperator"
+ main --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx --model_output=/home/igaspard/quan_model --batch_size=1 --dataset NeelNanda/pile-10k --alpha 0.75 --quant_format=QOperator
+ init_params --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx --model_output=/home/igaspard/quan_model --batch_size=1 --dataset NeelNanda/pile-10k --alpha 0.75 --quant_format=QOperator
+ for var in "$@"
+ case $var in
++ echo --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx
++ cut -f2 -d=
+ model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx
+ for var in "$@"
+ case $var in
++ echo --model_output=/home/igaspard/quan_model
++ cut -f2 -d=
+ model_output=/home/igaspard/quan_model
+ for var in "$@"
+ case $var in
++ echo --batch_size=1
++ cut -f2 -d=
+ batch_size=1
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
++ echo --quant_format=QOperator
++ cut -f2 -d=
+ quant_format=QOperator
+ run_tuning
+ [[ /home/igaspard/Llama-2-7b-chat-hf-onnx =~ \.onnx$ ]]
+ [[ /home/igaspard/quan_model =~ \.onnx$ ]]
+ '[' '!' -d /home/igaspard/quan_model ']'
+ python main.py --quant_format QOperator --model_input /home/igaspard/Llama-2-7b-chat-hf-onnx --model_output /home/igaspard/quan_model --batch_size 1 --smooth_quant_alpha 0.5 --dataset NeelNanda/pile-10k --quantize
01/30/2024 17:30:59 - WARNING - root - Please consider to run pre-processing before quantization. Refer to example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-01-30 17:33:17 [INFO] Start smooth model calibration.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-01-30 17:39:57 [INFO] Start smooth scales collection.
Progress: [####################] 100.00%
Traceback (most recent call last):
File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/main.py", line 240, in <module>
quantize_static(model_file,
File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnxruntime/quantization/quantize.py", line 424, in quantize_static
model = load_model_with_shape_infer(Path(model_input)) # use smooth quant model for calibration
File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnxruntime/quantization/quant_utils.py", line 630, in load_model_with_shape_infer
model = onnx.load(inferred_model_path.as_posix())
File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnx/__init__.py", line 214, in load_model
load_external_data_for_model(model, base_dir)
File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnx/external_data_helper.py", line 65, in load_external_data_for_model
load_external_data_for_tensor(tensor, base_dir)
File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnx/external_data_helper.py", line 45, in load_external_data_for_tensor
with open(external_data_file_path, "rb") as data_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ort.quant.o8y23ivu/9954a718-bf52-11ee-911b-49517355336e'
Attaching some of my environment info, like the installed Python packages.