
Fail to quantize the Llama-2-7b-chat-hf model #375

Closed
igaspard opened this issue Jan 30, 2024 · 3 comments

@igaspard

I followed the steps for Llama smooth quant in this repo to quantize the meta-llama/Llama-2-7b-chat-hf model. I can successfully convert it to an ONNX model, but quantization fails with a missing file '/tmp/ort.quant.o8y23ivu/9954a718-bf52-11ee-911b-49517355336e'. Any idea what's going on?

(.venv) igaspard@DESKTOP-0OQ6DQP:~/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant$ time bash run_quant.sh --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx --model_output=/home/igaspard/quan_model --batch_size=1 --dataset NeelNanda/pile-10k --alpha 0.75 --quant_format="QOperator"
+ main --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx --model_output=/home/igaspard/quan_model --batch_size=1 --dataset NeelNanda/pile-10k --alpha 0.75 --quant_format=QOperator
+ init_params --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx --model_output=/home/igaspard/quan_model --batch_size=1 --dataset NeelNanda/pile-10k --alpha 0.75 --quant_format=QOperator
+ for var in "$@"
+ case $var in
++ echo --model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx
++ cut -f2 -d=
+ model_input=/home/igaspard/Llama-2-7b-chat-hf-onnx
+ for var in "$@"
+ case $var in
++ echo --model_output=/home/igaspard/quan_model
++ cut -f2 -d=
+ model_output=/home/igaspard/quan_model
+ for var in "$@"
+ case $var in
++ echo --batch_size=1
++ cut -f2 -d=
+ batch_size=1
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
+ for var in "$@"
+ case $var in
++ echo --quant_format=QOperator
++ cut -f2 -d=
+ quant_format=QOperator
+ run_tuning
+ [[ /home/igaspard/Llama-2-7b-chat-hf-onnx =~ \.onnx$ ]]
+ [[ /home/igaspard/quan_model =~ \.onnx$ ]]
+ '[' '!' -d /home/igaspard/quan_model ']'
+ python main.py --quant_format QOperator --model_input /home/igaspard/Llama-2-7b-chat-hf-onnx --model_output /home/igaspard/quan_model --batch_size 1 --smooth_quant_alpha 0.5 --dataset NeelNanda/pile-10k --quantize
01/30/2024 17:30:59 - WARNING - root -   Please consider to run pre-processing before quantization. Refer to example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-01-30 17:33:17 [INFO] Start smooth model calibration.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-01-30 17:39:57 [INFO] Start smooth scales collection.
Progress: [####################] 100.00%
Traceback (most recent call last):
  File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/main.py", line 240, in <module>
    quantize_static(model_file,
  File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnxruntime/quantization/quantize.py", line 424, in quantize_static
    model = load_model_with_shape_infer(Path(model_input))  # use smooth quant model for calibration
  File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnxruntime/quantization/quant_utils.py", line 630, in load_model_with_shape_infer
    model = onnx.load(inferred_model_path.as_posix())
  File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnx/__init__.py", line 214, in load_model
    load_external_data_for_model(model, base_dir)
  File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnx/external_data_helper.py", line 65, in load_external_data_for_model
    load_external_data_for_tensor(tensor, base_dir)
  File "/home/igaspard/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant/.venv/lib/python3.9/site-packages/onnx/external_data_helper.py", line 45, in load_external_data_for_tensor
    with open(external_data_file_path, "rb") as data_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ort.quant.o8y23ivu/9954a718-bf52-11ee-911b-49517355336e'

Attaching some of my environment, e.g. the Python packages:

(.venv) igaspard@DESKTOP-0OQ6DQP:~/onnxruntime-inference-examples/quantization/language_model/llama/smooth_quant$ pip list
Package                          Version
-------------------------------- ----------
absl-py                          2.1.0
accelerate                       0.26.1
aiohttp                          3.9.2
aiosignal                        1.3.1
annotated-types                  0.6.0
antlr4-python3-runtime           4.9.3
anyio                            4.2.0
async-timeout                    4.0.3
attrs                            23.2.0
certifi                          2023.11.17
chardet                          5.2.0
charset-normalizer               3.3.2
click                            8.1.7
cmake                            3.28.1
colorama                         0.4.6
coloredlogs                      15.0.1
contextlib2                      21.6.0
contourpy                        1.2.0
cycler                           0.12.1
DataProperty                     1.0.1
datasets                         2.14.7
Deprecated                       1.2.14
dill                             0.3.7
distro                           1.9.0
einops                           0.7.0
evaluate                         0.4.1
exceptiongroup                   1.2.0
filelock                         3.13.1
flatbuffers                      23.5.26
fonttools                        4.47.2
frozenlist                       1.4.1
fsspec                           2023.10.0
h11                              0.14.0
httpcore                         1.0.2
httpx                            0.26.0
huggingface-hub                  0.17.3
humanfriendly                    10.0
idna                             3.6
importlib-resources              6.1.1
intel-extension-for-transformers 1.3.1
Jinja2                           3.1.3
joblib                           1.3.2
jsonlines                        4.0.0
kiwisolver                       1.4.5
lm-eval                          0.3.0
MarkupSafe                       2.1.4
matplotlib                       3.8.2
mbstrdecoder                     1.1.3
mpmath                           1.3.0
multidict                        6.0.4
multiprocess                     0.70.15
networkx                         3.2.1
neural-compressor                2.4.1
nltk                             3.8.1
numexpr                          2.9.0
numpy                            1.26.3
nvidia-cublas-cu12               12.1.3.1
nvidia-cuda-cupti-cu12           12.1.105
nvidia-cuda-nvrtc-cu12           12.1.105
nvidia-cuda-runtime-cu12         12.1.105
nvidia-cudnn-cu12                8.9.2.26
nvidia-cufft-cu12                11.0.2.54
nvidia-curand-cu12               10.3.2.106
nvidia-cusolver-cu12             11.4.5.107
nvidia-cusparse-cu12             12.1.0.106
nvidia-nccl-cu12                 2.18.1
nvidia-nvjitlink-cu12            12.3.101
nvidia-nvtx-cu12                 12.1.105
omegaconf                        2.3.0
onnx                             1.15.0
onnxruntime                      1.16.3
onnxruntime-extensions           0.9.0
openai                           1.10.0
opencv-python-headless           4.9.0.80
optimum                          1.16.2
packaging                        23.2
pandas                           2.2.0
pathvalidate                     3.2.0
peft                             0.7.1
pillow                           10.2.0
pip                              23.3.2
portalocker                      2.8.2
prettytable                      3.9.0
protobuf                         4.25.2
psutil                           5.9.8
py-cpuinfo                       9.0.0
pyarrow                          15.0.0
pyarrow-hotfix                   0.6
pybind11                         2.11.1
pycocotools                      2.0.7
pycountry                        23.12.11
pydantic                         2.5.3
pydantic_core                    2.14.6
pyparsing                        3.1.1
pytablewriter                    1.2.0
python-dateutil                  2.8.2
pytz                             2023.4
PyYAML                           6.0.1
regex                            2023.12.25
requests                         2.31.0
responses                        0.18.0
rouge-score                      0.1.2
sacrebleu                        1.5.0
safetensors                      0.4.2
schema                           0.7.5
scikit-learn                     1.4.0
scipy                            1.12.0
sentencepiece                    0.1.99
setuptools                       58.1.0
six                              1.16.0
sniffio                          1.3.0
sqlitedict                       2.1.0
sympy                            1.12
tabledata                        1.3.3
tcolorpy                         0.1.4
threadpoolctl                    3.2.0
tokenizers                       0.14.1
torch                            2.1.2
tqdm                             4.66.1
tqdm-multiprocess                0.0.11
transformers                     4.34.1
triton                           2.1.0
typepy                           1.3.2
typing_extensions                4.9.0
tzdata                           2023.4
urllib3                          2.1.0
wcwidth                          0.2.13
wrapt                            1.16.0
xxhash                           3.4.1
yarl                             1.9.4
zipp                             3.17.0
zstandard                        0.22.0
@wangyems

@kunal-vaishnavi Could you help take a look?

@kunal-vaishnavi
Contributor

Can you upgrade to the latest nightly ORT build and try again? Another bug was fixed in this PR.
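For reference, a typical way to move to a nightly ORT build at the time (the `ort-nightly` package name and ORT-Nightly feed URL are as documented in early 2024; verify against the current onnxruntime install instructions before using):

```shell
# Remove the stable build first so the nightly wheel takes effect.
pip uninstall -y onnxruntime
# Install the latest nightly build from the ORT-Nightly feed.
pip install -U --pre ort-nightly \
  --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
```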

@igaspard
Author

It works, thanks.
