Incorrect Responses for Llama-2-7b-hf with RTN/GPTQ INT4 Asymmetric Quantization #381

Closed
VishalX opened this issue Feb 7, 2024 · 2 comments

VishalX commented Feb 7, 2024

I am trying to quantize the Llama-2-7b-hf model using the example here.

I was able to generate the INT4 model with GPTQ quantization by running the command below.

python main.py --model_input .\llama2-7b-fp32\ --model_output .\Llama-2-7b-hf-gptq-asym --accuracy_level 0 --quantize --algorithm GPTQ

Settings

Namespace(model_input='.\\llama2-7b-fp32\\', model_output='.\\Llama-2-7b-hf-gptq-asym', benchmark=False, quantize=True, batch_size=1, workspace='nc_workspace', algorithm='GPTQ', pad_max=196, seqlen=2048, tasks=['winogrande', 'copa', 'piqa', 'rte', 'hellaswag', 'openbookqa', 'lambada_openai', 'lambada_standard', 'wikitext'], dataset='NeelNanda/pile-10k', block_size=32, is_symmetric=False, accuracy_level=0, sampling_size=8)
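For readers unfamiliar with the settings above, here is a minimal sketch of what blockwise asymmetric INT4 round-to-nearest (RTN) quantization does, using `block_size=32` and `is_symmetric=False` from the run. This is an illustration in plain NumPy, not the code used by Neural Compressor or ONNX Runtime; the function names `rtn_quantize_asym_int4` and `dequantize` are hypothetical, and details such as forcing each block's range to include zero are assumptions of this sketch.

```python
import numpy as np


def rtn_quantize_asym_int4(weights, block_size=32):
    """Illustrative blockwise asymmetric 4-bit RTN quantization (hypothetical helper)."""
    # Split the flat weight vector into blocks; each block gets its own scale/zero point.
    w = np.asarray(weights, dtype=np.float32).reshape(-1, block_size)
    # Assumption of this sketch: extend each block's range to include 0.0
    # so that the zero point always fits in the 4-bit range [0, 15].
    w_min = np.minimum(w.min(axis=1, keepdims=True), 0.0)
    w_max = np.maximum(w.max(axis=1, keepdims=True), 0.0)
    # 4 bits give 16 levels, i.e. 15 steps between min and max.
    scale = (w_max - w_min) / 15.0
    scale = np.where(scale == 0, 1.0, scale).astype(np.float32)  # all-zero block
    # Asymmetric: a per-block zero point shifts the grid so w_min maps near level 0.
    zero_point = np.clip(np.round(-w_min / scale), 0, 15).astype(np.uint8)
    # Round to nearest level, then clamp to the representable range.
    q = np.clip(np.round(w / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map 4-bit levels back to float32 using the per-block scale and zero point."""
    return (q.astype(np.float32) - zero_point.astype(np.float32)) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(4 * 32).astype(np.float32)
    q, scale, zp = rtn_quantize_asym_int4(w, block_size=32)
    w_hat = dequantize(q, scale, zp)
    # Per-element error is bounded by about half a quantization step of its block.
    print("max abs error:", float(np.abs(w.reshape(-1, 32) - w_hat).max()))
```

If the runtime kernel interprets the zero points differently from how the quantizer produced them, every dequantized weight is shifted, which is one plausible way an asymmetric model can produce garbage while the quantization step itself reports success.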

I used the inference code from here with the changes below:

use_fp16 = False  # True when KV cache inputs/outputs are in float16
use_buffer_share = False  # True when --use_gqa was passed during export
device = torch.device("cpu")  # running on CPU

However, when I try to run on CPU, I get garbage results for any prompt.

- Prompt: ONNX Runtime is
- Response: ONNX Runtime is  prisoner categorieпута Clientública одногоúblicaública одногоúblicaúblicaúblicapplyúblicaúblicaúblicaúblicaúblicaúblicaúblicażeública geometricúblicażeúblicaúblicaúblicaúblicaúblicaúblicaúblicaúblicaúblicaுúblicaúblicaúblicaże zou[ întRunública Stim cruelF

- Prompt: I want to book a vacation to Hawaii. First, I need to
- Response: I want to book a vacation to Hawaii. First, I need to Statusifier liesStatusifierDOCTYPEissenschaft schedulecmpyed optyed optultan")yed opt diferenелісляcompos into")ultan intoultan optultan \( into oderifierultan rappresentultanел diferenyedyedམła intoyed into")cloudflareел

- Prompt: A good workout routine is
- Response: A good workout routine is 今设 gewesen gewesenісляwardwardwardward musical pueblo gewesen gewesen gewesen gewesenove gewesenoveісля instant zouwardxisісляwardісля instantoveRemoteісля gewesen только estaven толькоxis instantіслярия Wahl только zou서іслярияottiottiaba

- Prompt: How are astronauts launched into space?
- Response: How are astronauts launched into space? emarkemarkemark기 Wahl------+ел기ел기기yed finsелeringелłyyed finsyedелел기othy기 fatyed기temperaturen기기temperaturen thouісляtemperaturen기othy기yed Agutemperaturenелелел thouелinental

Similar output is observed with the RTN asymmetric INT4 model as well.
I'm using the ORT 1.17.0 (latest) release on Windows 11 with Python 3.9. Package details below:

# Name                    Version                   Build  Channel
aiohttp                   3.9.3                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
ca-certificates           2023.12.12           haa95532_0
cerberus                  1.3.5                    pypi_0    pypi
certifi                   2024.2.2                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
cmake                     3.27.0                   pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
coloredlogs               15.0.1                   pypi_0    pypi
contextlib2               21.6.0                   pypi_0    pypi
contourpy                 1.2.0                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
datasets                  2.16.1                   pypi_0    pypi
deprecated                1.2.14                   pypi_0    pypi
dill                      0.3.7                    pypi_0    pypi
eigen                     3.4.0                h59b6b97_0
exceptiongroup            1.2.0                    pypi_0    pypi
filelock                  3.13.1                   pypi_0    pypi
flatbuffers               23.5.26                  pypi_0    pypi
fmt                       9.1.0                h6d14046_0
fonttools                 4.47.2                   pypi_0    pypi
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2023.10.0                pypi_0    pypi
glog                      0.3.1                    pypi_0    pypi
huggingface-hub           0.20.3                   pypi_0    pypi
humanfriendly             10.0                     pypi_0    pypi
idna                      3.6                      pypi_0    pypi
importlib-metadata        7.0.1                    pypi_0    pypi
importlib-resources       6.1.1                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
jinja2                    3.1.3                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib                3.8.2                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.5                    pypi_0    pypi
multiprocess              0.70.15                  pypi_0    pypi
networkx                  3.2.1                    pypi_0    pypi
neural-compressor         2.4.1                    pypi_0    pypi
numpy                     1.26.3                   pypi_0    pypi
onnx                      1.15.0                   pypi_0    pypi
onnxruntime               1.17.0                   pypi_0    pypi
opencv-python-headless    4.9.0.80                 pypi_0    pypi
openssl                   1.1.1w               h2bbff1b_0
optimum                   1.16.2                   pypi_0    pypi
packaging                 23.2                     pypi_0    pypi
pandas                    2.2.0                    pypi_0    pypi
pillow                    10.2.0                   pypi_0    pypi
pip                       23.3.1           py39haa95532_0
pluggy                    1.4.0                    pypi_0    pypi
prettytable               3.9.0                    pypi_0    pypi
protobuf                  4.25.2                   pypi_0    pypi
psutil                    5.9.8                    pypi_0    pypi
py-cpuinfo                9.0.0                    pypi_0    pypi
pyarrow                   15.0.0                   pypi_0    pypi
pyarrow-hotfix            0.6                      pypi_0    pypi
pycocotools               2.0.7                    pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
pyreadline3               3.4.1                    pypi_0    pypi
pytest                    8.0.0                    pypi_0    pypi
python                    3.9.0                h6244533_2
python-dateutil           2.8.2                    pypi_0    pypi
python-gflags             3.1.2                    pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
regex                     2023.12.25               pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
safetensors               0.4.2                    pypi_0    pypi
schema                    0.7.5                    pypi_0    pypi
scikit-learn              1.4.0                    pypi_0    pypi
scipy                     1.12.0                   pypi_0    pypi
sentencepiece             0.1.99                   pypi_0    pypi
setuptools                68.2.2           py39haa95532_0
six                       1.16.0                   pypi_0    pypi
spdlog                    1.11.0               h59b6b97_0
sqlite                    3.41.2               h2bbff1b_0
sympy                     1.12                     pypi_0    pypi
threadpoolctl             3.2.0                    pypi_0    pypi
tokenizers                0.15.1                   pypi_0    pypi
tomli                     2.0.1                    pypi_0    pypi
torch                     2.2.0                    pypi_0    pypi
torchaudio                2.2.0                    pypi_0    pypi
torchvision               0.17.0                   pypi_0    pypi
tqdm                      4.66.1                   pypi_0    pypi
transformers              4.37.2                   pypi_0    pypi
typing-extensions         4.9.0                    pypi_0    pypi
tzdata                    2023.4                   pypi_0    pypi
urllib3                   2.2.0                    pypi_0    pypi
vc                        14.2                 h21ff451_1
voe                       0.1.0                    pypi_0    pypi
vs2015_runtime            14.27.29016          h5e58377_2
wcwidth                   0.2.13                   pypi_0    pypi
wheel                     0.41.2           py39haa95532_0
wrapt                     1.16.0                   pypi_0    pypi
xxhash                    3.4.1                    pypi_0    pypi
yarl                      1.9.4                    pypi_0    pypi
zipp                      3.17.0                   pypi_0    pypi

Can you please investigate?

VishalX commented Feb 22, 2024

Issue in ONNX Runtime: microsoft/onnxruntime#19450

VishalX commented Feb 26, 2024

Fixed in ORT.

VishalX closed this as completed Feb 26, 2024