I am trying to quantize the Llama-2-7b-hf model using the example here. I was able to successfully generate the int4 model with GPTQ quantization by running the command below.

Settings

I have used the inference code from here, with the changes listed below:

```python
use_fp16 = False              # True when KV cache inputs/outputs are in float16
use_buffer_share = False      # True when --use_gqa was passed during export
device = torch.device("cpu")  # running on CPU
```

However, when I try to run on CPU, I get garbage results for any prompt:

- Prompt: ONNX Runtime is
- Response: ONNX Runtime is prisoner categorieпута Clientública одногоúblicaública одногоúblicaúblicaúblicapplyúblicaúblicaúblicaúblicaúblicaúblicaúblicażeública geometricúblicażeúblicaúblicaúblicaúblicaúblicaúblicaúblicaúblicaúblicaுúblicaúblicaúblicaże zou[ întRunública Stim cruelF
- Prompt: I want to book a vacation to Hawaii. First, I need to
- Response: I want to book a vacation to Hawaii. First, I need to Statusifier liesStatusifierDOCTYPEissenschaft schedulecmpyed optyed optultan")yed opt diferenелісляcompos into")ultan intoultan optultan \( into oderifierultan rappresentultanел diferenyedyedམła intoyed into")cloudflareел
- Prompt: A good workout routine is
- Response: A good workout routine is 今设 gewesen gewesenісляwardwardwardward musical pueblo gewesen gewesen gewesen gewesenove gewesenoveісля instant zouwardxisісляwardісля instantoveRemoteісля gewesen только estaven толькоxis instantіслярия Wahl только zou서іслярияottiottiaba
- Prompt: How are astronauts launched into space?
- Response: How are astronauts launched into space? emarkemarkemark기 Wahl------+ел기ел기기yed finsелeringелłyyed finsyedелел기othy기 fatyed기temperaturen기기temperaturen thouісляtemperaturen기othy기yed Agutemperaturenелелел thouелinental

Similar output is observed with the RTN asymmetric INT4 model as well.

I'm using the ORT 1.17.0 (latest) release on Windows 11 with Python 3.9. Package details below:
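To make the three changed flags concrete, here is a minimal pure-Python sketch of how an inference script like this typically wires them up. The `build_inference_config` helper and its dictionary keys are hypothetical, not part of the linked example; the execution-provider strings are the standard ONNX Runtime names.

```python
def build_inference_config(use_fp16: bool, use_buffer_share: bool, on_cpu: bool) -> dict:
    """Hypothetical helper: derive session/KV-cache settings from the three flags."""
    return {
        # KV-cache inputs/outputs must use the dtype the model was exported with;
        # a float16/float32 mismatch here is a common source of garbage output.
        "kv_cache_dtype": "float16" if use_fp16 else "float32",
        # Buffer sharing is only valid when the model was exported with --use_gqa.
        "share_kv_buffers": use_buffer_share,
        # Standard ONNX Runtime execution-provider names.
        "providers": ["CPUExecutionProvider"]
        if on_cpu
        else ["CUDAExecutionProvider", "CPUExecutionProvider"],
    }

# Settings matching the CPU run described above.
cfg = build_inference_config(use_fp16=False, use_buffer_share=False, on_cpu=True)
```

The key point is that all three values must agree with how the ONNX model was exported; they are not free choices at inference time.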
Can you please investigate?