Neural Speed supports the following models:

Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT8 AutoRound | INT4 RTN | INT4 GPTQ | INT4 AWQ | INT4 AutoRound | Transformer Version | Max tokens length |
---|---|---|---|---|---|---|---|---|---|---|
Meta-Llama-3-8B-Instruct | ✅ | ✅ | ✅ | ✅ | Latest | 8192 | ||||
TinyLlama-1.1B, LLaMA2-7B, LLaMA2-13B, LLaMA2-70B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 4096 |
LLaMA-7B, LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 2048 |
CodeLlama-7b | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 16384 |
Solar-10.7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 4096 |
Neural-Chat-7B-v3-1, Neural-Chat-7B-v3-2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 32768 |
Mistral-7B, Mistral-7B-Instruct-v0.2, Mixtral-8x7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 4.36.0 or newer | 32768 |
Qwen-7B, Qwen-14B, Qwen1.5-7B, Qwen1.5-0.5B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 8192 / 32768 |
GPT-J-6B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 2048 |
GPT-NeoX-20B | ✅ |  |  |  | ✅ |  |  |  | Latest | 2048 |
Dolly-v2-3B | ✅ |  |  |  | ✅ |  |  |  | 4.28.1 or newer | 2048 |
MPT-7B, MPT-30B | ✅ |  |  |  | ✅ |  |  |  | Latest | 2048 |
Falcon-7B, Falcon-40B | ✅ |  |  |  | ✅ |  |  |  | Latest | 2048 |
BLOOM-7B | ✅ |  |  |  | ✅ |  |  |  | Latest | 2048 |
OPT-125m, OPT-1.3B, OPT-13B | ✅ |  |  |  | ✅ |  |  |  | Latest | 2048 |
ChatGLM-6B, ChatGLM2-6B, ChatGLM3-6B, GLM-4-9B | ✅ |  |  |  | ✅ |  |  |  | 4.33.1 | 2048 / 32768 |
Baichuan-13B-Chat, Baichuan2-13B-Chat, Baichuan2-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 4.33.1 | 4096 |
phi-2, phi-1_5, phi-1 | ✅ |  |  |  | ✅ |  |  |  | Latest | 2048 |
phi-3-128k, phi-3-48k | ✅ |  |  |  | ✅ |  |  |  | Latest | 128k |
StableLM-2-1_6B, StableLM-3B, StableLM-2-12B | ✅ |  |  |  | ✅ |  |  |  | Latest | 4096 |
gemma-2b-it, gemma-7b | ✅ |  |  |  | ✅ |  |  |  | Latest | 8192 |
Whisper-tiny, Whisper-base, Whisper-small, Whisper-medium, Whisper-large | ✅ |  |  |  | ✅ |  |  |  | Latest | 448 |

Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT8 AutoRound | INT4 RTN | INT4 GPTQ | INT4 AWQ | INT4 AutoRound | Transformer Version |
---|---|---|---|---|---|---|---|---|---|
Code-LLaMA-7B, Code-LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
Magicoder-6.7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
StarCoder-1B, StarCoder-3B, StarCoder-15.5B | ✅ |  |  |  | ✅ |  |  |  | Latest |
Stable-Code-3B | ✅ |  |  |  | ✅ |  |  |  | Latest |

Model Name | F32 | F16 | Q4_0 | Q8_0 | BTLA |
---|---|---|---|---|---|
TheBloke/Llama-2-7B-Chat-GGUF | ✅ | ✅ | ✅ | ✅ | |
TheBloke/Mistral-7B-v0.1-GGUF, TheBloke/Mistral-7B-v0.2-GGUF | ✅ | ✅ | ✅ | ✅ | |
TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF | ✅ | ✅ | ✅ | ✅ | |
TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF | ✅ | ✅ | ✅ | ✅ | |
TheBloke/CodeLlama-7B-GGUF, TheBloke/CodeLlama-13B-GGUF | ✅ | ✅ | ✅ | ✅ | |
Qwen1.5-7B-Chat-GGUF | ✅ | ✅ | ✅ | ✅ | |
Code-LLaMA-7B, Code-LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ |
meta-llama/Llama-2-7b-chat-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
upstage/SOLAR-10.7B-Instruct-v1.0 | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen-7B-Chat, Qwen1.5-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
tiiuae/falcon-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
tiiuae/falcon-40b | ✅ | ✅ | ✅ | ✅ | ✅ |
mpt-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
mpt-30b | ✅ | ✅ | ✅ | ✅ | ✅ |
bloomz-7b1 | ✅ | ✅ | ✅ | ✅ | ✅ |
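
As a rough illustration of how the first table might be consulted from code, the sketch below hand-copies a few of its fully checked rows into a Python dictionary and answers two questions: whether a given bit width and algorithm is listed for a model, and whether a request fits within the listed maximum token length. The names `SUPPORT`, `SupportEntry`, `is_listed`, and `fits_context` are hypothetical helpers and not part of Neural Speed. Treating "Max tokens length" as a combined prompt-plus-generation budget is also an assumption.

```python
from dataclasses import dataclass

# Quantization algorithms named in the table's column headers.
ALGORITHMS = ("RTN", "GPTQ", "AWQ", "AutoRound")
ALL = frozenset(ALGORITHMS)


@dataclass(frozen=True)
class SupportEntry:
    """One row of the support matrix (hypothetical helper type)."""
    int8: frozenset       # algorithms checked under INT8
    int4: frozenset       # algorithms checked under INT4
    transformers: str     # "Transformer Version" column, copied verbatim
    max_tokens: int       # "Max tokens length" column


# Hand-copied from rows of the first table in which all eight
# quantization cells are checked. This is a partial copy, not the full matrix.
SUPPORT = {
    "LLaMA-7B": SupportEntry(ALL, ALL, "Latest", 2048),
    "CodeLlama-7b": SupportEntry(ALL, ALL, "Latest", 16384),
    "Solar-10.7B": SupportEntry(ALL, ALL, "Latest", 4096),
    "Mistral-7B": SupportEntry(ALL, ALL, "4.36.0 or newer", 32768),
    "GPT-J-6B": SupportEntry(ALL, ALL, "Latest", 2048),
}


def is_listed(model: str, bits: int, algorithm: str) -> bool:
    """Return True if the copied rows list this model/bits/algorithm combination."""
    if bits not in (4, 8):
        raise ValueError("the table only covers INT4 and INT8")
    entry = SUPPORT.get(model)
    if entry is None:
        return False  # model not in this partial copy
    methods = entry.int4 if bits == 4 else entry.int8
    return algorithm in methods


def fits_context(model: str, prompt_tokens: int, new_tokens: int) -> bool:
    """Assumption: the listed length bounds prompt plus generated tokens."""
    return prompt_tokens + new_tokens <= SUPPORT[model].max_tokens


if __name__ == "__main__":
    print(is_listed("GPT-J-6B", 4, "GPTQ"))
    print(fits_context("LLaMA-7B", 2000, 100))
```

A dictionary keyed by model name keeps the lookup simple. Rows that list several models in one cell would need one key per model if copied in the same way.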