When can we support w8a8 fp8 quantization and sparse2:4 llm compress and adapt it on vllm? #148

leoyuppieqnew · 2024-09-09T04:02:53Z

No description provided.

robertgshaw2-neuralmagic · 2024-09-09T13:57:44Z

This is something we are actively working on supporting end-to-end.

In vLLM, we currently support 2:4 sparsity with W4A16 and W8A16. Supporting W8A8 FP8 with 2:4 sparsity requires new inference kernels, and we are collaborating with the CUTLASS team on these.
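
For readers new to the term: 2:4 sparsity means that every contiguous group of four weights holds at most two nonzero values. The snippet below is a minimal pure-Python sketch of magnitude-based 2:4 pruning, for illustration only. The function names `prune_2_of_4` and `is_2_of_4_sparse` are hypothetical, and this is not the llm-compressor or vLLM implementation; real kernels work on packed tensors on the GPU.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Hypothetical helper for illustration; operates on a flat list of floats.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2_of_4_sparse(weights):
    """Return True if every group of four contains at least two zeros."""
    return all(
        sum(1 for v in weights[s:s + 4] if v == 0) >= 2
        for s in range(0, len(weights), 4)
    )


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_2_of_4(row)
print(sparse_row)
print(is_2_of_4_sparse(sparse_row))
```

The fixed two-of-four pattern is what lets hardware store only the kept values plus a small index per group, which is why dedicated kernels are needed for each weight and activation format.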

leoyuppieqnew added the enhancement (New feature or request) label Sep 9, 2024

vllm-project deleted a comment from chaldipok3 Sep 22, 2024

vllm-project deleted a comment from bebeynen Sep 22, 2024
