Adding cuda kernel (optimized for sm80) for block-wise 4b quantized float 16 GEMM. #18619
Azure Pipelines / Big Models (Whisper_ONNX Whisper_ONNX)
succeeded
Feb 29, 2024 in 15m 3s