Adding CUDA kernel (optimized for sm80) for block-wise 4b quantized float16 GEMM. #18619
Azure Pipelines / orttraining-mac-ci-pipeline (MacOS_C_API_Package_Publish MacOS_C_API_Package_Publish)
succeeded
Feb 29, 2024 in 13s