Generic "truncate_float" class for bf16 and fp16 quantization #3591

Open

richagadgil wants to merge 59 commits into develop from generic_quant_class

+1,467 −18

Update quantization.cpp

Azure Pipelines / AMDMIGraphX (AMDMIGraphX_testing gfx942) failed Nov 8, 2024 in 20s

1 errors / 0 warnings

Annotations

Check failure on line 17 in Build log

Bash exited with code '8'.
