int8 slower than bf16 on A100 #2553
Labels
bug: Something isn't working
Investigating
Low Precision: Issue about lower bit quantization, including int8, int4, fp8
triaged: Issue has been triaged by maintainers
System Info
x86_64, debian 11, A100 GPU
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
int8 should be much faster than bf16
Actual behavior
int8 takes 1.7 s while bf16 takes 1.2 s, so int8 is roughly 1.4x slower.
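To make numbers like these easier to compare, here is a minimal timing sketch. It is not the script used for this report (no reproduction was given); `median_latency`, `slowdown`, and the `sync` parameter are hypothetical names introduced for illustration. On a GPU, `sync` would be a device synchronization call such as `torch.cuda.synchronize`, which is an assumption about the setup rather than something stated in the issue.

```python
import statistics
import time


def median_latency(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time of fn() in seconds.

    sync is an optional callable run before each clock read. On a GPU it
    would be a device synchronization call (assumed, not from the report)
    so that asynchronously queued kernels are included in the timing.
    """
    # Warm-up runs absorb one-time costs such as allocation or autotuning.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(candidate_s, baseline_s):
    """Ratio of candidate latency to baseline latency (>1 means slower)."""
    return candidate_s / baseline_s


if __name__ == "__main__":
    # Figures quoted in this issue: int8 = 1.7 s, bf16 = 1.2 s.
    print(f"int8 / bf16 = {slowdown(1.7, 1.2):.2f}x")
```

Using warm-up iterations and the median helps separate steady-state latency from one-time setup cost, which matters when the two precisions differ by well under a second.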
Additional notes
The A100 has int8 tensor cores, which is why a speedup over bf16 was expected.