Enabling S8S8 and S8U8 handling in QGemm for AVX2 and AVX-VNNI #21123
+110
−21
Description
Implements sign flipping in the QGemm CopyPackA routine to enable S8S8 and S8U8 handling on AVX2 and AVX-VNNI.
Adds dispatching for the S8S8 and S8U8 variants, falling back to the C++ implementation if AVX2 is not present.
Adds unit test triggers for S8S8 and S8U8.
Motivation and Context
The QGemm kernel expects data in U8S8 form (unsigned A, signed B) so that it can use the AVX-VNNI dot-product instructions and get the corresponding performance benefits.
The existing code can already sign-flip the B matrix from unsigned to signed, which lets U8U8 data use the U8S8 VNNI instruction.
This change adds sign flipping for the A matrix as well, so that S8S8 and S8U8 models can also be translated into U8S8 form and use the VNNI instructions.
With this change, models in any of the four int8 formats (U8S8, U8U8, S8S8, S8U8) can be handled by onnxruntime and see the same performance benefits.
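
To illustrate why flipping the sign of A is safe, here is a minimal scalar sketch of the arithmetic; it is not the MLAS code. Flipping the top bit of a signed 8-bit value `a` yields the unsigned value `a + 128`, so a U8S8 product over the flipped A overshoots each output by `128 * sum_k B[k][n]`. The description above does not say how the kernel compensates for that offset; subtracting the column sums of B, as done here, is an assumption. The function names `ReferenceS8S8` and `SignFlippedS8S8` are hypothetical, and only the A-side flip for the S8S8 case is shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference result: plain signed-by-signed GEMM on row-major matrices,
// C[m][n] = sum_k A[m][k] * B[k][n], accumulated in int32.
std::vector<int32_t> ReferenceS8S8(const std::vector<int8_t>& A,
                                   const std::vector<int8_t>& B,
                                   size_t M, size_t K, size_t N) {
    std::vector<int32_t> C(M * N, 0);
    for (size_t m = 0; m < M; ++m)
        for (size_t n = 0; n < N; ++n)
            for (size_t k = 0; k < K; ++k)
                C[m * N + n] += static_cast<int32_t>(A[m * K + k]) *
                                static_cast<int32_t>(B[k * N + n]);
    return C;
}

// Sign-flip path (hypothetical sketch): turn A into unsigned form, run a
// U8S8 product, then remove the offset that the flip introduced.
std::vector<int32_t> SignFlippedS8S8(const std::vector<int8_t>& A,
                                     const std::vector<int8_t>& B,
                                     size_t M, size_t K, size_t N) {
    // "Packing" step: XOR with 0x80 flips the sign bit, mapping a -> a + 128.
    std::vector<uint8_t> A_flipped(M * K);
    for (size_t i = 0; i < M * K; ++i)
        A_flipped[i] = static_cast<uint8_t>(static_cast<uint8_t>(A[i]) ^ 0x80);

    // Column sums of B, needed to undo the +128 offset (assumed correction).
    std::vector<int32_t> col_sum(N, 0);
    for (size_t k = 0; k < K; ++k)
        for (size_t n = 0; n < N; ++n)
            col_sum[n] += static_cast<int32_t>(B[k * N + n]);

    // U8S8 product: sum_k (a + 128) * b = sum_k a * b + 128 * col_sum[n].
    std::vector<int32_t> C(M * N, 0);
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; ++k)
                acc += static_cast<int32_t>(A_flipped[m * K + k]) *
                       static_cast<int32_t>(B[k * N + n]);
            C[m * N + n] = acc - 128 * col_sum[n];
        }
    }
    return C;
}
```

Both functions should return identical results for any int8 inputs, including the extremes -128 and 127, which is the property that lets signed A data reuse an unsigned-by-signed kernel.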