adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference #4450
Merged
Conversation
stephen-youn requested review from RezaYazdaniAminabadi, jeffra, mrwyattii, awan-10, cmikeh2, arashb and tjruwase as code owners on October 3, 2023 22:52
tjruwase reviewed Oct 5, 2023
…/DeepSpeed into styoun/zero-inf-8bit-q
This reverts commit 2d34140.
@stephen-youn, can you please look into the failing CIs?
tjruwase reviewed Oct 5, 2023
tjruwase reviewed Oct 5, 2023
increased coverage of QuantLinear test (w/ and w/o the cuda kernels)
…/DeepSpeed into styoun/zero-inf-8bit-q
tjruwase approved these changes Oct 10, 2023
baodii pushed a commit to baodii/DeepSpeed that referenced this pull request on Nov 7, 2023:

adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference (microsoft#4450)

* kernels added for asym fine-grained block quantization with 8bits
* formatting
* clean up the code
* rename quantize_int4.cu to quantize_intX.cu
* rename test_int4_quantization.py to test_intX_quantization.py
* "rename test_int4_quantization.py to test_intX_quantization.py" (this reverts commit 2d34140)
* rename
* fix after the pr comments
* increased coverage of QuantLinear test (w/ and w/o the cuda kernels)
* formatting

Co-authored-by: Stephen Youn <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request on Feb 17, 2024, with the same squashed commit message as above.
This PR adds 8-bit dequantization kernels that take the same approach as the 4-bit kernels already in the repo. (The gap was noticed while assessing gemm + dequantization performance: 8-bit kernels were missing, and extending the existing 4-bit path made them straightforward to add here.)
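For readers unfamiliar with the scheme, below is a minimal PyTorch sketch of what asymmetric fine-grained block (de)quantization at 8 bits computes. The function names, the group size of 64, and the pure-PyTorch formulation are illustrative assumptions for exposition only; this is not the CUDA kernel interface added in quantize_intX.cu.

```python
# Illustrative reference only -- not DeepSpeed's actual kernel API.
# "Asymmetric" block quantization stores a zero point (the block minimum)
# alongside a scale; "fine-grained" means each small block (group) of
# elements gets its own scale/zero-point pair.
import torch

def quantize_asym_block(x: torch.Tensor, group_size: int = 64):
    """Quantize to uint8 per contiguous block of `group_size` elements."""
    groups = x.reshape(-1, group_size)
    g_min = groups.min(dim=1, keepdim=True).values
    g_max = groups.max(dim=1, keepdim=True).values
    # Map the block's [min, max] range onto the 256 available levels.
    scale = (g_max - g_min).clamp(min=1e-8) / 255.0
    q = torch.round((groups - g_min) / scale).clamp(0, 255).to(torch.uint8)
    return q, scale, g_min

def dequantize_asym_block(q, scale, g_min, orig_shape):
    """Inverse mapping: x_hat = q * scale + min, applied per block."""
    return (q.to(torch.float32) * scale + g_min).reshape(orig_shape)

x = torch.randn(4, 128)
q, scale, zp = quantize_asym_block(x)
x_hat = dequantize_asym_block(q, scale, zp, x.shape)
print((x - x_hat).abs().max())  # small per-block reconstruction error
```

The asymmetric zero point lets the full dynamic range of each block map onto all 256 integer levels (rather than forcing a symmetric range around zero), and the small group size tightens each block's range, which together reduce reconstruction error.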