
adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference #4450

Merged: 15 commits into master, Oct 11, 2023

Conversation

@stephen-youn (Contributor) commented Oct 3, 2023

This PR adds 8-bit dequantization kernels that take a similar approach to the 4-bit kernels already in the repo. (This came out of assessing GEMM + dequantization performance in the repo: the 8-bit kernels were missing, and it was straightforward to add them in this PR.)
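For context, in asymmetric fine-grained block quantization each small block of weights carries its own scale and zero point, and dequantization maps a stored integer q back to approximately scale * (q - zero_point). The sketch below illustrates that math for the 8-bit case; the kernel name, signature, and per-block parameter layout are illustrative assumptions, not the actual kernel added in this PR.

```cuda
#include <cuda_fp16.h>
#include <cstdint>

// Hypothetical sketch of 8-bit asymmetric fine-grained block dequantization.
// Each contiguous block of `block_size` weights shares one scale and one
// zero point; the real DeepSpeed kernel differs in layout and vectorization.
__global__ void dequant_int8_block(const int8_t* q,     // quantized weights
                                   const float* scales, // one per block
                                   const float* zeros,  // one per block
                                   __half* out,         // dequantized output
                                   int n,               // total element count
                                   int block_size) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    int blk = idx / block_size; // quantization block this element belongs to
    // Asymmetric dequantization: x ~= scale * (q - zero_point)
    float v = scales[blk] * (static_cast<float>(q[idx]) - zeros[blk]);
    out[idx] = __float2half(v);
}

// Example launch (device pointers assumed allocated and populated):
//   dequant_int8_block<<<(n + 255) / 256, 256>>>(q, scales, zeros, out, n, 64);
```

Since the 4-bit path differs mainly in how q is unpacked, the 8-bit and 4-bit kernels can share most of this structure, which is consistent with the PR renaming quantize_int4.cu to quantize_intX.cu.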

@tjruwase (Contributor) commented Oct 5, 2023

@stephen-youn, can you please look into the failing CIs?

@stephen-youn reopened this Oct 6, 2023
@tjruwase added this pull request to the merge queue Oct 11, 2023
Merged via the queue into master with commit 6c86ff3 Oct 11, 2023
15 checks passed
baodii pushed a commit to baodii/DeepSpeed that referenced this pull request Nov 7, 2023
adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference (microsoft#4450)

* kernels added for asym fine-grained block quantization with 8bits

* formatting

* clean up the code

* rename quantize_int4.cu to quantize_intX.cu

* rename test_int4_quantization.py to test_intX_quantization.py

* "rename test_int4_quantization.py to test_intX_quantization.py"

This reverts commit 2d34140.

* rename

* fix after the pr comments

* increased coverage of QuantLinear test
(w/ and w/o the cuda kernels)

* formatting

---------

Co-authored-by: Stephen Youn <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference (microsoft#4450)
