adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference #4450
Merged
Conversation
stephen-youn requested review from RezaYazdaniAminabadi, jeffra, mrwyattii, awan-10, cmikeh2, arashb and tjruwase as code owners on October 3, 2023 22:52
tjruwase reviewed Oct 5, 2023
…/DeepSpeed into styoun/zero-inf-8bit-q
This reverts commit 2d34140.
@stephen-youn, can you please look into the failing CIs?
tjruwase reviewed Oct 5, 2023
tjruwase reviewed Oct 5, 2023
increased coverage of QuantLinear test (w/ and w/o the cuda kernels)
…/DeepSpeed into styoun/zero-inf-8bit-q
tjruwase approved these changes Oct 10, 2023
baodii pushed a commit to baodii/DeepSpeed that referenced this pull request on Nov 7, 2023:

adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference (microsoft#4450)

* kernels added for asym fine-grained block quantization with 8bits
* formatting
* clean up the code
* rename quantize_int4.cu to quantize_intX.cu
* rename test_int4_quantization.py to test_intX_quantization.py
* "rename test_int4_quantization.py to test_intX_quantization.py" (this reverts commit 2d34140)
* rename
* fix after the pr comments
* increased coverage of QuantLinear test (w/ and w/o the cuda kernels)
* formatting

Co-authored-by: Stephen Youn <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request on Feb 17, 2024, with the same squashed commit message as above.
This PR adds 8-bit dequantization kernels that take the same approach as the 4-bit kernels already in the repo. (The gap was noticed while assessing gemm + dequantization performance: 8-bit kernels were missing, and extending the existing 4-bit path made them straightforward to add here.)
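For readers unfamiliar with the scheme, below is a minimal PyTorch sketch of what asymmetric fine-grained block (de)quantization at 8 bits computes. The function names, the group size of 64, and the pure-PyTorch formulation are illustrative assumptions for exposition only; this is not the CUDA kernel interface added in quantize_intX.cu.

```python
# Illustrative reference only -- not DeepSpeed's actual kernel API.
# "Asymmetric" block quantization stores a zero point (the block minimum)
# alongside a scale; "fine-grained" means each small block (group) of
# elements gets its own scale/zero-point pair.
import torch

def quantize_asym_block(x: torch.Tensor, group_size: int = 64):
    """Quantize to uint8 per contiguous block of `group_size` elements."""
    groups = x.reshape(-1, group_size)
    g_min = groups.min(dim=1, keepdim=True).values
    g_max = groups.max(dim=1, keepdim=True).values
    # Map the block's [min, max] range onto the 256 available levels.
    scale = (g_max - g_min).clamp(min=1e-8) / 255.0
    q = torch.round((groups - g_min) / scale).clamp(0, 255).to(torch.uint8)
    return q, scale, g_min

def dequantize_asym_block(q, scale, g_min, orig_shape):
    """Inverse mapping: x_hat = q * scale + min, applied per block."""
    return (q.to(torch.float32) * scale + g_min).reshape(orig_shape)

x = torch.randn(4, 128)
q, scale, zp = quantize_asym_block(x)
x_hat = dequantize_asym_block(q, scale, zp, x.shape)
print((x - x_hat).abs().max())  # small per-block reconstruction error
```

The asymmetric zero point lets the full dynamic range of each block map onto all 256 integer levels (rather than forcing a symmetric range around zero), and the small group size tightens each block's range, which together reduce reconstruction error.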