optimize int4 gemv kernel with cuda #18818

Merged
merged 6 commits into from
Dec 22, 2023


yufenglee (Member) commented Dec 14, 2023

Description

Optimize the int4 GEMV CUDA kernel:

  1. Unroll the reduction to improve memory bandwidth utilization.
  2. Use a 4-bit to float16 conversion trick to save instructions.
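
Item 2 refers to a family of bit-level conversions. As a rough illustration only, here is a host-side C++ sketch of one widely used variant: OR the 4-bit code into the mantissa of the float16 constant 1024.0 (`0x6400`), then subtract the bias. Whether the kernel in this PR uses exactly these constants is an assumption. The function names, the zero point of 8, and the low-nibble-first packing are hypothetical choices for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode IEEE 754 binary16 bits into a float. Handles zero and normal
// numbers only, which is all this sketch ever produces.
float half_bits_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    assert(exponent != 0x1F);  // inf/NaN are not expected here
    if (exponent == 0) {
        assert(mantissa == 0);  // subnormals are not expected here
        return sign ? -0.0f : 0.0f;
    }
    const float magnitude =
        std::ldexp(1.0f + static_cast<float>(mantissa) / 1024.0f, exponent - 15);
    return sign ? -magnitude : magnitude;
}

// 0x6400 encodes 1024.0 in binary16. At that magnitude one mantissa step
// equals 1.0, so OR-ing a 4-bit code v into the low bits yields 1024 + v
// without any integer-to-float conversion instruction.
constexpr std::uint16_t kMagic1024 = 0x6400;

// Convert one 4-bit code (0..15) to (code - zero_point).
// A GPU kernel would do the subtraction in half precision, typically on two
// packed values at once; float is used here only to keep the sketch portable.
float int4_to_float_via_magic(std::uint8_t nibble, float zero_point) {
    const std::uint16_t bits =
        static_cast<std::uint16_t>(kMagic1024 | (nibble & 0xF));
    return half_bits_to_float(bits) - (1024.0f + zero_point);
}

// Dequantize one byte holding two 4-bit codes (assumed: low nibble first).
void dequantize_byte(std::uint8_t packed, float scale, float zero_point,
                     float out[2]) {
    const std::uint8_t lo = static_cast<std::uint8_t>(packed & 0xF);
    const std::uint8_t hi = static_cast<std::uint8_t>(packed >> 4);
    out[0] = scale * int4_to_float_via_magic(lo, zero_point);
    out[1] = scale * int4_to_float_via_magic(hi, zero_point);
}
```

With the assumed zero point of 8, the code 0 maps to -8 and the code 15 maps to 7, so `dequantize_byte(0xF0, 0.5f, 8.0f, out)` fills `out` with -4.0 and 3.5. The saving comes from replacing a shift, mask, and int-to-float conversion per element with a single bitwise operation and one subtraction.
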
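
| m | n | k | symmetric | latency before (us) | latency after (us) |
|---|---|---|---|---|---|
| 1 | 4096 | 4096 | TRUE | 15.54 | 8.82 |
| 1 | 4096 | 4096 | FALSE | 15.84 | 9.89 |
| 1 | 4096 | 11008 | TRUE | 42.44 | 19.4 |
| 1 | 4096 | 11008 | FALSE | 44.42 | 21.48 |
| 1 | 11008 | 4096 | TRUE | 34.65 | 17.46 |
| 1 | 11008 | 4096 | FALSE | 35.76 | 20.87 |
| 1 | 12288 | 4096 | TRUE | 39.27 | 19.73 |
| 1 | 12288 | 4096 | FALSE | 40.91 | 25.2 |
| 1 | 22016 | 4096 | TRUE | 65.78 | 38.81 |
| 1 | 22016 | 4096 | FALSE | 67.98 | 48.36 |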
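
Item 1 of the description (unrolling the reduction) can be pictured with a small host-side C++ sketch. It assumes that "unroll" means keeping several independent partial sums per thread, so that several loads are in flight before any result is needed. The actual kernel's unroll factor and thread layout are not stated in this PR, and `dot_unrolled4` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product with the reduction unrolled by 4.
// The four accumulators have no dependency on each other, so the loads and
// multiply-adds of one iteration can overlap instead of forming one long
// serial chain. On a GPU the same idea lets each thread issue several
// memory requests back to back.
float dot_unrolled4(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i + 0] * b[i + 0];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    // Handle the leftover elements when n is not a multiple of 4.
    float tail = 0.0f;
    for (; i < n; ++i) {
        tail += a[i] * b[i];
    }

    // Combine the partial sums once, at the end.
    return (acc0 + acc1) + (acc2 + acc3) + tail;
}
```

Because the partial sums are combined in a different order than a plain loop, floating-point results can differ in the last bits from a sequential sum. That is usually acceptable for a GEMV, but it is worth keeping in mind when comparing against a reference.
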

yufenglee force-pushed the yufeng/int4_gemv_gpu_opt branch from 8654f3e to 1c4e22f on December 19, 2023 19:50
chenfucn previously approved these changes on Dec 21, 2023
yufenglee merged commit 985acda into main on Dec 22, 2023
92 of 100 checks passed
yufenglee deleted the yufeng/int4_gemv_gpu_opt branch on December 22, 2023 03:32
2 participants