[MLAS][AArch64] SQNBitGemm CompInt8 - Use 4x2 tiles #21380

Merged (7 commits) on Jul 18, 2024

@edgchen1 edgchen1 commented Jul 17, 2024

Description

Update the SQNBitGemm ARM NEON kernel to compute a 4x2 tile of the output at a time.

Note: Also tried 2x4 and 4x4 tiles but observed the best microbenchmark results with 4x2 tiles.
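
For readers unfamiliar with output tiling, the idea behind a 4x2 tile can be sketched in scalar C++. This is only an illustration under simplifying assumptions, not the MLAS kernel: it uses plain int8 inputs with int32 accumulation, omits the block quantization and NEON intrinsics entirely, and assumes M is a multiple of 4 and N a multiple of 2. The names `GemmNaive` and `GemmTiled4x2` and the data layout (B stored as N x K) are hypothetical choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout assumed for this sketch:
//   A is M x K, row-major.
//   B is stored as N x K, so the K values of each output column are contiguous.
//   C is M x N, row-major.

// Reference version: computes one output element at a time.
std::vector<int32_t> GemmNaive(const std::vector<int8_t>& A,
                               const std::vector<int8_t>& B,
                               size_t M, size_t N, size_t K) {
    std::vector<int32_t> C(M * N, 0);
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; ++k) {
                acc += int32_t(A[m * K + k]) * int32_t(B[n * K + k]);
            }
            C[m * N + n] = acc;
        }
    }
    return C;
}

// Tiled version: computes a 4x2 block of C per pass over K.
// Each k step loads 4 values of A and 2 values of B (6 loads) and
// performs 8 multiply-accumulates, so every loaded value is reused.
std::vector<int32_t> GemmTiled4x2(const std::vector<int8_t>& A,
                                  const std::vector<int8_t>& B,
                                  size_t M, size_t N, size_t K) {
    constexpr size_t TileM = 4;
    constexpr size_t TileN = 2;
    assert(M % TileM == 0 && N % TileN == 0);  // no remainder handling here

    std::vector<int32_t> C(M * N, 0);
    for (size_t m = 0; m < M; m += TileM) {
        for (size_t n = 0; n < N; n += TileN) {
            int32_t acc[TileM][TileN] = {};  // 8 accumulators kept live
            for (size_t k = 0; k < K; ++k) {
                const int32_t b0 = B[(n + 0) * K + k];
                const int32_t b1 = B[(n + 1) * K + k];
                for (size_t i = 0; i < TileM; ++i) {
                    const int32_t a = A[(m + i) * K + k];
                    acc[i][0] += a * b0;
                    acc[i][1] += a * b1;
                }
            }
            for (size_t i = 0; i < TileM; ++i) {
                for (size_t j = 0; j < TileN; ++j) {
                    C[(m + i) * N + (n + j)] = acc[i][j];
                }
            }
        }
    }
    return C;
}
```

Both functions produce the same result; the tiled one trades more live accumulators for fewer loads per multiply-accumulate. Larger tiles increase reuse further but also need more registers, which may be one reason tile shapes have to be compared empirically.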

Measurements

Baseline: 20cd339
Updated: aecb18a

Microbenchmarks

Run on Azure VM (ARM64 Linux) with compute type: CompInt8, number of threads: 4, M:128/K:4096/N:4096

| blklen | symmetric | baseline time (ns) | updated time (ns) |
| --- | --- | --- | --- |
| 16 | 1 | 51511766 | 44803227 |
| 16 | 0 | 58870228 | 49002040 |
| 32 | 1 | 29887367 | 25812083 |
| 32 | 0 | 33208816 | 26632430 |
| 64 | 1 | 30344972 | 26624130 |
| 64 | 0 | 31460702 | 25966747 |

E2E test

Run onnxruntime-genai benchmark with Phi-3 mini using 4 threads.

| machine | baseline prompt processing (tokens/s) | updated prompt processing (tokens/s) |
| --- | --- | --- |
| Samsung Galaxy S21 | 15.93 | 17.93 |
| Surface Pro 9 | 33.36 | 36.60 |
| Azure VM | 18.92 | 21.84 |
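
The relative gains implied by the microbenchmark and E2E measurements can be derived with a small helper. The sketch below is not part of the PR; the struct and function names are hypothetical, and the numbers are copied from the measurements reported above. For the microbenchmarks the ratio is baseline time over updated time; for the E2E runs it is updated throughput over baseline throughput, so a value above 1 means an improvement in both cases.

```cpp
#include <cassert>
#include <cstdio>

// Time-based speedup: how many times faster the updated kernel is.
double Speedup(double baseline_ns, double updated_ns) {
    assert(updated_ns > 0.0);
    return baseline_ns / updated_ns;
}

// Throughput-based gain: ratio of updated to baseline tokens per second.
double ThroughputGain(double baseline_tps, double updated_tps) {
    assert(baseline_tps > 0.0);
    return updated_tps / baseline_tps;
}

struct MicrobenchRow {
    int blklen;
    int symmetric;
    double baseline_ns;
    double updated_ns;
};

struct E2ERow {
    const char* machine;
    double baseline_tps;
    double updated_tps;
};

// Values copied from the measurements in this PR description.
const MicrobenchRow kMicrobench[] = {
    {16, 1, 51511766, 44803227}, {16, 0, 58870228, 49002040},
    {32, 1, 29887367, 25812083}, {32, 0, 33208816, 26632430},
    {64, 1, 30344972, 26624130}, {64, 0, 31460702, 25966747},
};

const E2ERow kE2E[] = {
    {"Samsung Galaxy S21", 15.93, 17.93},
    {"Surface Pro 9", 33.36, 36.60},
    {"Azure VM", 18.92, 21.84},
};

// Prints one ratio per measured configuration.
void PrintRatios() {
    for (const MicrobenchRow& r : kMicrobench) {
        std::printf("blklen=%d symmetric=%d speedup=%.3fx\n",
                    r.blklen, r.symmetric, Speedup(r.baseline_ns, r.updated_ns));
    }
    for (const E2ERow& r : kE2E) {
        std::printf("%s throughput gain=%.3fx\n",
                    r.machine, ThroughputGain(r.baseline_tps, r.updated_tps));
    }
}
```

Calling `PrintRatios()` from a `main` function prints one line per configuration.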

Motivation and Context

Improve prompt processing (M>1) performance.

@edgchen1 edgchen1 marked this pull request as ready for review July 17, 2024 22:55
@edgchen1 edgchen1 requested a review from a team as a code owner July 17, 2024 22:55
@fajin-corp fajin-corp left a comment


:shipit:

@edgchen1 edgchen1 merged commit 05fc0c6 into main Jul 18, 2024
99 checks passed
@edgchen1 edgchen1 deleted the edgchen1/sqnbitgemm_larger_tiles branch July 18, 2024 20:37