[WIP] Use FlashInfer RoPE #2016
It's worth noting that flashinfer uses fp32 internally for sin/cos. We found a non-trivial output difference when using fp16 sin/cos.
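To give a rough sense of scale for the fp16 vs. fp32 point above, here is a minimal standard-library sketch that rounds a RoPE sin/cos table to each precision and reports the worst-case rounding error. The base of 10000, the head dimension of 128, and the sampled positions are assumptions chosen for illustration, not values taken from this PR, and the helper names are hypothetical. It measures only the error in the table entries themselves, not the downstream attention output.

```python
import math
import struct


def round_to(fmt, x):
    """Round a Python float to a narrower IEEE 754 format.

    fmt is a struct format character: 'e' = binary16 (fp16), 'f' = binary32 (fp32).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def rope_angles(pos, head_dim, base=10000.0):
    """Rotation angles for one position: pos * base^(-2i / head_dim)."""
    return [pos * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def max_table_error(fmt, positions, head_dim=128):
    """Largest |rounded - exact| over all sin/cos entries for the given positions."""
    worst = 0.0
    for pos in positions:
        for theta in rope_angles(pos, head_dim):
            for exact in (math.cos(theta), math.sin(theta)):
                worst = max(worst, abs(round_to(fmt, exact) - exact))
    return worst


positions = range(0, 4096, 37)  # assumed sample of token positions
err_fp16 = max_table_error("e", positions)
err_fp32 = max_table_error("f", positions)
print(f"max |error|, fp16 sin/cos table: {err_fp16:.3e}")
print(f"max |error|, fp32 sin/cos table: {err_fp32:.3e}")
```

The gap between the two numbers is several orders of magnitude, which is consistent with the observation that the sin/cos dtype can visibly change the result.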
2693a1f to a4dcf3c
…) with QKV dtypes
9a8f8fd to 0015a72
flashinfer-ai/flashinfer#609 potentially introduces correctness issues
@james-p-xu How is it going? Does it already run successfully after removing this dependency and using the latest flashinfer nightly from https://github.com/flashinfer-ai/flashinfer-nightly/releases?
sglang/python/sglang/srt/models/llama.py, line 25 in 60769be
Motivation
NOTE: `flashinfer.apply_rope_pos_ids` does not exist in the prebuilt wheel; flashinfer must be built from source. Is this an issue?

We want to verify the correctness of flashinfer's RoPE against vLLM's RoPE, in preparation for replacing vLLM's `get_rope` with flashinfer's.

cc: @ByronHsu
Modifications
Added a standalone Python script for comparison.
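The script itself is not shown here. As a rough illustration of what such a comparison can look like, the sketch below checks one RoPE implementation against another on random inputs and reports the maximum absolute difference. Everything in it is an assumption for illustration: it uses pure-Python stand-ins rather than calling flashinfer or vLLM, it assumes the non-interleaved (NeoX-style) layout and a base of 10000, and `rope_candidate` is a hypothetical placeholder for the implementation under test.

```python
import cmath
import math
import random


def rope_reference(x, pos, base=10000.0):
    """Reference RoPE, non-interleaved layout: rotates pairs (x[i], x[i + d/2])."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / len(x))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


def rope_candidate(x, pos, base=10000.0):
    """Hypothetical stand-in for the implementation under test.

    Same rotation, written as a complex multiplication by exp(i * theta).
    """
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / len(x))
        z = complex(x[i], x[i + half]) * cmath.exp(1j * theta)
        out[i], out[i + half] = z.real, z.imag
    return out


def max_abs_diff(impl_a, impl_b, head_dim=64, positions=(0, 1, 17, 511, 4095), seed=0):
    """Run both implementations on the same random vectors; return the worst gap."""
    rng = random.Random(seed)
    worst = 0.0
    for pos in positions:
        x = [rng.uniform(-1.0, 1.0) for _ in range(head_dim)]
        a, b = impl_a(x, pos), impl_b(x, pos)
        worst = max(worst, max(abs(u - v) for u, v in zip(a, b)))
    return worst


diff = max_abs_diff(rope_candidate, rope_reference)
print(f"max abs diff: {diff:.3e}")
```

In a real comparison the two callables would wrap the actual kernels, and the tolerance would need to be chosen to match the dtype of the inputs and of the sin/cos values.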
Checklist