Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use FlashInfer RoPE #2016

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

james-p-xu
Copy link
Contributor

@james-p-xu james-p-xu commented Nov 12, 2024

Motivation

NOTE: flashinfer.apply_rope_pos_ids does not exist in the prebuilt wheel, must build from source. Is this an issue?

We want to verify the correctness of flashinfer's RoPE against vLLM's RoPE, in preparation of replacing vLLM's get_rope with flashinfer's.

cc: @ByronHsu

Modifications

Added standalone python script for comparison.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs zhyncs requested a review from yzh119 November 12, 2024 17:47
@yzh119
Copy link
Collaborator

yzh119 commented Nov 12, 2024

It's worth noting that flashinfer uses fp32 internally for sin/cos, we found there will be some non-trivial output difference if we use fp16 sin/cos.

@james-p-xu james-p-xu changed the title Add RoPE comparison across flashinfer and vLLM [WIP] Use FlashInfer RoPE Nov 14, 2024
@zhyncs
Copy link
Member

zhyncs commented Nov 23, 2024

@james-p-xu How is it going? Has it already run successfully after removing this dependency using flashinfer latest https://github.com/flashinfer-ai/flashinfer-nightly/releases

from vllm.model_executor.layers.rotary_embedding import get_rope

ispobock pushed a commit to ispobock/sglang that referenced this pull request Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants