[BUG] Deepspeed inference does not support the Qwen model #4840
Comments
I have the same error. Here is my code:
cmd
error info:
ds_report output:
Refer to #4913. Install DeepSpeed from the latest source code and consider utilizing DeepSpeed-MII for optimal performance.
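(For reference, a minimal sketch of what serving a Qwen checkpoint through DeepSpeed-MII might look like; the checkpoint path and generation parameters are assumptions for illustration, not details from this thread, and support for a given Qwen variant should be verified against the MII release you install.)

```python
# Minimal DeepSpeed-MII sketch (assumes a recent MII release with the
# pipeline API and that the chosen Qwen checkpoint is supported).
import mii

# Hypothetical checkpoint; substitute the model you actually use.
pipe = mii.pipeline("Qwen/Qwen-7B-Chat")

# Generate text; max_new_tokens is an illustrative value.
responses = pipe(["What is DeepSpeed?"], max_new_tokens=128)
print(responses)
```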
@ZonePG Thank you for your awesome work! I tried the following code (as jiahe7ay did):
and found that the assert still failed and the inference did not speed up. Is it because of Qwen-VL's vision head? How should I use DeepSpeed acceleration correctly?
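(The assert mentioned here is presumably a check that kernel injection actually replaced the transformer layers. Below is a minimal sketch of such a check, assuming a Qwen-style causal LM whose blocks live under `model.transformer.h`; the checkpoint name and attribute path are assumptions, not confirmed in this thread, and may differ for Qwen-VL.)

```python
# Sketch: verify that DeepSpeed kernel injection replaced the transformer
# blocks. The attribute path `transformer.h` is an assumption about the
# Qwen module layout.
import deepspeed
import torch
from deepspeed.ops.transformer.inference import DeepSpeedTransformerInference
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",          # hypothetical checkpoint
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
ds_model = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

# If injection worked, the first block should now be a
# DeepSpeedTransformerInference module.
assert isinstance(
    ds_model.module.transformer.h[0], DeepSpeedTransformerInference
), "Qwen layers were not replaced by DeepSpeed inference kernels"
```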
@rayquazaMega I have not used the VL model; I think it's not well supported currently. For the Chat or Base model, I would recommend DeepSpeed-MII.
Describe the bug
I use deepspeed.init_inference to accelerate inference for the Qwen model. When I compare it against running without deepspeed.init_inference, I find there is no speedup.
I then assert whether the Qwen modules were replaced with the DeepSpeedTransformerInference class, and the assertion fails.
I'm curious about one thing: the Qwen model is a decoder-only architecture, similar to the GPT model. Why does the kernel injection fail?
To Reproduce
The code is:
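(The original snippet is not included above; the following is a minimal sketch of a comparable reproduction, assuming the Qwen-7B-Chat checkpoint from Hugging Face and a single GPU. The checkpoint, prompt, and token counts are illustrative assumptions, not details from the report.)

```python
# Sketch of a reproduction: compare generation latency with and without
# deepspeed.init_inference.
import time

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen-7B-Chat"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
inputs = tokenizer("Explain tensor parallelism.", return_tensors="pt").to("cuda")

def timed_generate(m):
    torch.cuda.synchronize()
    start = time.time()
    out = m.generate(**inputs, max_new_tokens=64)
    torch.cuda.synchronize()
    return time.time() - start, out

# Baseline: plain Hugging Face model.
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, trust_remote_code=True
).cuda()
baseline_s, _ = timed_generate(model)

# DeepSpeed inference engine with kernel injection requested.
ds_model = deepspeed.init_inference(
    model, dtype=torch.float16, replace_with_kernel_inject=True
)
ds_s, _ = timed_generate(ds_model)

print(f"baseline: {baseline_s:.2f}s  deepspeed: {ds_s:.2f}s")
```

With unsupported model types, the second timing typically shows no improvement, which matches the behavior described above.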
Expected behavior
Inference for the Qwen model is accelerated.
System info (please complete the following information):
Additional context
I think Qwen is a very popular large model, and I hope official support for it is released soon.
Qwen: https://github.com/QwenLM/Qwen