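I deployed the 7B chat model on a 3090. During inference, the model call itself takes about 0.3 ms, but post-processing takes about 2 s per token, so responses are very slow. I traced the time to the for loop that iteratively calls GenerationMixin, which is where almost all of it is spent. How can I fix this?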
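To narrow down where the 2 s per token goes, here is a minimal sketch of a per-step timer that wraps any token iterator and reports how long each iteration of the for loop waits. The names `timed` and `fake_token_stream` are hypothetical and only for illustration; `fake_token_stream` stands in for the real generation loop. One caveat when comparing numbers: CUDA kernels launch asynchronously, so a timer placed only around the forward call can under-report unless `torch.cuda.synchronize()` is called before reading the clock, and the missing time then shows up in whichever later step forces a sync.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def timed(stream: Iterable[T]) -> Iterator[Tuple[T, float]]:
    """Yield (item, seconds spent waiting for that item) for each item."""
    it = iter(stream)
    while True:
        start = time.perf_counter()
        try:
            item = next(it)  # all the work for this step happens here
        except StopIteration:
            return
        yield item, time.perf_counter() - start


def fake_token_stream(n: int, delay: float) -> Iterator[int]:
    """Hypothetical stand-in for the real per-token generation loop."""
    for i in range(n):
        time.sleep(delay)  # simulates the cost of producing one token
        yield i


if __name__ == "__main__":
    for token, seconds in timed(fake_token_stream(3, 0.01)):
        print(f"token={token} step_time={seconds * 1000:.1f} ms")
```

Replacing `fake_token_stream(...)` with the actual loop over generated tokens shows whether the delay is constant per step or grows with sequence length, which helps tell a slow post-processing step from a forward pass that is only being measured late.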