Skip to content

Qwen2.5-72b-instruct Repeating Outputs #988

Answered by jklj077
thiner asked this question in Q&A
Discussion options

You must be logged in to vote

如果是说该停的时候没停,然后后面都是重复内容的话,这个我们目前观察大概率是量化导致的。可以试试换AWQ,能缓解一些,原始精度模型目前看是正常的。

https://qwen.readthedocs.io/zh-cn/latest/quantization/gptq.html#qwen2-5-72b-instruct-gptq-int4-cannot-stop-generation-properly

另外,由于vLLM默认的采样超参并不会读取模型文件中的默认参数,这边也建议一般都加上:https://qwen.readthedocs.io/zh-cn/latest/deployment/vllm.html#openai-compatible-api-service (并不针对该情况)

Replies: 3 comments 2 replies

Comment options

You must be logged in to vote
2 replies
@thiner
Comment options

@thiner
Comment options

Answer selected by thiner
Comment options

You must be logged in to vote
0 replies
Comment options

You must be logged in to vote
0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
2 participants