
RuntimeError: probability tensor contains either inf, nan or element < 0 #539

Open
2 tasks done
RobinRush opened this issue Sep 4, 2024 · 6 comments



RobinRush commented Sep 4, 2024

System Info

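CUDA: 12.2 (the installed PyTorch build is the CUDA 12.1 one)
transformers: 4.44.0
Python: 3.10
OS: Kylin V10
GPU: NVIDIA A100-SXM4-40GB (in a neighboring project someone asked whether the GPU is the cause, so I'm listing it too: QwenLM/Qwen2-VL#44)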

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Reproduction

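1. Downloaded the model from Hugging Face.
2. Copied it to the internal network on a USB drive and verified the MD5 checksums; the files were not modified.
3. Installed everything listed in requirements.txt.
4. In trans_cli_demo.py, changed the model path from THUDM/glm-4-9b-chat to /root/glm-4-9b-chat (the local copy, which the output shows being loaded).
5. Ran trans_cli_demo.py.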
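The MD5 check in step 2 can be scripted with the standard library. This is a minimal sketch. It assumes you have a mapping from file names to the MD5 values recorded on the original download machine. md5_of_file and find_mismatches are illustrative helpers, not part of the GLM-4 repository. Files are read in 1 MiB chunks, so multi-gigabyte weight shards never have to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading 1 MiB at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(model_dir, expected):
    """Return names from `expected` that are missing or whose MD5 differs."""
    bad = []
    for name, want in expected.items():
        path = Path(model_dir) / name
        if not path.is_file() or md5_of_file(path) != want:
            bad.append(name)
    return bad
```

For example, find_mismatches("/root/glm-4-9b-chat", expected) returns the names of any files that are missing or altered. An empty list means the copy matches the recorded hashes.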

Output:
RuntimeError: probability tensor contains either inf, nan or element < 0
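PyTorch raises this message from torch.multinomial. transformers calls that function to draw the next token when do_sample=True.

One way to end up there is a non-finite logit. Suppose some half-precision computation overflows to inf. A max-subtracted softmax then computes inf - inf = nan, and the whole distribution becomes nan.

The pure-Python sketch below mimics that chain with plain floats. softmax, check_probs and sample are illustrative helpers, not PyTorch or transformers code. The overflow is an assumed cause, not a diagnosis of this issue.

```python
import math
import random


def softmax(logits):
    """Softmax with the usual max-subtraction trick for numerical stability."""
    m = max(logits)
    # If m is inf, (inf - inf) is nan, and the nan spreads through the sum.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def check_probs(probs):
    """Reject the same conditions named in the PyTorch error message."""
    for p in probs:
        if math.isnan(p) or math.isinf(p) or p < 0:
            raise RuntimeError(
                "probability tensor contains either inf, nan or element < 0"
            )


def sample(probs, rng):
    """Draw one index from probs, validating the distribution first."""
    check_probs(probs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample(softmax([1.0, 2.0, 3.0]), rng))  # a valid index: 0, 1 or 2
    bad = softmax([math.inf, 0.0])  # simulate an overflowed logit
    print(bad)  # [nan, nan]
    try:
        sample(bad, rng)
    except RuntimeError as err:
        print("RuntimeError:", err)
```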

Expected behavior

Sending "hi" should produce a normal conversational reply.

@RobinRush
Author

@zRzRzRzRzRzRzR It just occurred to me that you might be able to help.

@RobinRush
Author

@zhipuch And you as well.

sixsixcoder self-assigned this Sep 4, 2024
@sixsixcoder
Collaborator

Please remove do_sample=True and temperature from trans_cli_demo.py and try again.
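With do_sample=False, transformers' generate() uses greedy decoding. It takes the argmax of the next-token scores instead of calling torch.multinomial, so the probability check never runs.

Below is a minimal sketch of the suggested change. It assumes the demo passes its generation arguments as a plain dict. to_greedy_kwargs and the example dict are hypothetical, not copied from trans_cli_demo.py. The helper also drops top_p and top_k, which, like temperature, only apply when sampling.

```python
# Keys that only affect sampling-based decoding.
SAMPLING_ONLY_KEYS = ("temperature", "top_p", "top_k")


def to_greedy_kwargs(gen_kwargs):
    """Return a copy of generate() kwargs with sampling switched off."""
    greedy = {k: v for k, v in gen_kwargs.items() if k not in SAMPLING_ONLY_KEYS}
    # Explicit False (rather than just deleting the key) also overrides a
    # do_sample default that the model's generation config might set.
    greedy["do_sample"] = False
    return greedy


if __name__ == "__main__":
    # Hypothetical kwargs shaped like a chat demo's generate() call.
    demo = {"max_new_tokens": 256, "do_sample": True, "top_p": 0.8, "temperature": 0.6}
    print(to_greedy_kwargs(demo))
    # {'max_new_tokens': 256, 'do_sample': False}
```

If the logits themselves are already nan, this removes the error message but not its cause. Greedy decoding will still pick meaningless tokens.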

@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Sep 4, 2024

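I'd like to confirm whether you enabled BF16 inference. Also, can you reproduce the error above? That would help us locate the problem.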
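Whether BF16 is enabled matters because the two 16-bit formats fail differently:

- FP16's largest finite value is 65504, so bigger intermediate results overflow to inf.
- BF16 keeps FP32's 8-bit exponent, and with it a range of roughly 3.4e38. It gives up mantissa precision instead.

The standard-library sketch below shows this difference on single values. to_fp16 and to_bf16 are illustrative helpers. to_bf16 truncates the float32 bits rather than rounding to nearest even as hardware does, so its results are approximate.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round x to IEEE half precision, mapping overflow to inf as IEEE arithmetic does."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def to_bf16(x):
    """Approximate bfloat16 by keeping the top 16 bits of x's float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    for value in (1.5, FP16_MAX, 70000.0):
        print(f"{value:>9}: fp16={to_fp16(value)}, bf16={to_bf16(value)}")
```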

@RobinRush
Author

> Please remove do_sample=True and temperature from trans_cli_demo.py and try again.

Tried it; it didn't help. I'll check the logs on the intranet machine when I'm at work tomorrow.

@RobinRush
Author

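> I'd like to confirm whether you enabled BF16 inference. Also, can you reproduce the error above? That would help us locate the problem.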

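Yes, I used BF16, and I also tried auto. The error reproduces. Tomorrow I'll record my screen and take a few screenshots. Thanks in advance.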

@wuhaoyu010

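Over the past two days I kept hitting this error on A800 GPUs, running multi-GPU, multi-user concurrent streaming output. Setting torch_dtype=torch.float32 or torch_dtype=torch.float64 made the error go away.
Judging from the thread below, A-series GPUs probably don't handle BF16/FP16 very well: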
yangjianxin1/Firefly#272
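The float32/float64 workaround costs memory. Below is a rough weights-only estimate. It assumes about 9 billion parameters, read off the model name glm-4-9b rather than an exact count. It ignores the KV cache, activations and framework overhead. weight_gb is an illustrative helper.

```python
# Storage size of one parameter for each dtype, in bytes.
BYTES_PER_PARAM = {"float64": 8, "float32": 4, "bfloat16": 2, "float16": 2}


def weight_gb(num_params, dtype):
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 9e9  # assumption: ~9B parameters, taken from the model name
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_gb(n, dtype):5.1f} GB")
```

Under these assumptions, the weights alone take:

- about 18 GB in bfloat16,
- about 36 GB in float32,
- about 72 GB in float64.

On a single 40 GB A100 like the one in the original report, float32 therefore leaves very little headroom. Float64 only fits when the model is split across several GPUs, as in the multi-GPU setup described above.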
