Why does training fail after I increase the size of my training dataset? #79
Comments
What you boxed in the screenshot is the cause of the error.
The screenshot seems cut off; it's hard to tell from this.
Out of GPU memory — I hit the same problem.
Has this been resolved? I get the same out-of-memory error even with batch_size=1. Could there be a memory leak here?
I ran into this too. Try multi-GPU training; a single 3090 isn't enough. The QNRF and NWPU datasets are both very large.
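A hedged note on the multi-GPU suggestion above: in PyTorch, wrapping the model in `nn.DataParallel` (or, preferably, `DistributedDataParallel`) splits each batch across all visible GPUs, so per-card memory use drops roughly with the device count. A minimal sketch — the model here is a stand-in, not the actual P2PNet network:

```python
import torch
import torch.nn as nn

# Illustrative stand-in; substitute the real P2PNet model.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across all visible GPUs,
    # lowering per-GPU memory for the same total batch size.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(8, 3, 64, 64).to(device)  # a batch of 8 images
out = model(x)
print(out.shape)  # (8, 16, 64, 64) — batch dimension preserved
```

Note that `DataParallel` only helps when the per-GPU slice of the batch fits in memory; if a single image already overflows one card, you would need model parallelism or smaller input resolution instead.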
P2PNet QQ group: 790745540 — for helping each other debug issues.
In my own dataset, the training set has 1807 images and the test set 1800, placed under the sence01 directory. With the default training parameters, training works fine.
After I added sence02 at the same directory level (1201 training images, 334 test images) and trained with the same default parameters, training fails. The error message is as follows:
Training command:
Server environment:
torch-gpu: 2.0.1
gpu: 3090
One more issue: if I change the batch size from 8 to any of 256, 128, 64, 32, or 16, training fails. Specifically, with a batch size of 256, 128, 64, or 32 it cannot train at all; with a batch size of 16 it runs for two epochs and then throws the error shown above.
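The symptoms above (immediate failure at large batch sizes, delayed failure at bs=16) are consistent with GPU out-of-memory errors. One common workaround — a hedged sketch, not code from the P2PNet repo — is gradient accumulation: keep the per-step batch small enough to fit in memory and accumulate gradients over several steps, so the optimizer sees the same effective batch. Here, per-step batch 8 with 4 accumulation steps emulates batch size 32; the model, loss, and data are toy placeholders:

```python
import torch
import torch.nn as nn

# Toy placeholders; substitute P2PNet's model, criterion, and data loader.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

accum_steps = 4  # effective batch = per-step batch * accum_steps
optimizer.zero_grad()
for step in range(8):
    x = torch.randn(8, 10)   # small per-step batch that fits in memory
    y = torch.randn(8, 1)
    # Scale the loss so the accumulated gradient matches one large batch.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()          # gradients accumulate across backward() calls
    if (step + 1) % accum_steps == 0:
        optimizer.step()     # one update per effective (large) batch
        optimizer.zero_grad()
```

This trades wall-clock time for memory; it does not change the math of the update (apart from batch-statistics layers such as BatchNorm, which still see only the small per-step batch).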
Any help would be greatly appreciated!