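Issue 1: print(model) does not match the model defined in forward, nor the architecture in the paper. Issue 2: the fc layer's gradients are not updated during training #427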

Open
githubsuperfans opened this issue May 9, 2024 · 0 comments


Hello, sorry to bother you. My questions are below.

Training command: CUDA_VISIBLE_DEVICES=0,1,2,3 python train_dist.py --dataset minc --model deepten_resnet50_minc --batch-size 2 --lr 0.004 --epochs 80 --lr-step 60 --lr-scheduler step --weight-decay 5e-4

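Issue 1: print(model) does not match the model defined in forward, nor the architecture in the paper.
The output of print(model) contains a fully connected layer that the forward method defined in deepten.py does not use. See the attachments below: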
print(model).txt
deepten.txt
The architecture in the paper also does not seem to include this fully connected layer?
image
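A minimal sketch of one possible explanation, not verified against the repository's deepten.py: in PyTorch, print(model) lists every submodule registered in `__init__`, whether or not `forward` ever calls it. The classes `ToyBackbone` and `ToyNet` and all layer sizes below are hypothetical stand-ins, assuming the model wraps a pretrained backbone that carries its own classifier layer.

```python
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained classification backbone."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, 1000)  # classifier that ships with the backbone

    def forward(self, x):
        x = self.conv(x).mean(dim=(2, 3))
        return self.fc(x)


class ToyNet(nn.Module):
    """Hypothetical wrapper that reuses only part of the backbone."""

    def __init__(self, nclass=10):
        super().__init__()
        self.pretrained = ToyBackbone()   # registers conv AND fc
        self.head = nn.Linear(8, nclass)  # the layer actually used for output

    def forward(self, x):
        x = self.pretrained.conv(x)       # pretrained.fc is never called
        x = x.mean(dim=(2, 3))            # global average pooling
        return self.head(x)


model = ToyNet()
printed = repr(model)                     # same text that print(model) shows
out = model(torch.randn(2, 3, 16, 16))

print(printed)
print(out.shape)
```

In this sketch the printed structure still shows `(fc)` under `pretrained`, while the output has `nclass` columns because only `head` contributes to it.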
Issue 2: the fc layer's gradients are not updated during training?
Below are the gradients I printed during training:
gra.txt
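A small sketch of how per-parameter gradients can be inspected after a backward pass. It assumes, without confirmation from the repository, that the fc layer is registered but not used in `forward`; in that case its `.grad` stays `None`, which can look like a layer that does not update. `ToyNet` and its sizes are hypothetical.

```python
import torch
import torch.nn as nn


class ToyNet(nn.Module):
    """Hypothetical model with one registered but unused layer."""

    def __init__(self, nclass=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, 1000)      # registered, never called in forward
        self.head = nn.Linear(8, nclass)

    def forward(self, x):
        x = self.conv(x).mean(dim=(2, 3))
        return self.head(x)


model = ToyNet()
loss = model(torch.randn(2, 3, 16, 16)).sum()
loss.backward()

# Map each parameter name to whether autograd produced a gradient for it.
grad_status = {
    name: param.grad is not None
    for name, param in model.named_parameters()
}

for name, has_grad in grad_status.items():
    print(f"{name}: {'has grad' if has_grad else 'grad is None'}")
```

If the real model shows the same pattern (gradients on every layer except fc), that would be consistent with fc simply not being part of the computation graph.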

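Overall, I am not sure whether the fully connected layer (2048, 1000) should be included. If it should, how do I fix the gradient update problem? If it should not, where should I remove it?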
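If the layer turns out to be unused, one common way to drop it is to replace it with `nn.Identity()` after the model is constructed. This is only a sketch under that assumption; the attribute path `pretrained.fc` and the `ToyNet` class are hypothetical and may not match the names in deepten.py.

```python
import torch
import torch.nn as nn


class ToyNet(nn.Module):
    """Hypothetical model whose backbone carries an unused classifier."""

    def __init__(self, nclass=10):
        super().__init__()
        self.pretrained = nn.Module()
        self.pretrained.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pretrained.fc = nn.Linear(8, 1000)  # unused in forward
        self.head = nn.Linear(8, nclass)

    def forward(self, x):
        x = self.pretrained.conv(x).mean(dim=(2, 3))
        return self.head(x)


model = ToyNet()
names_before = [name for name, _ in model.named_parameters()]

# Swap the unused classifier for a parameter-free placeholder.
model.pretrained.fc = nn.Identity()

names_after = [name for name, _ in model.named_parameters()]
out = model(torch.randn(2, 3, 16, 16))  # forward is unaffected

print(names_before)
print(names_after)
```

One caveat with this approach: the model's `state_dict` no longer contains the fc keys, so loading a checkpoint saved with them under `strict=True` would raise an error about unexpected keys.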
