How to train using distributed mode? #19

Open
handyzeng opened this issue Mar 30, 2023 · 0 comments

When I try to train using the dist_train.sh tool, I get the following errors:

with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
File "./tools/train.py", line 108, in <module>
main()
File "./tools/train.py", line 104, in main
logger=logger)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/apis/train.py", line 62, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/apis/train.py", line 256, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/sas_data/e01163/C-HOI/workspace/mmcv/mmcv/runner/runner.py", line 368, in run
epoch_runner(data_loaders[i], **kwargs)
File "/sas_data/e01163/C-HOI/workspace/mmcv/mmcv/runner/runner.py", line 267, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/apis/train.py", line 41, in batch_processor
losses = model(**data)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 888, in forward
output = self.module(*inputs, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/core/fp16/decorators.py", line 49, in new_func
return old_func(*args, **kwargs)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/models/detectors/base.py", line 95, in forward
return self.forward_train(img, img_meta, **kwargs)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/models/detectors/cascade_rcnn_rel.py", line 414, in forward_train
x = self.extract_feat(img)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/models/detectors/cascade_rcnn_rel.py", line 165, in extract_feat
x = self.backbone(img)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/sas_data/e01163/C-HOI/workspace/C-HOI/mmdet/models/backbones/resnet.py", line 506, in forward
x = self.conv1(x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
self.padding, self.dilation, self.groups)
TypeError: conv2d() received an invalid combination of arguments - got (DataContainer, Parameter, NoneType, tuple, tuple, tuple, int), but expected one of:

  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
    didn't match because some of the arguments have invalid types: (DataContainer, Parameter, NoneType, tuple, tuple, tuple, int)
  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
    didn't match because some of the arguments have invalid types: (DataContainer, Parameter, NoneType, tuple, tuple, tuple, int)
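
For context, the TypeError above reports that the first argument reaching conv2d() is a DataContainer rather than a Tensor, which suggests the batch was not unwrapped before the backbone's forward pass. The traceback passes through torch/nn/parallel/distributed.py, so one possible cause (an assumption, not confirmed here) is that the installed PyTorch and mmcv versions do not agree on how inputs are scattered. The snippet below is only a minimal sketch of what "unwrapping" means. `FakeDataContainer` and `unwrap` are hypothetical names for illustration and are not part of mmcv or this repository; the only detail borrowed from mmcv is that its DataContainer keeps the payload in a `.data` attribute.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeDataContainer:
    """Hypothetical stand-in for a wrapper that keeps its payload in `.data`."""
    data: Any


def unwrap(obj: Any) -> Any:
    """Recursively replace FakeDataContainer instances with their payload."""
    if isinstance(obj, FakeDataContainer):
        return unwrap(obj.data)
    if isinstance(obj, dict):
        return {key: unwrap(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(unwrap(value) for value in obj)
    return obj


# A batch shaped like the keyword arguments passed to model(**data).
batch = {
    "img": FakeDataContainer([[0.1, 0.2]]),
    "img_meta": FakeDataContainer([{"id": 1}]),
}
clean = unwrap(batch)
print(type(batch["img"]).__name__, "->", type(clean["img"]).__name__)
```

In a real run this unwrapping is normally done by the parallel wrapper before forward() is called, so checking `type(data["img"])` just before `model(**data)` in batch_processor may help confirm whether that step is being skipped.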