
RuntimeError: copy_if failed to synchronize: device-side assert triggered #19

Open
donglin8506 opened this issue Oct 11, 2019 · 1 comment


@donglin8506

@JingChaoLiu @liuxuebo0 Hello, I keep running into the following error and don't know the cause. Someone said the learning rate is too large, but what learning rate would be OK? Could you suggest a solution?

Traceback (most recent call last):
  File "tools/train_net.py", line 186, in <module>
    main()
  File "tools/train_net.py", line 179, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 85, in train
    arguments,
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
    loss_dict = model(images, targets)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 204, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 207, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 223, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 115, in forward_for_single_feature_map
    boxlist = remove_small_boxes(boxlist, self.min_size)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
    (ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: device-side assert triggered
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
Traceback (most recent call last):
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 238, in <module>
    main()
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 234, in main
    cmd=process.args)
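The question asks what learning rate is safe. One common heuristic (an outside technique, not a recipe documented in this issue or confirmed for PMTD) is the linear scaling rule: keep BASE_LR / IMS_PER_BATCH constant when you change the total batch size. The sketch below assumes a reference point of SOLVER.BASE_LR 0.02 at SOLVER.IMS_PER_BATCH 16, taken from the default maskrcnn_benchmark configs; check the config PMTD actually ships before relying on it. `scaled_base_lr` is a hypothetical helper, not part of either codebase.

```python
# Hypothetical helper illustrating the linear scaling rule.
# ASSUMPTION: reference values come from the default maskrcnn_benchmark
# configs (BASE_LR 0.02 for 16 images per batch); PMTD's config may differ.
REF_BASE_LR = 0.02
REF_IMS_PER_BATCH = 16


def scaled_base_lr(ims_per_batch, ref_lr=REF_BASE_LR, ref_batch=REF_IMS_PER_BATCH):
    """Scale the learning rate in proportion to the total batch size."""
    if ims_per_batch <= 0:
        raise ValueError("ims_per_batch must be positive")
    return ref_lr * ims_per_batch / ref_batch


if __name__ == "__main__":
    # Print the suggested BASE_LR for a few common total batch sizes.
    for batch in (16, 8, 4, 2):
        print(f"SOLVER.IMS_PER_BATCH {batch} -> SOLVER.BASE_LR {scaled_base_lr(batch):g}")
```

Lowering the learning rate can only help if the assert is caused by diverging predictions; if it is caused by invalid annotations, no learning rate will fix it.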
@congjianting

Check your training data, including the class labels and the box coordinates.
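Because CUDA kernel errors are reported asynchronously, the line shown in the traceback (`remove_small_boxes`) is not necessarily where the assert actually fired; rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually points at the real failing operation. Following the reply's suggestion, here is a minimal sketch of a pre-training data check. It assumes a hypothetical annotation format (a dict with `label`, an `[x1, y1, x2, y2]` `box`, and `image_size` as `(width, height)`) and assumes label 0 is reserved for background, so valid labels lie in `[1, num_classes - 1]`. `find_bad_annotations` is illustrative only; adapt it to however your dataset loader stores targets.

```python
import math


def find_bad_annotations(annotations, num_classes):
    """Return (index, reason) pairs for annotations likely to trigger a CUDA assert.

    ASSUMPTION: each annotation is a dict with 'label' (int), 'box'
    ([x1, y1, x2, y2] in pixels) and 'image_size' ((width, height)),
    and num_classes counts the background class 0.
    """
    problems = []
    for i, ann in enumerate(annotations):
        label = ann["label"]
        x1, y1, x2, y2 = ann["box"]
        width, height = ann["image_size"]
        # Out-of-range class indices are a classic cause of device-side asserts.
        if not 1 <= label < num_classes:
            problems.append((i, f"label {label!r} outside [1, {num_classes - 1}]"))
        # NaN/inf coordinates make every later geometric check meaningless.
        if not all(math.isfinite(v) for v in (x1, y1, x2, y2)):
            problems.append((i, "non-finite coordinate"))
            continue
        # Zero or negative width/height.
        if x2 <= x1 or y2 <= y1:
            problems.append((i, "degenerate box (x2 <= x1 or y2 <= y1)"))
        # Coordinates beyond the image borders.
        if x1 < 0 or y1 < 0 or x2 > width or y2 > height:
            problems.append((i, "box extends outside the image"))
    return problems


if __name__ == "__main__":
    anns = [
        {"label": 1, "box": [10, 10, 50, 40], "image_size": (100, 100)},
        {"label": 2, "box": [10, 10, 50, 40], "image_size": (100, 100)},
        {"label": 1, "box": [60, 20, 55, 30], "image_size": (100, 100)},
    ]
    # With background + one foreground class, label 2 is invalid and the
    # third box has x2 < x1.
    print(find_bad_annotations(anns, num_classes=2))
```

Running a check like this once over the whole training set is cheap compared with debugging an asynchronous CUDA assert mid-training.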
