
Out of memory for Inference of Faster R-CNN, 11G 1080 Ti #821

Closed

PumayHui opened this issue Feb 26, 2019 · 4 comments

Comments

@PumayHui

PumayHui commented Feb 26, 2019

Expected results

Output bounding box results in images.

Actual results

[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch-nightly_1547287162138/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res4_7_branch2b" input: "gpu_0/res4_7_branch2c_w" output: "gpu_0/res4_7_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f7674fb8249 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29f42cb (0x7f7677e542cb in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x139a395 (0x7f76767fa395 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x1516d54 (0x7f7676976d54 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3cd (0x7f7676983eed in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1a0 (0x7f767696ba70 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x14796a5 (0x7f76768d96a5 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f76b691e094 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x13e96a2 (0x7f76b69246a2 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f76b5aa28e3 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f76c8d67678 in /home/anaconda3/envs/caffe2_py2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f76cfa306db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f76cefb488f in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 204: Original python traceback for operator 157 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 209: File "tools/infer_simple.py", line 209, in
WARNING workspace.py: 209: File "tools/infer_simple.py", line 135, in main
WARNING workspace.py: 209: File "/home/Detectron/detectron/core/test_engine.py", line 329, in initialize_model_from_cfg
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 112, in add_ResNet_convX_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 85, in add_stage
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 183, in add_residual_block
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 331, in bottleneck_transformation
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/detector.py", line 407, in ConvAffine
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
File "tools/infer_simple.py", line 209, in
main(args)
File "tools/infer_simple.py", line 158, in main
model, im, None, timers=timers
File "/home/Detectron/detectron/core/test.py", line 63, in im_detect_all
scores, boxes, im_scale = im_detect_bbox_aug(model, im, box_proposals)
File "/home/Detectron/detectron/core/test.py", line 238, in im_detect_bbox_aug
model, im, scale, max_size, box_proposals
File "/home/Detectron/detectron/core/test.py", line 333, in im_detect_bbox_scale
model, im, target_scale, target_max_size, boxes=box_proposals
File "/home/Detectron/detectron/core/test.py", line 160, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch-nightly_1547287162138/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res4_7_branch2b" input: "gpu_0/res4_7_branch2c_w" output: "gpu_0/res4_7_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const
, int, char const
, std::string const&, void const
) + 0x59 (0x7f7674fb8249 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29f42cb (0x7f7677e542cb in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x139a395 (0x7f76767fa395 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x1516d54 (0x7f7676976d54 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3cd (0x7f7676983eed in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1a0 (0x7f767696ba70 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x14796a5 (0x7f76768d96a5 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f76b691e094 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x13e96a2 (0x7f76b69246a2 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f76b5aa28e3 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f76c8d67678 in /home/anaconda3/envs/caffe2_py2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f76cfa306db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f76cefb488f in /lib/x86_64-linux-gnu/libc.so.6)

Detailed steps to reproduce

python infer_simple.py ...

When I run infer_simple.py with TEST.BBOX_AUG enabled, GPU memory keeps rising until it eventually runs out.
Does anyone else have this problem?
I tested it and found that this does not happen when TEST.BBOX_AUG is set to False. When it is set to True, out of memory appears after running about forty or fifty pictures...
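
If anyone wants to rule out the augmentation quickly: a minimal sketch of turning it off programmatically, assuming the stock Detectron config key TEST.BBOX_AUG.ENABLED and the merge_cfg_from_list / assert_and_infer_cfg helpers from detectron.core.config; the config path below is hypothetical.

```python
# Minimal sketch: disable bbox test-time augmentation before the model is
# built. Multi-scale BBOX_AUG runs the trunk once per scale (plus flips),
# so activation memory grows with the number of scales; disabling it is
# the quickest way to confirm it is the culprit.
from detectron.core.config import (
    merge_cfg_from_file, merge_cfg_from_list, assert_and_infer_cfg
)

merge_cfg_from_file('configs/e2e_faster_rcnn_R-101-FPN.yaml')  # hypothetical path
merge_cfg_from_list(['TEST.BBOX_AUG.ENABLED', False])
assert_and_infer_cfg(cache_urls=False)
```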

System information

  • Operating system: Ubuntu 18.04 LTS
  • Compiler version: gcc version 7.3.0
  • CUDA version: 10.0.130
  • cuDNN version: 7.4.2
  • NVIDIA driver version: 410.79
  • GPU models (for all devices if they are not all the same): GTX 1080 Ti, 11 GB
  • PYTHONPATH environment variable: anaconda3/bin/python
  • python --version output: 2.7
  • Anything else that seems relevant: pretrained weights model/ImageNetPretrained/MSRA/R-101.pkl
@PumayHui PumayHui reopened this Feb 26, 2019
@TanLingxiao

I think your PyTorch/Caffe2 build may be broken; please test that first.
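
A minimal sanity check for the build, using calls that exist in stock Caffe2 (the expected outputs are assumptions about a healthy CUDA install):

```python
# Quick sanity check for a Caffe2 GPU build: if either line fails or reports
# zero devices, the install is broken before Detectron is even involved.
from caffe2.python import workspace

print(workspace.has_gpu_support)   # expect True for a CUDA-enabled build
print(workspace.NumCudaDevices())  # expect the device count nvidia-smi reports
```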

@kingman1980

Change the value of TEST.MAX_SIZE to a smaller value.
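
For context on why this helps: activation memory in the ResNet trunk scales roughly with the resized input area, which TEST.SCALE / TEST.MAX_SIZE bound. A back-of-the-envelope sketch (the layer shape and image sizes are illustrative, not measured):

```python
# Rough, illustrative estimate of a single FP32 feature map at stride 16
# with 1024 channels (a res4-like blob). Real usage is many such blobs plus
# cuDNN workspace, but the point is that memory scales with image area.
def feature_map_mb(h, w, channels=1024, stride=16, bytes_per_elem=4):
    return (h // stride) * (w // stride) * channels * bytes_per_elem / 1024.0 ** 2

print(feature_map_mb(800, 1333))  # ~16.2 MB
print(feature_map_mb(600, 1000))  # ~9.0 MB, roughly the ratio of the areas
```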

@mine114

mine114 commented Mar 18, 2019

Hello, I have the same problem. Have you solved it?

@doublex

doublex commented May 15, 2019

Same problem, similar GPU.

@PumayHui
Author

The reason is that the images were not all the same size...
When I modified the dataset so all images were the same size, such as 1024 × 1024, the problem was solved.
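
If it helps others hitting this, a minimal sketch of the uniform-resize workaround with OpenCV (directory names are hypothetical, and aspect ratio is not preserved here):

```python
# Minimal sketch: resize every image in a folder to one fixed size so the
# network never sees a larger input mid-run and has to grow its blobs.
import os
import cv2

SRC, DST, SIZE = 'images_in', 'images_out', (1024, 1024)
if not os.path.isdir(DST):
    os.makedirs(DST)
for name in os.listdir(SRC):
    im = cv2.imread(os.path.join(SRC, name))
    if im is None:
        continue  # skip files cv2 cannot read
    cv2.imwrite(os.path.join(DST, name), cv2.resize(im, SIZE))
```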
