Training bottleneck #45

Open
tpankaj opened this issue Jul 30, 2017 · 0 comments

tpankaj commented Jul 30, 2017

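Training has a large bottleneck. I trained for about 12 hours with cProfile running; the functions with the highest cumulative time are listed below. Batch.fill accounts for about 85% of total run time (33,142 s of 39,088 s cumulative), compared with roughly 2,600 s in Batch.forward and 3,300 s in Batch.backward. Most of the time inside Batch.fill goes to per-item get_data calls and the dataset `__getitem__` reads underneath them.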

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002 39087.736 39087.736 Train.py:1(<module>)
        1    3.653    3.653 39086.782 39086.782 Train.py:16(main)
   145058    1.010    0.000 35739.204    0.246 Train.py:44(run_net)
   145058  129.018    0.001 33142.496    0.228 Batch.py:34(fill)
 10049055   21.451    0.000 22053.800    0.002 Data.py:39(get_data)
 10049055 1976.519    0.000 22032.350    0.002 Segment_Data.py:275(get_data)
 64993847 9822.780    0.000 15071.341    0.000 dataset.py:397(__getitem__)
  9283661  755.755    0.000 10920.047    0.001 Batch.py:49(data_into_batch)
114014974/112709452 5766.228    0.000 5781.245    0.000 {torch._C.cat}
129980296 2802.223    0.000 4964.670    0.000 group.py:160(__getitem__)
   145057    2.188    0.000 3308.168    0.023 Batch.py:111(backward)
 65565909  579.216    0.000 3170.871    0.000 _utils.py:37(_cuda)
   145057  261.668    0.002 2595.698    0.018 Batch.py:97(forward)
 64993847  277.916    0.000 2158.638    0.000 selections.py:27(select)
   145057   12.529    0.000 1901.331    0.013 clip_grad.py:2(clip_grad_norm)
  7542964 1797.248    0.000 1797.248    0.000 {method 'norm' of 'torch._C.CudaFloatTensorBase' objects}
 65420851 1795.412    0.000 1795.412    0.000 {method 'copy_' of 'torch._C.CudaFloatTensorBase' objects}
 64993847 1425.956    0.000 1740.737    0.000 dataset.py:313(__init__)
 56137097  229.232    0.000 1557.593    0.000 __init__.py:267(type)
259975388 1315.414    0.000 1348.605    0.000 dataset.py:217(shape)
 64993847  392.916    0.000 1277.835    0.000 selections.py:250(__getitem__)
10009001/290115   39.002    0.000 1142.344    0.004 module.py:205(__call__)
 56137099  285.072    0.000 1116.109    0.000 _utils.py:5(_type)
   145058    2.763    0.000 1107.541    0.008 SqueezeNet.py:76(forward)
   435174    4.181    0.000 1074.084    0.002 container.py:62(forward)
 27850982   50.400    0.000 1012.754    0.000 tensor.py:37(float)
  1160464   13.646    0.000  965.766    0.001 SqueezeNet.py:25(forward)
 64993847  125.658    0.000  907.826    0.000 fromnumeric.py:1837(product)
   145057    1.007    0.000  895.441    0.006 variable.py:116(backward)
   145057  892.171    0.006  892.171    0.006 {method 'run_backward' of 'torch._C._EngineBase' objects}
 74279245  829.433    0.000  829.433    0.000 {method 'reduce' of 'numpy.ufunc' objects}
 64993847  288.611    0.000  794.311    0.000 selections.py:429(_handle_simple)
 18567321  724.864    0.000  724.864    0.000 {method 'copy_' of 'torch._C.CudaDoubleTensorBase' objects}
  1305522    8.819    0.000  622.831    0.000 variable.py:839(cat)
 27850944   30.437    0.000  619.597    0.000 tensor.py:29(cpu)
  1305522    3.550    0.000  612.082    0.000 tensor.py:308(forward)
 64993847  186.837    0.000  539.964    0.000 selections.py:244(__init__)
   145057   70.703    0.000  509.208    0.004 adadelta.py:27(step)
 64993847  320.490    0.000  325.345    0.000 selections.py:147(__init__)
 27850944  311.320    0.000  311.320    0.000 {method 'copy_' of 'torch._C.FloatTensorBase' objects}
1281151179/1281151177  261.754    0.000  288.256    0.000 {isinstance}
 64993847  249.225    0.000  281.225    0.000 filters.py:207(get_filters)
  9283661  279.629    0.000  279.629    0.000 {method 'copy_' of 'torch._C.CudaByteTensorBase' objects}
 64993847  177.023    0.000  269.627    0.000 selections.py:406(_expand_ellipsis)
 64993847  268.112    0.000  268.112    0.000 base.py:81(is_empty_dataspace)
 27850981   20.463    0.000  259.458    0.000 tensor.py:312(__div__)
 27850981  238.995    0.000  238.995    0.000 {method 'div' of 'torch._C.CudaFloatTensorBase' objects}
 18567374  220.069    0.000  220.069    0.000 {method 'zero_' of 'torch._C.FloatTensorBase' objects}
  9283648    6.272    0.000  208.434    0.000 {method 'mean' of 'numpy.ndarray' objects}
 18567309   13.199    0.000  205.147    0.000 tensor.py:273(__sub__)
  3771508   11.772    0.000  202.183    0.000 conv.py:235(forward)
  9283648  101.293    0.000  202.162    0.000 _methods.py:53(_mean)
 64986860  159.096    0.000  192.368    0.000 group.py:36(__init__)
 18567309  191.948    0.000  191.948    0.000 {method 'sub' of 'torch._C.CudaFloatTensorBase' objects}
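
For anyone who wants to reproduce this kind of report on a smaller run, here is a minimal sketch of collecting a cProfile profile and printing the entries with the highest cumulative time. `fill_batch` and `load_item` are hypothetical stand-ins, not the functions in Batch.py or Data.py; only the `cProfile` and `pstats` calls are real standard-library API.

```python
import cProfile
import io
import pstats


def load_item(i):
    """Hypothetical stand-in for a single per-item dataset read."""
    return sum(range(100)) + i


def fill_batch(n_items):
    """Hypothetical stand-in for Batch.fill: many small per-item reads."""
    return [load_item(i) for i in range(n_items)]


def profile_top(func, *args, limit=10):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # "cumulative" sorts by cumtime, the same ordering as the table above.
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


report = profile_top(fill_batch, 1000)
print(report)
```

Reading the output the same way as the table above: a function with a high `cumtime` but a low `tottime` (like Batch.fill here) is slow because of what it calls, so the next step is to follow its callees down until `tottime` and `cumtime` are close.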