
Error: In include/poptorch_err/ExceptionHandling.hpp:76: 'poplar_stream_memory_allocation_error' #9

Open
Lime-Cakes opened this issue Dec 7, 2022 · 5 comments


@Lime-Cakes

Is there an explanation of this error?

/usr/local/lib/python3.8/dist-packages/poptorch/experimental.py in __exit__(self, exc_type, value, traceback)
    253         if self._compile_using == enums.Compiler.PopART:
    254             # Compile the captured graph using PopART.
--> 255             self._executable = poptorch_core.compileWithManualTracing(
    256                 self._options.toDict(), accessAttributes, self._training,
    257                 self._dict_optimizer,

Error: In include/poptorch_err/ExceptionHandling.hpp:76: 'poplar_stream_memory_allocation_error': /opt/jenkins/workspace/poplar/poplar_ci_ubuntu_20_04_unprivileged/popart/willow/src/popx/irlowering.cpp:3516 Out of memory. Single stream of length 149766584 in bufferIndex 1
Error raised in:
  [0] popart::Session::prepareDevice: Poplar compilation
  [1] Compiler::compileAndPrepareDevice
  [2] LowerToPopart::compile
  [3] compileWithManualTracing
@ariannas-graphcore

Hi,

The exception you're hitting is thrown when the allocation of a stream buffer fails: in this case, too much memory is being used by a single data stream ("Single stream of length 149766584", as per the error). This can happen, for example, when very large outputs are streamed back from the IPU to the host.

It would be great if you could share instructions to reproduce the error, along with the SDK version you've been using: this would enable us to suggest specific remedies. In the meantime, here are some more generic suggestions that might help:

  • Prefetching allows the host to prepare the next buffer for an infeed/outfeed while it waits for the IPU to compute the current buffer. A buffering depth greater than 1 can improve performance, but at the cost of a larger memory footprint. You could try reducing the buffering depth of the infeeds/outfeeds in PopART, or disabling prefetching altogether (it is enabled by default). You can find more details on how to do that in our PopART user guide, and see the sketch after this list.

  • Try switching to a different poptorch.OutputMode(): e.g. if you're using All, try Final instead. This mode returns only the last batch rather than a result for every batch, which reduces the accumulated amount of output data. See the PopTorch user guide for more details.

  • Try reducing the batch size and/or the PopTorch deviceIterations (see the PopTorch user guide).
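
To make these suggestions concrete, here is a minimal, untested sketch of how these knobs are set through poptorch.Options. The "enablePrefetchDatastreams" session-option name is my assumption for the relevant PopART flag, so please verify it against your SDK's documentation; the dataset/model/optimizer names are placeholders:

```python
import poptorch

opts = poptorch.Options()

# Return only the final batch's output instead of one result per batch.
opts.outputMode(poptorch.OutputMode.Final)

# Process fewer batches per host/IPU interaction.
opts.deviceIterations(1)

# Disable datastream prefetching via the underlying PopART session options.
# NOTE: the option name is an assumption; check your PopART user guide.
opts._Popart.set("enablePrefetchDatastreams", False)

# A smaller batch size also shrinks the stream buffers.
# train_loader = poptorch.DataLoader(opts, dataset, batch_size=8)
# poptorch_model = poptorch.trainingModel(model, options=opts, optimizer=optimizer)
```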

Hope this helps for now; I encourage you to share a reproducer so we can investigate this further.

Best,
Arianna

@Lime-Cakes
Author

This is for training, so I left outputMode at its default, which should already be Final. What values would be streamed from the IPU to the host during training? The loss calculation is done on the IPU, since it's part of the model.

Could the stream error be related to data streaming between IPUs?

@payoto
Contributor

payoto commented Jan 27, 2023

No, those streams are for IPU/host communication. Have you had a chance to try any of the other suggestions @ariannas-graphcore provided?

@Lime-Cakes
Author

Yeah. Those either don't work, or can't be applied (can't lower any further).

@payoto
Contributor

payoto commented Jan 30, 2023

By any chance, are you trying to profile the Poplar executable by generating a PopVision profile (by setting the environment variable POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true"}')?
I have occasionally seen that cause host stream issues in the past.
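
For what it's worth, here is a quick, hedged sketch (not an official API, just standard-library calls) of how you could check for that variable from Python and clear it before the model is compiled, to rule profiling out:

```python
import json
import os

# If POPLAR_ENGINE_OPTIONS requests an auto-report, extra profiling data is
# collected, which (per the note above) can add to host stream pressure.
engine_opts = os.environ.get("POPLAR_ENGINE_OPTIONS")
if engine_opts:
    print("POPLAR_ENGINE_OPTIONS =", json.loads(engine_opts))

# Unset it before the Poplar compilation happens.
os.environ.pop("POPLAR_ENGINE_OPTIONS", None)
```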

We can try to provide more specific support, but we would need additional information about the system, software, and model you are encountering the error with:

  • IPU machine type (number of IPUs; Paperspace? GCore?)
  • Poplar SDK version
  • Environment variables that are set
  • Frameworks (PyTorch, Hugging Face, TensorFlow?)
  • Which model and datasets you are trying to run (if possible)

Ideally, if you can send us a code sample that reproduces the error, we can provide more specific advice to help fix the problem.
