Memory leak when running GPU broadcast in a loop #327
Comments
Works for me; I never get an OOM, and memory use doesn't exceed 3 GB. Do note that Julia uses a GC, so it's expected to see memory usage rise quite a bit before it falls again. Can you post a backtrace of an OOM?
When I run this, memory usage falls very little after the loop finishes and stays high at around 7 GB. The loop itself actually completes without an error; the OOM only arises when doing any other GPU operation afterwards. So the example code should have included one more GPU operation after the loop (sorry!), and it is that final line which triggers the OutOfMemoryError().
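A minimal sketch of the pattern being described (array sizes and iteration count are illustrative, not the exact values from the original code):

```julia
using oneAPI

# Broadcast in a loop, feeding each result into the next iteration,
# then perform one more GPU operation afterwards.
function leaky_loop(n)
    a = oneArray(rand(Float32, 1024, 1024))
    b = oneArray(rand(Float32, 1024, 1024))
    for _ in 1:n
        a = a .+ b
    end
    return a, b
end

a, b = leaky_loop(100_000)

# The loop itself completes; it is a follow-up operation like this
# that hit the OutOfMemoryError() in the report.
c = a .* b
```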
That still doesn't error here. What happens if you manually free memory inside the loop?
I tried running that. Doing it inside the loop keeps memory use at reasonable levels and prevents any OOM errors later. Memory use also doesn't depend on the number of outer loops anymore. After some experimenting, I found that it is the explicit call inside the loop that makes the difference.
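A sketch of that workaround, assuming the manual free is an explicit `GC.gc()` call (the exact call was not preserved in this copy of the thread); sizes and the collection interval are illustrative:

```julia
using oneAPI

function loop_with_gc(n)
    a = oneArray(rand(Float32, 1024, 1024))
    b = oneArray(rand(Float32, 1024, 1024))
    for i in 1:n
        a = a .+ b
        # Collect periodically so intermediate GPU buffers get released.
        # NOTE: GC.gc() is an assumption here; the exact call discussed
        # in the thread was not preserved in this copy.
        i % 1_000 == 0 && GC.gc()
    end
    return a
end

loop_with_gc(100_000)
```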
That's all very weird. I can't reproduce this no matter how long I run it, or how many kernels I compile and launch. Maybe this is related to Windows? I don't currently have WSL set up, so I won't be able to try this straight away. If you have the time, you could consider running the code for a while and writing a heap snapshot, using the new 1.9 functionality: JuliaLang/julia#46862. You can then open the snapshot in Chrome. If it's a Julia object memory leak, we should be able to spot it there. If we're leaking GPU memory, however, we'll have to instrument the allocations on the oneAPI side instead.
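For reference, the heap-snapshot functionality mentioned above can be used roughly like this on Julia 1.9 (the output filename is just an example):

```julia
using Profile

# After running the leaking workload for a while, dump the Julia heap.
# The resulting file can be opened in Chrome DevTools (Memory tab ->
# "Load profile") to look for objects that keep accumulating.
Profile.take_heap_snapshot("oneapi-leak.heapsnapshot")
```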
All right, I just ran both again (killing the REPL in between) and took a heap snapshot of each. I looked at them quickly and don't really see an obvious difference, but I don't really know what I am supposed to be looking for. If this helps at all: I am on Windows 10, using the standard installation of WSL2 (Ubuntu). I use VS Code, but running it directly from Ubuntu doesn't change anything. The files are too big to upload here, so I uploaded them here instead: https://file.io/xgRlcuitBSPP
I need to run GPU operations inside a loop, where the output of one iteration is used in the next one.
However, even very simple GPU broadcasts result in a memory leak, and eventually I get an OutOfMemoryError(). I am using oneAPI.jl v1.2.2 on WSL2 (Ubuntu) under Windows 10.
A very simple example that reproduces this is:
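The original snippet is not shown here; a minimal sketch of this kind of reproducer, with array sizes and iteration count made up for illustration:

```julia
using oneAPI

# Broadcast repeatedly, using each iteration's output in the next one;
# GPU memory keeps growing while this runs.
function repro(n)
    a = oneArray(rand(Float32, 1024, 1024))
    b = oneArray(rand(Float32, 1024, 1024))
    for _ in 1:n
        a = a .+ b
    end
    return a
end

repro(100_000)
```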
Is there something I am missing here?
I can see the GPU memory fill up in the Task Manager.