
Memory leak with Julia 1.11's GC (discovered in SymbolicRegression.jl) #56759

Closed · MilesCranmer opened this issue Dec 5, 2024 · 37 comments
Labels: GC (Garbage collector), regression 1.11 (Regression in the 1.11 release)

@MilesCranmer (Member) commented Dec 5, 2024

We're seeing memory leaks in PySR/SymbolicRegression.jl that appear related to Julia 1.11's parallel GC. The user (@GoldenGoldy) tried various solutions including heap size hints and other parameter adjustments, but memory usage would steadily climb until OOM crashes occurred after 8-11 hours. The issue vanishes completely when switching to Julia 1.10 - no other changes needed. While we don't yet have a minimal working example in pure Julia, I wanted to raise this as it's causing OOM crashes in production workloads.

Full reproduction steps and details are in MilesCranmer/PySR#764, including detailed diagnostics on the memory usage.

@vchuravy (Member) commented Dec 5, 2024

Does this reproduce without Python in the loop?

@nsajko added the GC (Garbage collector) label on Dec 5, 2024
@MilesCranmer (Member, Author) commented Dec 5, 2024

Just checked and, yes, it seems to continually increase in memory, albeit slowly. It's slow enough that the OOM error only happens after 11 hours of production runtime.

using SymbolicRegression

X = randn(Float32, 5, 10_000)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2

options = SymbolicRegression.Options(;
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
)

hall_of_fame = equation_search(X, y; niterations=1000000000, options=options, parallelism=:multiprocessing)

To clarify, Python is only used to call a Julia method that runs for 12 hours without passing control back to Python. I don't expect Python to factor into things. The new Julia processes are spawned from Julia and do not interact with Python.

Here are some plots @GoldenGoldy made, with memory usage per Julia process over the course of the 12-hour run:

Julia 1.11: [memory usage plot]

and Julia 1.10: [memory usage plot]

Note that I automatically set a conservative --heap-size-hint for the spawned Julia processes, which is why these hover at around 0.6 GiB.
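
For context, the kind of worker setup I mean looks roughly like this (a sketch only; SymbolicRegression.jl manages its own worker startup internally, and the exact flag values here are illustrative):

using Distributed

# Spawn workers with a conservative heap size hint and a single compute
# thread each; exeflags forwards standard Julia CLI options to each worker.
addprocs(4; exeflags=`--heap-size-hint=600M --threads=1`)

# Make the package available on every worker before running the search.
@everywhere using SymbolicRegression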

@IanButterworth added the regression 1.11 (Regression in the 1.11 release) label on Dec 5, 2024
@gbaraldi (Member) commented Dec 5, 2024

If possible can you take heap snapshots at the start and after a while to see what is being allocated?

@MilesCranmer (Member, Author) commented Dec 5, 2024

What's the best way to do that on macOS?

(I think @GoldenGoldy is on Linux though FYI. I'm not sure if the memory leak is OS dependent or not)

@gbaraldi (Member) commented Dec 5, 2024

Use Profile.take_heap_snapshot; it creates a file in the pwd, if I recall correctly. Then you can open it using the Chrome dev tools.
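
Something like this, for example (a minimal sketch; as far as I remember it returns the path of the file it wrote):

using Profile

# Writes a .heapsnapshot file into the current working directory and
# returns its path; open it in the Chrome dev tools Memory tab.
snapshot_path = Profile.take_heap_snapshot()
println("heap snapshot written to ", snapshot_path)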

@MilesCranmer (Member, Author) commented:

OK, I made some heap snapshots. Do you want the full files (250 MB each)? I'm not sure what I'm supposed to look for, though.

@gbaraldi (Member) commented Dec 5, 2024

Are the start one and the end one the same size? 🤔 But yeah, could you zip them and send them to me?

@MilesCranmer (Member, Author) commented:

Maybe it's hard to see much of a difference over ~5 minutes (it might even go down; see the plot above); it has only increased slightly over that time.

Also, does take_heap_snapshot record other Julia processes too, or just the main process?

One more question. Would the heap snapshot even record the memory leak? Or do you just want to see if it's a real leak or actual allocations somewhere?

@gbaraldi (Member) commented Dec 5, 2024

The heap snapshot will see everything that is live. I want to see if it's an actual leak, i.e., a pointer we forgot to call free on, or extra live objects hanging around for some reason. take_heap_snapshot records only the current process.

@MilesCranmer (Member, Author) commented:

So the heap snapshots are about the same size, even over a 10-minute interval. And looking at the memory breakdown, they seem quite similar in where memory is allocated, too. But despite this, the process's allocated memory continues to increase.

So does this mean it's a real leak?

@d-netto (Member) commented Dec 5, 2024

Might be useful to check whether the memory increase/possible leak is stemming from an increase in pool allocated pages or an increase in mallocd memory.

I.e. #55794 (comment).

Also, does this only reproduce when using multiple GC threads, or does it also reproduce with a single GC thread?
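
For example, something along these lines (a sketch with a stand-in workload; roughly, the bytes_mapped/bytes_resident numbers in the log lines track pool pages, while heap_size also includes malloc'd memory, so comparing how they grow across collections shows where the increase is coming from):

# Print heap stats at every collection.
GC.enable_logging(true)

# Stand-in workload mixing pool-sized and malloc'd (large) allocations.
for _ in 1:100
    small = [rand(8) for _ in 1:10_000]          # small, pool-allocated objects
    big = [zeros(UInt8, 1 << 20) for _ in 1:32]  # large, malloc'd buffers
    GC.gc()
end

GC.enable_logging(false)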

@MilesCranmer (Member, Author) commented:

Over a 30-minute interval, the heap snapshots are still pretty much the same. But the memory usage of the actual Julia process is greater.

@d-netto would that approach track it? If the memory doesn't show up in the live objects, then I would have thought that means the GC is unaware it exists.

It should be easy to reproduce locally with the code above if you want to poke at certain things.
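
In case it's useful, one way to watch this side by side (a sketch; Base.gc_live_bytes() is what the GC thinks is live, while Sys.maxrss() reports the peak resident set size the OS has seen for this process):

# Periodically compare the GC's view of live memory with the OS-level
# peak RSS; the growing gap between the two is what we're seeing here.
function watch_memory(; interval=60)
    while true
        GC.gc(true)  # force a full collection first
        live_mb = Base.gc_live_bytes() / 1024^2
        rss_mb  = Sys.maxrss() / 1024^2
        println("GC live: ", round(live_mb; digits=1), " MB, peak RSS: ",
                round(rss_mb; digits=1), " MB")
        sleep(interval)
    end
end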

@MilesCranmer (Member, Author) commented:

Copying @GoldenGoldy's comment from here. Basically, the use of multiprocessing seems not to be relevant for the leak; it seems to happen regardless of processing mode:

I wanted to do multiple additional tests but lacked the time. However, I did manage to do one test with parallelism="multithreading". Again no issues on Julia 1.10, while memory usage continues to grow on Julia 1.11.

In both cases I forced the Julia version, using:

import juliapkg
juliapkg.require_julia("~1.10")
from pysr import PySRRegressor

or

import juliapkg
juliapkg.require_julia("~1.11")
from pysr import PySRRegressor

Julia 1.10: [memory usage plot: PySR_memory_usage_force_Julia1_10_multithreading]

Julia 1.11: [memory usage plot: PySR_memory_usage_force_Julia1_11_multithreading]

(They are using a VM with 240 GB of memory, which is why it climbs so high before an OOM.)

Note the two other colors are kernel and disk data, which remain flat in both cases.

@gbaraldi (Member) commented Dec 5, 2024

Do you see a leak on macOS as well? Because on my laptop it's been running for around 30 minutes and it's still at around 1.2 GB of memory usage.

@MilesCranmer (Member, Author) commented:

I'm on macOS, yes. Note that if you run the original script above, it launches additional Julia processes; those are the ones with the blow-up in memory, while the head worker stays fairly flat. But you could run the following to use multithreading instead:

using SymbolicRegression

X = randn(Float32, 5, 10_000)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2

options = SymbolicRegression.Options(;
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
)

hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

The memory fluctuates a lot for me, but it still does trend higher. Is yours constant at 1.2 GiB?

However, I wouldn't expect something identical to @GoldenGoldy's runs, because they use slightly different settings. They couldn't share the full script (presumably because it's company code), but I can definitely reproduce a memory leak on my machine with this code.

@gbaraldi (Member) commented Dec 5, 2024

Yeah, it just seems constant to me.

[memory usage screenshot]

In fact, 1.10 seems to use even more memory. This is just from running the script you sent a little while ago.

@MilesCranmer (Member, Author) commented:

It looks like that example is using multiprocessing; can you try multithreading instead? I.e.:

using SymbolicRegression
X = randn(Float32, 5, 10_000)
y = randn(Float32, 10_000)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

Are you using top? You could try btop, which lets you monitor it over time and which I find is a bit more accurate.

@gbaraldi (Member) commented Dec 5, 2024

I'm using both htop and Instruments (the Xcode tool). Will give the multithreaded version a go. OK, I do see a slight increase over time. Slower than what that server is showing, but it is there.

@gbaraldi (Member) commented Dec 5, 2024

Can you run it with GC.enable_logging? On my machine it looks like we are increasing the amount of memory we have mapped. It might be due to fragmentation or something else. Also, did you send the snapshots anywhere? It could be that we are missing something there, and I would like to check.

@MilesCranmer (Member, Author) commented:

I think their server run just accentuated the problem due to having more compute power, i.e., they can generate garbage more quickly.

we are increasing the amount of memory we have mapped. It might be due to fragmentation or something else

I don't have the background to know what this means, so forgive my naiveté: do you mean this is a memory leak? Or is it something else?

@gbaraldi (Member) commented Dec 5, 2024

I'm not sure yet. It looks to me like Julia thinks there are more and more objects alive. At least, it maps more and more memory.

@MilesCranmer (Member, Author) commented:

(Just sent the heap snapshots on Slack, by the way.)

@MilesCranmer (Member, Author) commented:

Ping on this. Is there anything I can help with or look at? This is a major bug for downstream users, so I want to fix it ASAP if at all possible.

@MilesCranmer (Member, Author) commented:

One more experiment to add to the roster: I confirmed that --gcthreads=1 does not fix the problem, so this does not seem related to #56735.
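
For completeness, the kind of run I mean (a sketch; repro.jl is a stand-in name, and I believe Threads.ngcthreads() reports the configured GC thread count on recent releases):

# Launched as, e.g.:  julia --gcthreads=1 --threads=4 repro.jl
# Double-check the setting from inside the session:
@show Threads.ngcthreads()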

@GoldenGoldy commented:

Just want to add that using:

using SymbolicRegression
X = randn(Float32, 2, 2500)
y = randn(Float32, 2500)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

the memory usage increases much faster than with:

using SymbolicRegression
X = randn(Float32, 5, 10_000)
y = randn(Float32, 10_000)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

See MilesCranmer/PySR#764 (comment) for more details and graphs.

@d-netto (Member) commented Dec 7, 2024

Ran this for 5min on my M2:

using SymbolicRegression
X = randn(Float32, 2, 2500)
y = randn(Float32, 2500)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
GC.enable_logging(true)
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

Could reproduce the fast memory increase.

I noticed that the vast majority of the memory reported by GC.enable_logging is coming from malloc'd memory, not pool-allocated memory. For example, here is one line from GC.enable_logging that I got after running this benchmark for a few minutes; notice how bytes_mapped (pools) is only around 10% of the heap size:

Heap stats: bytes_mapped 2048.00 MB, bytes_resident 2048.00 MB, ..., heap_size 23823.30 MB, heap_target 25014.47 MB

FWIW, another user opened https://discourse.julialang.org/t/dont-understand-why-code-runs-out-of-memory-and-crashes/123559/22. After running their reproducer with GC.enable_logging I observed the same thing: heap size increases, but most of it is mallocd memory, which doesn't shrink even if you run a GC.

Didn't investigate further to know whether it's related, but seems suspicious...

@MilesCranmer (Member, Author) commented:

Thanks @d-netto. Via that thread I also found ccall(:malloc_trim, Int32, (Int32,), 0) on https://discourse.julialang.org/t/memory-management/102567/3. Do you know if there is an equivalent of malloc_trim for macOS so I can test this?
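
For reference, the call from that Discourse post, guarded so it only runs where glibc actually provides malloc_trim (a sketch, which is why I'm asking about a macOS equivalent):

# malloc_trim is glibc-specific: ask libc to return freed pages to the OS.
if Sys.islinux()
    ccall(:malloc_trim, Cint, (Cint,), 0)
end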

@d-netto (Member) commented Dec 7, 2024

I don't know.

But malloc_trim would probably not be useful here.

malloc_trim is useful when libc is holding onto pages of memory that have been freed and not giving them back to the OS.

What seems to be happening here is that a bunch of mallocd memory is being considered alive by the GC when it shouldn't, so free doesn't even get a chance to run.

@vchuravy (Member) commented:

Does a heap-snapshot show the memory? If so, it should tell us why we think it is being rooted.

@felixcremer (Contributor) commented:

I think I might also be running into this problem as described in EarthyScience/RQADeforestation.jl#36.

I am currently producing heap snapshots with julia 1.11. Is there something specific that I should be looking for in them?

@IanButterworth (Member) commented Dec 10, 2024

@vchuravy I don't know what to look for, but with devtools you can also diff two heap snapshots:

[screenshot]

@gbaraldi (Member) commented:

A bisection using

using StatsBase
function Simulate()
    Simulations=Int(1e7)
    Size=1000
    result = Array{Float64}(undef, Simulations, 1)
    Threads.@threads for i = 1:Simulations
        x = randn(Size)
        s = sort(x)
        result[i, 1] = s[1]
    end
    println(median(result))
end
for i in 1:1000
    println(i)
    Simulate()
    GC.gc(true)
    # Print live_bytes
    println("live_bytes in MB: ", Base.gc_live_bytes() / 1024^2)
    sleep(10) # sleep for 10 seconds
end

points to 909bcea as where the memory growth starts.

@MilesCranmer (Member, Author) commented:

Interesting. Is multithreading required to reproduce? If so, does it mean there's a race condition in the GC?

@gbaraldi (Member) commented:

Multithreading is not needed.

@MilesCranmer (Member, Author) commented:

Is it any object, or only Memory-based allocations?

@oscardssmith (Member) commented:

So the issue Gabriel found is #55223, but that doesn't seem to be the whole problem.

@MilesCranmer (Member, Author) commented:

Confirmed #56801 fixes this.
