
Memory leak with Julia 1.11's GC (discovered in SymbolicRegression.jl) #56759

Closed · MilesCranmer opened this issue Dec 5, 2024 · 37 comments
Labels: GC (Garbage collector), regression 1.11 (Regression in the 1.11 release)

@MilesCranmer (Member) commented Dec 5, 2024

We're seeing memory leaks in PySR/SymbolicRegression.jl that appear related to Julia 1.11's parallel GC. The user (@GoldenGoldy) tried various solutions including heap size hints and other parameter adjustments, but memory usage would steadily climb until OOM crashes occurred after 8-11 hours. The issue vanishes completely when switching to Julia 1.10 - no other changes needed. While we don't yet have a minimal working example in pure Julia, I wanted to raise this as it's causing OOM crashes in production workloads.

Full reproduction steps and details are in MilesCranmer/PySR#764, including detailed diagnostics on the memory usage.

@vchuravy (Member) commented Dec 5, 2024

Does this reproduce without Python in the loop?

@nsajko added the GC (Garbage collector) label on Dec 5, 2024
@MilesCranmer (Member, Author) commented Dec 5, 2024

Just checked and, yes, it seems to continually increase in memory, albeit slowly. It's slow enough that the OOM error only happens after 11 hours of production runtime.

using SymbolicRegression

X = randn(Float32, 5, 10_000)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2

options = SymbolicRegression.Options(;
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
)

hall_of_fame = equation_search(X, y; niterations=1000000000, options=options, parallelism=:multiprocessing)

To clarify, Python is only used to call a Julia method that runs for 12 hours without passing control back to Python. I don't expect Python to factor into things. The new Julia processes are spawned from Julia and do not interact with Python.

Here are some plots @GoldenGoldy made, with memory usage per Julia process over the course of the 12-hour run:

Julia 1.11: [memory usage plot]

and Julia 1.10: [memory usage plot]

Note that I automatically set a conservative --heap-size-hint for the spawned Julia processes, which is why these hover at around 0.6 GiB.
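
For context, the kind of worker setup I mean looks roughly like this (a sketch only; SymbolicRegression.jl manages its own worker startup internally, and the exact flag values here are illustrative):

using Distributed

# Spawn workers with a conservative heap size hint and a single compute
# thread each; exeflags forwards standard Julia CLI options to each worker.
addprocs(4; exeflags=`--heap-size-hint=600M --threads=1`)

# Make the package available on every worker before running the search.
@everywhere using SymbolicRegression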

@IanButterworth added the regression 1.11 (Regression in the 1.11 release) label on Dec 5, 2024
@gbaraldi (Member) commented Dec 5, 2024

If possible can you take heap snapshots at the start and after a while to see what is being allocated?

@MilesCranmer (Member, Author) commented Dec 5, 2024

What's the best way to do that on macOS?

(I think @GoldenGoldy is on Linux though FYI. I'm not sure if the memory leak is OS dependent or not)

@gbaraldi (Member) commented Dec 5, 2024

Use Profile.take_heap_snapshot; it creates a file in the pwd, if I recall correctly. Then you can open it using the Chrome dev tools.
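
Something like this, for example (a minimal sketch; as far as I remember it returns the path of the file it wrote):

using Profile

# Writes a .heapsnapshot file into the current working directory and
# returns its path; open it in the Chrome dev tools Memory tab.
snapshot_path = Profile.take_heap_snapshot()
println("heap snapshot written to ", snapshot_path)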

@MilesCranmer (Member, Author) commented:

OK, I made some heap snapshots. Do you want the full files (250 MB each)? I'm not sure what I'm supposed to look for, though.

@gbaraldi (Member) commented Dec 5, 2024

Are the start one and the end one the same size? 🤔 But yeah, could you zip them and send them to me?

@MilesCranmer (Member, Author) commented:

Maybe it's hard to see much of a difference over ~5 minutes (it might even go down; see the plot above); it has only increased slightly over that time.

Also, does take_heap_snapshot record other Julia processes too, or just the main process?

One more question. Would the heap snapshot even record the memory leak? Or do you just want to see if it's a real leak or actual allocations somewhere?

@gbaraldi (Member) commented Dec 5, 2024

The heap snapshot will see everything that is live. I want to see if it's an actual leak, i.e., a pointer we forgot to call free on, or extra live objects hanging around for some reason. take_heap_snapshot records only the current process.

@MilesCranmer (Member, Author) commented:

So the heap snapshots are about the same size, even over a 10-minute interval. And looking at the memory breakdown, they seem quite similar in where memory is allocated, too. But despite this, the process's allocated memory continues to increase.

So does this mean it's a real leak?

@d-netto (Member) commented Dec 5, 2024

Might be useful to check whether the memory increase/possible leak is stemming from an increase in pool allocated pages or an increase in mallocd memory.

I.e. #55794 (comment).

Also, does this only reproduce when using multiple GC threads, or does it also reproduce with a single GC thread?
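
For example, something along these lines (a sketch with a stand-in workload; roughly, the bytes_mapped/bytes_resident numbers in the log lines track pool pages, while heap_size also includes malloc'd memory, so comparing how they grow across collections shows where the increase is coming from):

# Print heap stats at every collection.
GC.enable_logging(true)

# Stand-in workload mixing pool-sized and malloc'd (large) allocations.
for _ in 1:100
    small = [rand(8) for _ in 1:10_000]          # small, pool-allocated objects
    big = [zeros(UInt8, 1 << 20) for _ in 1:32]  # large, malloc'd buffers
    GC.gc()
end

GC.enable_logging(false)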

@MilesCranmer (Member, Author) commented:

Over a 30-minute interval, the heap snapshots are still pretty much the same. But the memory usage of the actual Julia process is greater.

@d-netto would that approach track it? If the memory doesn't show up in the live objects, then I would have thought that means the GC is unaware it exists.

It should be easy to reproduce locally with the code above if you want to poke at certain things.
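
In case it's useful, one way to watch this side by side (a sketch; Base.gc_live_bytes() is what the GC thinks is live, while Sys.maxrss() reports the peak resident set size the OS has seen for this process):

# Periodically compare the GC's view of live memory with the OS-level
# peak RSS; the growing gap between the two is what we're seeing here.
function watch_memory(; interval=60)
    while true
        GC.gc(true)  # force a full collection first
        live_mb = Base.gc_live_bytes() / 1024^2
        rss_mb  = Sys.maxrss() / 1024^2
        println("GC live: ", round(live_mb; digits=1), " MB, peak RSS: ",
                round(rss_mb; digits=1), " MB")
        sleep(interval)
    end
end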

@MilesCranmer (Member, Author) commented:

Copying @GoldenGoldy's comment from here. Basically, the use of multiprocessing seems not to be relevant for the leak; it seems to happen regardless of processing mode:

I wanted to do multiple additional tests but lacked the time. However, I did manage to do one test with parallelism="multithreading". Again no issues on Julia 1.10, while memory usage continues to grow on Julia 1.11.

In both cases I forced the Julia version, using:

import juliapkg
juliapkg.require_julia("~1.10")
from pysr import PySRRegressor

or

import juliapkg
juliapkg.require_julia("~1.11")
from pysr import PySRRegressor

Julia 1.10: [memory usage plot: PySR_memory_usage_force_Julia1_10_multithreading]

Julia 1.11: [memory usage plot: PySR_memory_usage_force_Julia1_11_multithreading]

(They are using a VM with 240 GB of memory, which is why it climbs so high before an OOM.)

Note the two other colors are kernel and disk data, which remain flat in both cases.

@gbaraldi (Member) commented Dec 5, 2024

Do you see a leak on macOS as well? Because on my laptop it's been running for around 30 minutes and it's still at around 1.2 GB of memory usage.

@MilesCranmer (Member, Author) commented:

I'm on macOS, yes. Note that if you run the original script above, it launches additional Julia processes; those are the ones with the blow-up in memory, while the head worker stays fairly flat. But you could run the following to use multithreading instead:

using SymbolicRegression

X = randn(Float32, 5, 10_000)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2

options = SymbolicRegression.Options(;
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
)

hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

The memory fluctuates a lot for me, but it still does trend higher. Is yours constant at 1.2 GiB?

However, I wouldn't expect something identical to @GoldenGoldy's runs, because they use slightly different settings. They couldn't share the full script (presumably because it's company code), but I can definitely reproduce a memory leak on my machine with this code.

@gbaraldi (Member) commented Dec 5, 2024

Yeah, it just seems constant to me.

[memory usage screenshot]

In fact, 1.10 seems to use even more memory. This is just from running the script you sent a little while ago.

@MilesCranmer (Member, Author) commented:

It looks like that example is using multiprocessing; can you try multithreading instead? I.e.:

using SymbolicRegression
X = randn(Float32, 5, 10_000)
y = randn(Float32, 10_000)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

Are you using top? You could try btop, which lets you monitor it over time and which I find is a bit more accurate.

@gbaraldi (Member) commented Dec 5, 2024

I'm using both htop and Instruments (the Xcode tool). Will give the multithreaded version a go. OK, I do see a slight increase over time. Slower than what that server is showing, but it is there.

@gbaraldi (Member) commented Dec 5, 2024

Can you run it with GC.enable_logging? On my machine it looks like we are increasing the amount of memory we have mapped. It might be due to fragmentation or something else. Also, did you send the snapshots anywhere? It could be that we are missing something there, and I would like to check.

@MilesCranmer (Member, Author) commented:

I think their server run just accentuated the problem due to having more compute power, i.e., they can generate garbage more quickly.

we are increasing the amount of memory we have mapped. It might be due to fragmentation or something else

I don't have the background to know what this means, so forgive my naiveté: do you mean this is a memory leak? Or is it something else?

@gbaraldi (Member) commented Dec 5, 2024

I'm not sure yet. It looks to me like Julia thinks there are more and more objects alive. At least, it maps more and more memory.

@MilesCranmer (Member, Author) commented:

(Just sent the heap snapshots on Slack, by the way.)

@MilesCranmer (Member, Author) commented:

Ping on this. Is there anything I can help with or look at? This is a major bug for downstream users, so I want to fix it ASAP if at all possible.

@MilesCranmer (Member, Author) commented:

One more experiment to add to the roster: I confirmed that --gcthreads=1 does not fix the problem, so this does not seem related to #56735.
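
For completeness, the kind of run I mean (a sketch; repro.jl is a stand-in name, and I believe Threads.ngcthreads() reports the configured GC thread count on recent releases):

# Launched as, e.g.:  julia --gcthreads=1 --threads=4 repro.jl
# Double-check the setting from inside the session:
@show Threads.ngcthreads()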

@GoldenGoldy commented:

Just want to add that using:

using SymbolicRegression
X = randn(Float32, 2, 2500)
y = randn(Float32, 2500)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

the memory usage increases much faster than with:

using SymbolicRegression
X = randn(Float32, 5, 10_000)
y = randn(Float32, 10_000)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

See MilesCranmer/PySR#764 (comment) for more details and graphs.

@d-netto (Member) commented Dec 7, 2024

Ran this for 5min on my M2:

using SymbolicRegression
X = randn(Float32, 2, 2500)
y = randn(Float32, 2500)
options = SymbolicRegression.Options(binary_operators=[+, *, /, -], unary_operators=[cos, exp])
GC.enable_logging(true)
hall_of_fame = equation_search(X, y; niterations=1000000000, options=options)

Could reproduce the fast memory increase.

I noticed that the vast majority of the memory reported by GC.enable_logging is coming from malloc'd memory, not pool-allocated memory. For example, here is one line from GC.enable_logging that I got after running this benchmark for a few minutes; notice how bytes_mapped (pools) is only around 10% of the heap size:

Heap stats: bytes_mapped 2048.00 MB, bytes_resident 2048.00 MB, ..., heap_size 23823.30 MB, heap_target 25014.47 MB

FWIW, another user opened https://discourse.julialang.org/t/dont-understand-why-code-runs-out-of-memory-and-crashes/123559/22. After running their reproducer with GC.enable_logging I observed the same thing: heap size increases, but most of it is mallocd memory, which doesn't shrink even if you run a GC.

Didn't investigate further to know whether it's related, but seems suspicious...

@MilesCranmer (Member, Author) commented:

Thanks @d-netto. Via that thread I also found ccall(:malloc_trim, Int32, (Int32,), 0) on https://discourse.julialang.org/t/memory-management/102567/3. Do you know if there is an equivalent of malloc_trim for macOS so I can test this?
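
For reference, the call from that Discourse post, guarded so it only runs where glibc actually provides malloc_trim (a sketch, which is why I'm asking about a macOS equivalent):

# malloc_trim is glibc-specific: ask libc to return freed pages to the OS.
if Sys.islinux()
    ccall(:malloc_trim, Cint, (Cint,), 0)
end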

@d-netto (Member) commented Dec 7, 2024

I don't know.

But malloc_trim would probably not be useful here.

malloc_trim is useful when libc is holding onto pages of memory that have been freed and not giving them back to the OS.

What seems to be happening here is that a bunch of mallocd memory is being considered alive by the GC when it shouldn't, so free doesn't even get a chance to run.

@vchuravy (Member) commented:

Does a heap-snapshot show the memory? If so, it should tell us why we think it is being rooted.

@felixcremer (Contributor) commented:

I think I might also be running into this problem as described in EarthyScience/RQADeforestation.jl#36.

I am currently producing heap snapshots with julia 1.11. Is there something specific that I should be looking for in them?

@IanButterworth (Member) commented Dec 10, 2024

@vchuravy I don't know what to look for, but with devtools you can also diff two heap snapshots:

[screenshot]

@gbaraldi (Member) commented:

A bisection using

using StatsBase
function Simulate()
    Simulations=Int(1e7)
    Size=1000
    result = Array{Float64}(undef, Simulations, 1)
    Threads.@threads for i = 1:Simulations
        x = randn(Size)
        s = sort(x)
        result[i, 1] = s[1]
    end
    println(median(result))
end
for i in 1:1000
    println(i)
    Simulate()
    GC.gc(true)
    # Print live_bytes
    println("live_bytes in MB: ", Base.gc_live_bytes() / 1024^2)
    sleep(10) # sleep for 10 seconds
end

points to 909bcea as where the memory growth starts.

@MilesCranmer (Member, Author) commented:

Interesting. Is multithreading required to reproduce? If so, does it mean there's a race condition in the GC?

@gbaraldi (Member) commented:

Multithreading is not needed.

@MilesCranmer (Member, Author) commented:

Is it any object, or only Memory-based allocations?

@oscardssmith (Member) commented:

So the issue Gabriel found is #55223, but that doesn't seem to be the whole problem.

@MilesCranmer (Member, Author) commented:

Confirmed #56801 fixes this.
