
Memory leak... somewhere #794

Closed
defeo opened this issue Feb 20, 2020 · 16 comments

Comments

defeo commented Feb 20, 2020

Hi,

I'm pretty convinced I hit a memory leak in Nemo. I have a mildly complicated piece of code doing computations in GF(p). When I run it in a loop (for benchmarking), the memory usage of Julia slowly climbs, and within a minute Julia gets killed by the system. If I change GaloisField to FiniteField I get the same result, which makes me think the leak must be somewhere else (around fmpz, maybe?).

I ran valgrind with --leak-check=full --smc-check=all-non-file, but I'm not sure the output contains much useful information. I'm attaching the full report. Is there anything I can do to get a more useful output from valgrind?
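The overall shape of the code is roughly the following (a stripped-down sketch: my_function stands in for the real routine, and the prime and workload here are made up just for illustration):

```julia
using Nemo

F = GaloisField(7919)   # same behaviour observed with FiniteField(7919, 1, "a")

# stand-in for the real routine: it takes a few field elements, allocates
# many temporaries internally, and returns a few elements of the same field
function my_function(x = F(2))
    for i in 1:10^6
        x = x^2 + F(i)
    end
    return x
end

# benchmarking loop: resident memory climbs steadily until the OS kills Julia
for i in 1:100
    my_function()
end
```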

thofma commented Feb 21, 2020

Due to the nature of the julia garbage collector, it is a bit difficult to tell if there really is a memory leak just by looking at the memory usage.

Can you try to splice in some GC.gc() to force garbage collection and see if the memory usage is stable?

Also, do you have lots of different p?

defeo commented Feb 21, 2020

> Due to the nature of the julia garbage collector, it is a bit difficult to tell if there really is a memory leak just by looking at the memory usage.
>
> Can you try to splice in some GC.gc() to force garbage collection and see if the memory usage is stable?

Hmm. This is strange. If I call my function ~100 times in a tight loop,

```julia
for i in 1:100
    my_function()
end
```

it fills the memory and the process gets killed. But if I add GC.gc() in the loop, the memory stays more or less constant.
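
That is, the variant with the forced collection stays bounded:

```julia
for i in 1:100
    my_function()
    GC.gc()   # full collection each iteration; resident memory stays roughly constant
end
```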

Why wouldn't Julia be able to reclaim the memory in the tight loop? The function takes as inputs a few elements of a finite field, does a lot of computations, and spits out a few elements of the same finite field.

I've already run intensive benchmarks on the subroutines of my function, without ever filling the memory.

> Also, do you have lots of different p?

No

wbhart commented Feb 21, 2020

It should be able to reclaim it. But garbage collectors are not perfect. They can hit corner cases.

The main issues seem to be that the Julia collector is very aggressive about grabbing memory and not letting it go, and that it does not adjust to changing conditions on the machine, such as running low on memory or other processes starting up and using it.

In very tight loops there might not be enough memory allocation calls for Julia to trigger enough gc steps, so memory just fills up.

Hopefully that will all improve with time, but every now and again, when there is a gc issue like this, manually inserting gc() calls seems to be the only solution.

If the example is simple enough, we could pass it to the Julia people to help them with gc tuning. But we'd first need to check that we are actually counting all the memory used.

thofma commented Feb 21, 2020

It's not that julia is not able to reclaim the memory, but it seems that the garbage collector thinks that there is no need to reclaim it at this point.

At the moment, there is also no way to limit the memory used by julia, see https://discourse.julialang.org/t/limit-julia-memory/34409 and JuliaLang/julia#17987.

wbhart commented Feb 21, 2020

If your code uses multivariate division or GCD then flintlib/flint#619 could explain it, for example.

wbhart commented Feb 21, 2020

We should also valgrind the relevant Flint tests.

Are all the computations using Flint's fq?

defeo commented Feb 21, 2020

> If your code uses multivariate division or GCD then flintlib/flint#619 could explain it, for example.

No multivariates. It only uses prime fields and polynomials over them.

> Are all the computations using Flint's fq?

I have the same problem with GaloisField and with FiniteField.

> If the example is simple enough, we could pass it to the Julia people to help them with gc tuning. But we'd first need to check that we are actually counting all the memory used.

It is definitely not that simple. It's a full implementation of CSIDH. Individual benchmarks on the subroutines (EC arithmetic, isogeny computations) do not fill the memory. Only the full algorithm does.

I can share the code in a few days, but ATM we're rushing to finish in time for a deadline on Tuesday.

thofma commented Feb 21, 2020

Given the time constraints, I think your best bet is to splice in some GC.gc() now and then.

wbhart commented Feb 21, 2020

Unfortunately it is similar to all the other examples of this that we have: someone is working on some complex CSIDH implementation and the system runs out of memory.

We were hoping this might be a simple example. Sorry to hear it isn't.

Anyhow, on the Flint side, there shouldn't be any reason for it. The finite field stuff has been valgrinded quite a bit. It's probably just the general Julia GC problems.

wbhart commented Feb 21, 2020

I did remove some memory leaks from the threaded polynomial factorisation, but I don't think you could be using it, as we don't build with OpenMP support, and I don't think we've had a release since we switched over to pthreads only.

Those same leaks don't exist in the standard code as they were just cases of threads exiting in the middle of a function and all the cleanup being at the end of the function.

thofma commented Mar 1, 2020

Did it work out @defeo?

defeo commented Mar 1, 2020

Not really. Sprinkling GC.gc() around helped a bit, but the only reliable solution was to benchmark on a 64 GB RAM machine, or write C code. I did both.
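
For what it's worth, the timing loop was roughly of this shape (a sketch, reusing the hypothetical my_function from above); even with the forced collections the measurements stayed noisy:

```julia
# crude benchmark: force a full collection before each measurement so that
# leftover garbage from earlier iterations is not collected mid-timing
times = Float64[]
for i in 1:20
    GC.gc()
    push!(times, @elapsed my_function())
end
```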

thofma commented Mar 1, 2020

OK. It would be interesting to see the example, to reproduce it.

defeo commented Mar 1, 2020

I will clean up the code and provide it.

defeo commented Mar 26, 2020

Hi. This took a long time, but we finally put the code online: https://velusqrt.isogeny.org/software.html.

We managed to stabilize the code, and in particular I'm not able to reproduce the behavior described in #794 (comment) (and I have no idea why, but in the meantime I had several OS updates and now I'm at Julia 1.4.0).

For some reason, on my 7.5GiB RAM, 2GiB Swap laptop, Julia decides that it's wise to eventually occupy up to 3.2GiB of resident memory (and no swap), and never release it, even when I call GC.gc(). This is ok-ish for running code, and does not crash the Julia process. It is a nightmare for benchmarking, because timings are extremely volatile, even with systematic gc() calls... oh well!
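
One way to watch this from inside Julia is the high-water mark reported by Sys.maxrss(); the "never released" part only shows up in an external tool such as top, since maxrss never goes down:

```julia
GC.gc()                                     # force a full collection
println(Sys.maxrss() / 2^30, " GiB peak")   # peak resident set size, stays around 3.2 GiB
```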

However, Firefox begs to differ: when Julia reaches ~3GiB, Firefox becomes extremely unstable and easily freezes my system when I then try to switch a few tabs or do a search. The issue looks vastly more complex than a Julia/Nemo one, so I'm going to close this issue.

defeo closed this as completed Mar 26, 2020
wbhart commented Mar 26, 2020

Ok, thanks for the followup. Let's hope this just improves with time.
