
Running rqatrend seems to accumulate memory #36

Open · felixcremer opened this issue Dec 1, 2024 · 14 comments

@felixcremer
Collaborator

Haven't investigated yet, but when running rqatrend for different years and orbits, it fails after a few iterations with an out-of-memory error.

@felixcremer
Collaborator Author

When I am running rqatrend with 10 threads on a single worker, htop reports 2 GB more used memory after the run.

I tried looking into where this extra memory might be, but I haven't found the culprit yet.

@felixcremer
Collaborator Author

This might be a threading issue. I found this issue on the Julia repo, JuliaLang/julia#40626, and I am currently testing with 14 workers, but they are pushing against the total RAM and are also a lot slower than the combination of 3 workers with 10 threads each, apparently by a factor of 4.

@felixcremer
Collaborator Author

After letting it run for a few hours, the memory usage accumulates to 11.7 GB per worker and it throws an error indicating that the worker ran out of memory.

I can see the memory when I check the memory usage of the individual workers, but I can't free it by running GC.gc().
I am still not sure whether this is the threading issue, a Distributed issue, or a combination of both.
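
For reference, a minimal sketch (assuming the workers were added via Distributed's addprocs) of how the per-worker memory can be inspected and a full GC forced on every worker; in this case the reported memory did not go down afterwards:

```julia
using Distributed

# Report the maximum resident set size seen by each worker so far (in MiB).
for w in workers()
    rss_mb = remotecall_fetch(() -> Sys.maxrss() ÷ 2^20, w)
    println("worker $w: max RSS $(rss_mb) MiB")
end

# Force a full garbage collection on every worker; the accumulated memory
# described above was not released by this.
@everywhere GC.gc(true)
```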

@danlooo
Collaborator

danlooo commented Dec 4, 2024

Have you tried Profile.Allocs.@profile together with PProf.jl, e.g. see here?
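
A sketch of what that suggestion could look like (the rqatrend(cube) call is a placeholder for the actual invocation):

```julia
using Profile
import PProf

Profile.Allocs.clear()
# Sample a fraction of all allocations made during the call.
Profile.Allocs.@profile sample_rate = 0.01 rqatrend(cube)  # placeholder call
# Open an interactive pprof view of where the allocations happened.
PProf.Allocs.pprof()
```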

@felixcremer
Collaborator Author

Running it with Julia 1.11.2 still shows the same problem, maybe with a slightly slower increase, and the @time macro in front of the rqatrend call does not report any GC time, which might be a bad sign:

Progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:36:29
2189.929099 seconds (227.82 k allocations: 13.929 MiB, 0.00% gc time, 28 lock conflicts)
  0.191193 seconds (13.23 k allocations: 1.426 MiB)
path = "/mnt/felix1/worldmap/data/E048N015T3_rqatrend_VH_146_thresh_3.0_year_2018"
outpath = "/mnt/felix1/worldmap/data/E048N015T3_rqatrend_VH_146_thresh_3.0_year_2018.zarr"
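
A minimal sketch of how one could additionally track whether the GC ever reclaims heap between the yearly runs (the loop and run_single_year are hypothetical placeholders; Base.gc_live_bytes is the stdlib call):

```julia
for year in 2018:2022                     # hypothetical loop over the runs
    before = Base.gc_live_bytes()
    @time run_single_year(year)           # placeholder for the rqatrend/mapCube call
    after = Base.gc_live_bytes()
    println("live heap grew by $((after - before) ÷ 2^20) MiB in $year")
end
```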

@felixcremer
Collaborator Author

I tested it briefly on Julia 1.10 because it might also be a regression. This might be related to this issue:
JuliaLang/julia#56759
I will start the process for a longer time with Julia 1.10 to see whether it also slowly fills up the memory.

One suggestion in the linked issue was to take heap snapshots via Profile.take_heap_snapshot.
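
A sketch of that suggestion, assuming a recent Julia where Profile.take_heap_snapshot is available:

```julia
using Profile

# Writes .heapsnapshot files that can be opened in the Chrome DevTools
# "Memory" tab to inspect which objects keep the memory alive.
Profile.take_heap_snapshot("before_gc.heapsnapshot")
GC.gc(true)
Profile.take_heap_snapshot("after_gc.heapsnapshot")
```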

@MilesCranmer

MilesCranmer commented Dec 6, 2024

@felixcremer did switching to 1.10 help? If you think it's the same issue I was seeing in my package, please share on JuliaLang/julia#56759

@felixcremer
Collaborator Author

felixcremer commented Dec 9, 2024

~~I think this is not~~ Might be related to JuliaLang/julia#56759, because on Julia 1.10 I also see a similar behaviour: after multiple mapCube computations the memory fills up and the Julia process fails with an out-of-memory error. The memory accumulates a bit more slowly than with Julia 1.11, though, and then it does not fail but manages to hover at 95% usage. I am going to rerun it with Julia 1.11 as well to see when it runs out of memory.

Edit: I spoke too soon and assumed it was shortly going to fail with OOM, but it didn't.

@felixcremer
Collaborator Author

The heavy memory usage might also be related to JuliaLang/julia#55794. To rule that out, I am planning to run the mapCube with an in-memory array so that we don't have to deal with IO at all.
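
A sketch of what that in-memory test could look like with YAXArrays (the axis sizes and the inner function are placeholders; the real run would call the rqatrend kernel instead of the sum):

```julia
using YAXArrays, DimensionalData

# A cube fully backed by an in-memory Array, so no GDAL/Zarr IO is involved.
axlist = (Dim{:time}(1:200), Dim{:x}(1:100), Dim{:y}(1:100))
incube = YAXArray(axlist, rand(Float32, 200, 100, 100))

result = mapCube(incube; indims = InDims("time"), outdims = OutDims()) do xout, xin
    xout .= sum(xin)   # placeholder for the per-pixel rqatrend computation
end
```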

@felixcremer
Collaborator Author

Poking at this a bit more, I realized that we are not freeing the IRasterBand pointers from the BufferGDALBand, and that might be the memory that is no longer available to the GC.
I changed the ArchGDAL finalizer for the IRasterBand to print when it is finalized, and this is not shown when setting the gdalcube to nothing and running the GC afterwards.

We might be able to register a finalizer function for every cube we open to close the GDAL bands after the mapCube usage; see the sketch below.

There is also JuliaGeo/GDAL.jl#77, which might be related but is most likely not the main culprit.
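
A hedged sketch of that idea; the bands argument and the use of ArchGDAL.destroy as the cleanup call are assumptions, and it requires the cube object to be mutable so a finalizer can be attached:

```julia
import ArchGDAL

# Register a finalizer on an opened cube that explicitly drops the cached
# IRasterBand handles once the cube itself becomes unreachable.
function register_band_cleanup!(cube, bands)
    finalizer(cube) do _
        for b in bands
            ArchGDAL.destroy(b)   # release the reference to the GDAL raster band
        end
    end
    return cube
end
```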

@lazarusA

But that is the whole point of this approach. Otherwise you could simply use the YAXArrayBase approach.

@meggart
Collaborator

meggart commented Dec 13, 2024

> I changed the ArchGDAL finalizer for the IRasterBand to print when it is finalized and this is not shown when setting the gdalcube to nothing and running GC afterwards.

I think this does not work in general; printing to the terminal from within finalizers does not work. In the past there was usually a warning saying something like "Task switch from finalizers not allowed". So the missing print is not proof that the finalizer is not run. However, I agree with Lazaro: just testing this with the YAXArrayBase GDALBand would be a low-effort way to check whether this is about a defective IRasterBand finalizer.

@felixcremer
Collaborator Author

I am using @async for the printing so that it actually gets executed, and sometimes it does print in between.
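
For context, a sketch of that debugging trick (names are illustrative; printing directly inside a finalizer can fail with a task-switch error, so the message is deferred to a task via @async):

```julia
mutable struct TracedHandle
    ptr::Ptr{Cvoid}
end

function traced_handle(ptr = C_NULL)
    h = TracedHandle(ptr)
    finalizer(h) do obj
        @async println("finalizing handle at ", obj.ptr)  # deferred print
    end
    return h
end
```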

@felixcremer
Collaborator Author

It seems that this is much better on the current Julia master.
I am not yet sure whether this fully fixes it, but I am hopeful.
