Possible memory leak during motion correction #216
Hmm, motion correction should not hold the entire file in memory. I'll look into a couple of things here. What versions of sima and h5py are you using?
Thanks Jeff, I'm running 1.3.0 for SIMA and 2.6.0 for h5py. Whenever I run files totaling more than 55GB or so on an EC2 instance with 60GB of RAM, I get out-of-memory errors. Running top shows that each per-file Python process ends up using roughly the size of its file in memory. The only other command at the end of the above set that I run is
I am trying to debug this in a session running right now. Based on top output, memory use ramps up quickly around the time I see "Estimating displacements for cycle 0" in the verbose output of motion correction, though it is still not at the size of the file. So I am now fairly sure the file is not fully loaded before motion correction starts, and the title of this issue is inaccurate. However, memory usage keeps ramping up by the minute in my current session (e.g. for a 7.8GB file, 4GB of resident memory is in use according to top, and for another 4GB file, 2GB of resident memory is already in use). In prior sessions, close to the end of motion correction I have seen approximately the size of the file in memory, which is why I assumed the files were being loaded entirely. The instance I am running has 32 cores and 60GB of RAM, so I could in principle run 32 parallel processes with no loss of performance, but currently the maximum number I can reliably run is limited by file size: with ~8-10GB HDF5 files I can run 5 without any problems, but with 6 I sometimes get out-of-memory errors before motion correction completes. So overall, given the slow ramp-up in memory usage, I am now worried there might be a memory leak. Thanks again for your help!
One final update: the run is close to completion. As mentioned above, I ran two parallel processes. One has already finished and was using roughly the size of its file in RAM right before completion. The second Python process, motion correcting the ~8GB file, is still running and is now occupying 8GB of RAM. This has all the signs of a memory leak: constant build-up over time and a sudden release once the process exits. I tried reading the motion correction source code, but it is too large for me to work through in the time I can devote to it, so any help in debugging this would be tremendously helpful. Are the local machines you use for motion correcting large files fitted with huge amounts of RAM? If not, this problem would presumably be obvious, so I am wondering whether I am doing something incorrectly to cause the leak. I doubt it, but I am surprised this isn't a more common problem.
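(As an aside, a minimal sketch of one way to watch resident memory of a running process from Python, assuming psutil is installed; `log_rss` is a hypothetical helper for illustration, not part of SIMA:)

```python
# Sketch: periodically log the resident set size of the current process.
# Assumes psutil is installed; log_rss() is a hypothetical helper, not SIMA API.
import os
import time
import psutil

def log_rss(tag=""):
    """Print the resident set size of the current process in GB."""
    rss = psutil.Process(os.getpid()).memory_info().rss
    print("%s RSS: %.2f GB" % (tag, rss / 1e9))

if __name__ == "__main__":
    # Example: sample once per minute while a long motion-correction run is active.
    while True:
        log_rss("motion correction")
        time.sleep(60)
```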
HDF5 files can be chunked in different ways, which can affect how much data you must load at a single time (https://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/). How are your HDF5 files chunked?
Thanks @pkaifosh. I am chunking them by frame, so one chunk is one frame. I only have one plane and one channel, so for 10000 frames I have 10000 chunks. Is this the problem? If so, how do you think I can improve it? I thought the only cost of small chunks like this was reduced I/O performance, not memory.
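(For reference, chunking can be inspected directly with h5py; a small sketch, with a placeholder file path and dataset name:)

```python
# Sketch: inspect how an HDF5 dataset is chunked using h5py.
# The file path and dataset name below are placeholders for your own layout.
import h5py

with h5py.File("example.h5", "r") as f:
    dset = f["imaging"]              # e.g. a (t, z, y, x, c) imaging dataset
    print("shape: ", dset.shape)
    print("chunks:", dset.chunks)    # e.g. (1, 1, 512, 512, 1) -> one frame per chunk
    print("dtype: ", dset.dtype)
```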
That sounds like a reasonable way of chunking. As long as you aren't doing something like chunking by row/column, things should be fine. Any other clues on what might be causing the problem?
Yeah, that's what I thought as well. I don't quite know what the issue is, but my guess would be that prior frames aren't fully cleared from memory, i.e. that some reference to a previously loaded (or corrected) frame is held even after its use. That would explain the slow ramp-up of memory usage, reaching the full size of the file just before the end of motion correction. But I haven't ever really looked at the motion correction code to figure out where this might be. Do you think something like this is a possibility? I could start looking over the code as well; I just wanted to see if something obvious came to you guys.
@jzaremba and I made some progress on this today. Memory is accumulating in the
Based on this, increasing your
Currently the data in these variables is being stored as
@pkaifosh do you have any insights here? Are these variables bounded by anything? Alternatively, are there any obvious alternatives for implementing the beam search that would avoid the memory accumulation?
Thanks, Nathan and Jeff! I'm sure that's the reason for the memory accumulation. I have a simple suggestion that I believe should fix it: could you write the two lists to disk as HDF5 datasets? You should be able to chunk them, since you know the size of the final array right from the start, if I understand the code correctly. There doesn't seem to be much real-time manipulation of the two lists in that function other than reading the last value a couple of times. Disk I/O performance would take a hit, but the reduced RAM consumption might compensate for it even in terms of processing speed. Would this work, or did I completely misunderstand it? Also, would you be able to help me understand those variables? My initial guess was that you are calculating different possible x and y translations and their posterior probabilities in the function with
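(A rough sketch of the general idea being suggested here — appending per-frame results to an on-disk HDF5 dataset instead of growing in-memory lists. This is not SIMA's actual implementation; the dataset name, shapes, and the `estimate_displacement` helper are placeholders:)

```python
# Sketch: spill per-frame results to disk instead of keeping them in RAM.
# Not SIMA's internal code; names, shapes, and the estimator are placeholders.
import h5py
import numpy as np

def estimate_displacement(t):
    # Placeholder for the real per-frame (dy, dx) displacement estimate.
    return np.random.randint(-5, 6, size=2)

n_frames = 10000

with h5py.File("displacements.h5", "w") as f:
    # The total number of frames is known up front, so the dataset can be
    # pre-allocated and chunked one row (one frame) at a time.
    dset = f.create_dataset("displacements", shape=(n_frames, 2),
                            dtype="int32", chunks=(1, 2))
    for t in range(n_frames):
        dset[t] = estimate_displacement(t)   # written to disk, not held in a Python list
```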
Btw, congrats on the Neuron paper! |
These variables are used for the backward pass of the beam search. Once you find the most probable termination state, you go backward and find the most probable way of having gotten there, and then you keep going one step backward at a time. There are a number of approaches for reducing the memory.
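(A generic sketch of the backward pass described above — not SIMA's code. During the forward pass, a backpointer to the best predecessor state is stored for each time step; the backward pass then walks those pointers starting from the most probable terminal state. Keeping all per-frame backpointers is what makes memory grow with the number of frames:)

```python
# Generic sketch of the backward pass of a beam/Viterbi-style search.
# Not SIMA's implementation; the states and scores here are toy placeholders.
import numpy as np

def backtrack(backpointers, final_scores):
    """backpointers[t][s] = best predecessor of state s at step t+1;
    final_scores[s] = log-probability of ending in state s."""
    path = [int(np.argmax(final_scores))]       # most probable termination state
    for pointers in reversed(backpointers):     # walk backward one step at a time
        path.append(int(pointers[path[-1]]))
    path.reverse()
    return path

# Toy usage: 4 time steps, 3 states.
bp = [np.array([0, 0, 1]), np.array([1, 2, 2]), np.array([0, 1, 1])]
print(backtrack(bp, np.array([-3.0, -1.0, -2.0])))
```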
Hi guys,
I have been using code similar to the snippet below for motion correction.
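(The original snippet was not preserved in this thread. The following is a rough sketch of a typical SIMA HDF5 motion-correction call for reference, with placeholder paths, dimension order, and parameters; it is not the exact code from this report:)

```python
# Sketch of a typical SIMA HDF5 motion-correction workflow (placeholder values).
import sima
import sima.motion

# One sequence per HDF5 file; 'tzyxc' describes the on-disk dimension order.
sequences = [sima.Sequence.create('HDF5', '/path/to/file.h5', 'tzyxc')]

# Hidden Markov model based motion correction (the approach discussed in this issue).
mc_approach = sima.motion.HiddenMarkov2D(granularity='row',
                                         max_displacement=[20, 30],
                                         verbose=True)

dataset = mc_approach.correct(sequences, '/path/to/output.sima')
```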
I've also done the same thing by directly passing the `sequences` variable to motion correction, rather than the imaging dataset. From the source code, it looked like `sequences` is an iterator over each frame, so I thought the entire dataset needn't be loaded into memory all at once. This is a problem because I've been running pretty large files on EC2 (~15-20GB), and the RAM limits the number of files I can run in parallel on a single instance. I haven't tried running these on my local machine, on which I wouldn't even be able to load them into memory.
Is there something simple that I am overlooking in the code?
Thanks!
Vijay