Impractical GPU memory requirements #43
Comments
Hey @SebastienTs, there are a number of reasons the memory usage is often way more than you might expect:
I would suggest you try pad_mode='2357'. Apart from that, the only other practical option is to chunk the arrays as in Tile-by-tile deconvolution using dask.ipynb.
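A minimal sketch of where that padding option goes, using the `fd_restoration.RichardsonLucyDeconvolver` call mentioned in the reply below (`img` and `psf` are placeholder arrays and the iteration count is illustrative, not values from this thread):

```python
import numpy as np
from flowdec import data as fd_data
from flowdec import restoration as fd_restoration

# Placeholder volumes standing in for the real 1024x1024x19 image and 32x32x16 PSF.
img = np.random.rand(19, 1024, 1024).astype(np.float32)
psf = np.random.rand(16, 32, 32).astype(np.float32)

# pad_mode='2357' pads each axis to the next 2/3/5/7-smooth length rather than
# the next power of two, which can shrink the FFT work buffers.
algo = fd_restoration.RichardsonLucyDeconvolver(n_dims=3, pad_mode='2357').initialize()
res = algo.run(fd_data.Acquisition(data=img, kernel=psf), niter=25)
result = res.data
```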
Thanks a lot for your reply! I had 1, 2 and 5 in mind, but even then, do you really believe that 3 and 4 could explain the remaining 30x memory overhead (from 270 MB to 8 GB)? If that is the case I can sleep peacefully, but it sounds like a real lot to me and I want to make sure that nothing is misconfigured or extremely suboptimal for the TensorFlow version I am using. I have not seen any noticeable reduction in memory usage by using pad_mode='2357' when invoking fd_restoration.RichardsonLucyDeconvolver. I would happily consider the cucim alternative that is recommended, but unfortunately my code needs to run on a Windows box.
Hm, well, 10x wouldn't surprise me too much, but 30x does seem extreme. When it comes to potential TF issues I really have no idea. You should take a look at this too if you haven't seen it: #42 (comment). Some of those alternatives to this library may be Windows-friendly.
Have a look in my repo: I basically use dask to divide the images and assemble them again when the GPU memory is not enough. This is the Bio-Formats version (older, might have some tweaks to be done). They should be able to run on Google Colaboratory if you'd like to tweak around. You also need the libraries at: Hope it helps. I can do 2048x1024, times two, on my 6 GB laptop. The other option is to add the "RAM option" that shares RAM and vRAM; it's still a lot faster than plain RAM only.
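For reference, a rough sketch of the tile-and-stitch idea with dask (this is not the code from that repo; the chunk sizes, overlap depth, and iteration count are illustrative assumptions):

```python
import numpy as np
import dask.array as da
from flowdec import data as fd_data
from flowdec import restoration as fd_restoration

img = np.random.rand(19, 2048, 2048).astype(np.float32)  # placeholder volume
psf = np.random.rand(16, 32, 32).astype(np.float32)      # placeholder PSF

algo = fd_restoration.RichardsonLucyDeconvolver(n_dims=3, pad_mode='2357').initialize()

def deconv_tile(tile):
    # Each tile is deconvolved independently, so only one tile's FFT buffers
    # live on the GPU at a time.
    return algo.run(fd_data.Acquisition(data=tile, kernel=psf), niter=25).data

# Keep Z whole and split XY into tiles small enough for the available GPU memory.
tiles = da.from_array(img, chunks=(19, 512, 512))

# Overlap neighbouring tiles by roughly the PSF extent to hide seams, then stitch.
result = tiles.map_overlap(
    deconv_tile, depth=(0, 16, 16), boundary='reflect', dtype=np.float32
).compute(scheduler='single-threaded')
```

The single-threaded scheduler is used here so that several tiles do not compete for the GPU at the same time.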
While indeed extremely fast, the GPU memory requirement is impractical on my setup: about 8 GB for a 1024x1024x19 image (16-bit) and a tiny 32x32x16 PSF. For images slightly above 1024x1024 (same number of Z slices), I can only run the code on an RTX 3090 (24 GB)!
The problem seems to stem from the FFT CUDA kernel. The error reported is:
tensorflow/stream_executor/cuda/cuda_fft.cc:253] failed to allocate work area.
tensorflow/stream_executor/cuda/cuda_fft.cc:430] Initialize Params: rank: 3 elem_count: 32 input_embed: 32 input_stride: 1 input_distance: 536870912 output_embed: 32 output_stride: 1 output_distance: 536870912 batch_count: 1
tensorflow/stream_executor/cuda/cuda_fft.cc:439] failed to initialize batched cufft plan with customized allocator:
Something is probably not right in the code... does anybody know of a workaround?
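One general TensorFlow-side mitigation worth trying (a sketch assuming TF 2.x; it is not confirmed to fix this particular cuFFT workspace failure) is to enable on-demand GPU memory growth so TensorFlow does not reserve nearly the whole card up front:

```python
import tensorflow as tf

# Must run before any GPU op is executed. Ask TensorFlow to grow its GPU
# allocation on demand instead of grabbing most of the card at startup,
# leaving more headroom for the cuFFT work area.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```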