--tracer.raymarch-type voxel uses too much VRAM, which triggers OutOfMemoryError #193

Open
barikata1984 opened this issue Jun 29, 2024 · 0 comments

barikata1984 (Contributor) commented Jun 29, 2024

While investigating #192, I noticed that --tracer.raymarch-type voxel triggers an OutOfMemoryError, as shown below:

```
other traceback lines
...
  File "/home/atsushi/workspace/wisp211/wisp/tracers/packed_rf_tracer.py", line 130, in trace
    hit_ray_d = rays.dirs.index_select(0, ridx)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.15 GiB (GPU 0; 11.69 GiB total capacity; 10.22 GiB already allocated; 133.44 MiB free; 10.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

```
❯ nvidia-smi
Sat Jun 29 01:30:32 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   40C    P8             14W /  285W |     848MiB /  12282MiB |     41%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1750      G   /usr/lib/xorg/Xorg                            416MiB |
|    0   N/A  N/A      1943    C+G   ...libexec/gnome-remote-desktop-daemon        195MiB |
|    0   N/A  N/A      1995      G   /usr/bin/gnome-shell                           98MiB |
|    0   N/A  N/A      5488      G   ...57,262144 --variations-seed-version        109MiB |
|    0   N/A  N/A      8436      G   /app/bin/wezterm-gui                            9MiB |
+-----------------------------------------------------------------------------------------+
```

As the traceback shows, PyTorch tries to allocate 4.15 GiB while 10.22 GiB are already allocated. I observed similar results regardless of whether the interactive app is loaded. At first I suspected that other applications were simply occupying a large amount of VRAM, so I ran nvidia-smi immediately after attempting to train a NeRF. As the output above shows, however, other processes use less than 1 GiB in total. My assumption is that the NeRF app sequentially allocates quite large blocks of VRAM and fails at some point. Does anybody know a potential cause of this issue?
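For reference, here is a minimal instrumentation sketch (not the actual wisp code) that could confirm whether the allocations really grow step by step before the failing call. The helper name log_cuda_mem and the exact placement inside trace() are my own assumptions; only standard torch.cuda APIs are used, and the max_split_size_mb value below is just an example.

```python
# Hypothetical helper (not part of wisp): print CUDA memory stats so the growth of
# allocations can be followed up to the point where the 4.15 GiB request fails.
import torch

def log_cuda_mem(tag: str) -> None:
    """Log allocated/reserved VRAM (PyTorch's view) and free/total VRAM (driver's view) in GiB."""
    free, total = torch.cuda.mem_get_info()
    print(f"[{tag}] allocated={torch.cuda.memory_allocated() / 2**30:.2f} GiB  "
          f"reserved={torch.cuda.memory_reserved() / 2**30:.2f} GiB  "
          f"free={free / 2**30:.2f} GiB / total={total / 2**30:.2f} GiB")

# Assumed placement around the failing line in packed_rf_tracer.py's trace():
#
#     log_cuda_mem("before index_select")
#     hit_ray_d = rays.dirs.index_select(0, ridx)   # line that raises OutOfMemoryError
#     log_cuda_mem("after index_select")
#
# If the OOM reproduces, calling torch.cuda.memory_summary() inside an
# `except torch.cuda.OutOfMemoryError:` block gives a full allocator breakdown, and the
# allocator's own suggestion from the error message can be tried before launching, e.g.:
#
#     export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```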

Thanks in advance!

barikata1984 changed the title from "--tracer.raymarch-type voxel uses too much VRAM, triggers OutOfMemoryError" to "--tracer.raymarch-type voxel uses too much VRAM, which triggers OutOfMemoryError" on Jun 29, 2024.