While investigating #192, I noticed that --tracer.raymarch-type voxel triggers an OutOfMemoryError, as shown below:
... (other traceback lines omitted)
File "/home/atsushi/workspace/wisp211/wisp/tracers/packed_rf_tracer.py", line 130, in trace
hit_ray_d = rays.dirs.index_select(0, ridx)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.15 GiB (GPU 0; 11.69 GiB total capacity; 10.22 GiB already allocated; 133.44 MiB free; 10.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
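For reference, a rough back-of-envelope reading of that failing allocation, assuming rays.dirs is an (N, 3) float32 tensor (I have not verified this against the wisp source):

```python
# Back-of-envelope check (my own arithmetic, not taken from the wisp code):
# if rays.dirs is an (N, 3) float32 tensor, index_select(0, ridx) materializes
# a (len(ridx), 3) float32 copy, i.e. 12 bytes per selected ray sample.
alloc_bytes = 4.15 * 1024**3   # the 4.15 GiB reported in the traceback
bytes_per_row = 3 * 4          # 3 components * 4 bytes (float32)
print(f"implied len(ridx): {alloc_bytes / bytes_per_row:,.0f}")
# prints roughly 371,000,000 -> ~370 million ray/voxel samples in a single trace call
```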
❯ nvidia-smi
Sat Jun 29 01:30:32 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 Ti Off | 00000000:01:00.0 On | N/A |
| 0% 40C P8 14W / 285W | 848MiB / 12282MiB | 41% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1750 G /usr/lib/xorg/Xorg 416MiB |
| 0 N/A N/A 1943 C+G ...libexec/gnome-remote-desktop-daemon 195MiB |
| 0 N/A N/A 1995 G /usr/bin/gnome-shell 98MiB |
| 0 N/A N/A 5488 G ...57,262144 --variations-seed-version 109MiB |
| 0 N/A N/A 8436 G /app/bin/wezterm-gui 9MiB |
+-----------------------------------------------------------------------------------------+
As you can see, PyTorch tries to allocate 4.15 GiB while 10.22 GiB are already allocated. I observed similar results regardless of whether the interactive app was loaded. At first I suspected that other applications were simply occupying a large amount of VRAM, so I ran nvidia-smi immediately after attempting to train a NeRF; as shown above, however, less than 1 GiB is in use. My assumption is that the NeRF app keeps allocating large chunks of VRAM one after another and eventually fails at some point. Does anybody know a potential cause of this issue?
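A minimal sketch of how I intend to check that assumption, using only standard torch.cuda memory counters (nothing wisp-specific; the call sites are just where I plan to place it):

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print current / peak / reserved CUDA memory so the growth pattern is visible."""
    gib = 2 ** 30
    print(f"[{tag}] allocated={torch.cuda.memory_allocated() / gib:.2f} GiB  "
          f"peak={torch.cuda.max_memory_allocated() / gib:.2f} GiB  "
          f"reserved={torch.cuda.memory_reserved() / gib:.2f} GiB")

# e.g. call log_cuda_memory("before trace") and log_cuda_memory("after trace")
# around the tracer call in the training loop to see whether usage really ramps
# up step by step before the failing index_select.
```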
Thanks in advance!