GPU 0 is always used in a multi-GPU setup #139

Open
nikolai-franke opened this issue Oct 19, 2023 · 6 comments

System:

  • OS version: Red Hat Enterprise Linux (RHEL) 8.x
  • Python version: 3.9 and 3.10
  • SAPIEN version: sapien==2.2.2
  • Environment: Server with xvfb

Describe the bug
In a multi-GPU setup, SAPIEN always uses GPU 0 in addition to the GPU specified by CUDA_VISIBLE_DEVICES.

To Reproduce

  1. Run a modified examples/robotics/basic_robot.py (the only difference is that the Viewer is removed; see https://pastebin.com/abuJeuVG and the sketch after this list) with CUDA_VISIBLE_DEVICES=0.
  2. Run the same modified script with CUDA_VISIBLE_DEVICES=1.
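
For reference, a minimal sketch of what the viewer-less script looks like (based on the stock basic_robot.py; the URDF path here is a placeholder, and the pastebin version may differ in details):

```python
import sapien.core as sapien

engine = sapien.Engine()
# the renderer is expected to pick the GPU selected by CUDA_VISIBLE_DEVICES
renderer = sapien.SapienRenderer()
engine.set_renderer(renderer)

scene = engine.create_scene()
scene.set_timestep(1 / 240.0)
scene.add_ground(0)

loader = scene.create_urdf_loader()
loader.fix_root_link = True
robot = loader.load("path/to/robot.urdf")  # placeholder path

# step the simulation without ever creating a Viewer
for _ in range(1000):
    scene.step()
```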

Expected behavior
When checking GPU usage, only the selected GPU should appear in use. For CUDA_VISIBLE_DEVICES=0, that is the case. For CUDA_VISIBLE_DEVICES=1, both GPU 0 and GPU 1 get used.

Screenshots
CUDA_VISIBLE_DEVICES=0: [screenshot "cuda_0": GPU usage showing only GPU 0 in use]
CUDA_VISIBLE_DEVICES=1: [screenshot "cuda_1": GPU usage showing both GPU 0 and GPU 1 in use]

Additional context
Even though GPU 0 is only used a little when CUDA_VISIBLE_DEVICES=1, this usage quickly adds up when running many parallel simulations. I am using ManiSkill2 for reinforcement learning on an HPC node with 4 Nvidia A100 GPUs, and this bug severely limits the number of parallel environments I can run. Running many parallel environments also becomes slow, since GPU 0 is touched by every single simulation environment instead of only the quarter of the simulations assigned to it.

fbxiang (Collaborator) commented Oct 23, 2023

You may try passing offscreen_only=True to the SapienRenderer constructor. This behavior will be changed in the future (to make the CUDA device take higher priority than on-screen rendering).
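
A minimal sketch of that workaround (only the offscreen_only flag comes from the comment above; the rest is boilerplate setup):

```python
import sapien.core as sapien

# Request a purely offscreen renderer so that no on-screen
# (display-attached) device is initialized.
renderer = sapien.SapienRenderer(offscreen_only=True)

engine = sapien.Engine()
engine.set_renderer(renderer)
```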

nikolai-franke (Author) commented

Passing offscreen_only=True doesn't make a difference.

fbxiang (Collaborator) commented Nov 11, 2023

I cannot figure out what is causing the issue. I think you should set the PCI id of the device you want to use directly. This method requires a bit of setup but should never fail. First, before creating anything with SAPIEN, run sapien.SapienRenderer.set_log_level("info"). Next, run your code. You will see a table listing the devices visible to Vulkan; each of your GPUs will have a PciBus field, which is unique to each physical GPU. Then, when you create the SapienRenderer, pass device="pci:x", where x is the PciBus id shown in the log. This should bypass all other checks.
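
Put together, the sequence looks like this (the "x" is a placeholder for whatever PciBus id your log actually prints):

```python
import sapien.core as sapien

# 1. Enable verbose logging BEFORE creating any SAPIEN objects; the log
#    prints a table of Vulkan-visible devices with a PciBus column that
#    uniquely identifies each physical GPU.
sapien.SapienRenderer.set_log_level("info")

# 2. Pass the desired PciBus id explicitly ("x" is a placeholder for the
#    value taken from the log table).
renderer = sapien.SapienRenderer(device="pci:x")

engine = sapien.Engine()
engine.set_renderer(renderer)
```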

nikolai-franke (Author) commented

Thank you very much for your answer! Sadly, the result is still exactly the same: GPU 0 always gets used, even when selecting another GPU via its PCI address.

fbxiang (Collaborator) commented Nov 22, 2023

Are you using sapien==2.2.2? I have verified that the GPU selection feature is working. You can try sapien.SapienRenderer.set_log_level("info") before creating the renderer; it will list all available GPUs in the console and tell you which GPU is selected for rendering. Since an incorrect PCI id would result in an error, my guess is that some other program is running on your GPU 0 and it is not the SAPIEN renderer.
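
One quick way to check that (not SAPIEN-specific; assumes nvidia-smi is on PATH): the default nvidia-smi output ends with a per-process table showing which PIDs hold memory on each GPU, so you can see whether the process on GPU 0 is actually the SAPIEN one.

```python
import subprocess

# Print nvidia-smi's default report; the trailing process table lists
# PID, process name, and memory usage per GPU.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```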

balazsgyenes commented
I'm actually having the same issue.
