Make it work with Nvidia MIG devices #628

kienerj · 2023-12-20T06:40:38Z


Bigger GPUs like an A100, with the appropriate driver, can be split into smaller chunks. You can then use one part to run stable-diffusion and another to run, say, an LLM via text-generation-webui.

If the installed driver is MIG-capable (Multi-Instance GPU), then even with just one chunk spanning the entire GPU, you need to change how you tell CUDA which GPU to use.

I'm talking about this part in the docker-compose.yml:

devices:
  - driver: nvidia
    device_ids: ['0']
    capabilities: [compute, utility]

The important line is device_ids: ['0']. This never worked for me; I had to remove the line entirely to make it work. The issue is that with a MIG-enabled driver, you always need to address the GPU in the form:

<physical gpuid>:<device id>

Both values are indexes starting at zero. If you want to use the first GPU and the first MIG device on that GPU:

device_ids: ['0:0']

If you want to use the second GPU and the third MIG device on that GPU:

device_ids: ['1:2']
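
For context, here is a minimal sketch of where that line sits in a compose file. The service name `webui` is hypothetical (use whatever the service is called in your docker-compose.yml), and the `'0:0'` value assumes you want the first MIG device on the first GPU:

```yaml
services:
  webui:  # hypothetical service name, not taken from this repo
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # <physical gpu id>:<MIG device id>, both zero-based
              device_ids: ['0:0']
              capabilities: [compute, utility]
```

Only the `device_ids` value differs from the non-MIG setup; the rest of the block stays as it was.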

Note that there are also other ways to identify a MIG device, using UUIDs instead of indexes, but the index-based form seems easier to me.

For anyone struggling with MIG, this is how to address the device properly. The same addressing also applies to other tools that use CUDA_VISIBLE_DEVICES instead.
