Does time-slicing or MPS GPU sharing support a mode in which a process exclusively uses GPU DRAM? #966

so2bin · 2024-09-25T12:03:26Z

Currently, with time-slicing or MPS GPU sharing, multiple processes hold GPU memory at the same time, so no single process can use all of it. Is there any technology or configuration that lets these GPU-sharing modes swap a process's memory out to host memory while that process is not using the GPU? That way, the process currently running on the GPU could use all of the memory.
I want N GPUs to be shared by the containers of M developers, generally with M >= N. The M developers will not use the GPUs simultaneously, only intermittently. I would like a developer's container to occupy GPU memory only while it actually needs the GPU. Even if a debugging process has not exited, it should not hold GPU memory while it is idle, so that the memory is freed for other users. Can the current GPU-sharing technology support this?
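
To make the desired behavior concrete, here is a toy sketch in plain Python. It does not use any NVIDIA API: `GpuLeasePool` and `lease` are hypothetical names I made up for illustration, and the comments mark where a real system would have to swap state between GPU and host memory. It only models the semantics I am asking about, namely that a developer holds a GPU while actively running work and gives it back while idle, even though the session stays alive.

```python
from contextlib import contextmanager


class GpuLeasePool:
    """Toy model (hypothetical): a developer holds a GPU, and its memory,
    only while actively running work, not for the whole session."""

    def __init__(self, num_gpus):
        self.free = list(range(num_gpus))  # GPU indices nobody holds
        self.held = {}                     # developer name -> GPU index

    @contextmanager
    def lease(self, developer):
        if not self.free:
            raise RuntimeError("all GPUs are busy; retry later")
        gpu = self.free.pop(0)
        self.held[developer] = gpu
        try:
            # Assumption: a real system would restore the developer's
            # state from host memory to GPU memory at this point.
            yield gpu
        finally:
            # Assumption: a real system would swap the state back out to
            # host memory here, before the GPU is handed to someone else.
            del self.held[developer]
            self.free.append(gpu)
```

With one GPU and two developers, the second developer can use the full GPU as soon as the first one goes idle, without the first session having to exit:

```python
pool = GpuLeasePool(num_gpus=1)
with pool.lease("dev-a") as gpu:
    pass  # dev-a runs GPU work here
with pool.lease("dev-b") as gpu:
    pass  # dev-b gets the same GPU once dev-a is idle
```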
