Rudimentary GPU load balancing #89
Comments
I just started an X server process, but the Xorg process runs across both GPUs. I think the -d parameter should automatically choose the least-loaded GPU, based on the GPU utilization and memory utilization percentages (utilization.gpu, utilization.memory) reported by an nvidia-smi query, e.g.
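A minimal sketch of how such a query could be ranked, added here for illustration and not taken from the original comment. It assumes the CSV output of `nvidia-smi --query-gpu=index,utilization.gpu,utilization.memory --format=csv,noheader,nounits`; the function name `least_loaded_gpu` and the choice to rank GPUs by the sum of the two percentages are assumptions, not anything VirtualGL does.

```shell
#!/usr/bin/env bash
# least_loaded_gpu: read lines of "index, gpu_util, mem_util" on stdin
# (the nvidia-smi CSV format described above) and print the index of the
# GPU with the smallest combined utilization.
# Ranking by the sum of the two percentages is an illustrative assumption.
least_loaded_gpu() {
    awk -F', *' '
        NF >= 3 {
            load = $2 + $3
            # Keep the first GPU seen with the lowest combined load.
            if (best == "" || load < min) { min = load; best = $1 }
        }
        END { if (best != "") print best }
    '
}

# Typical use (requires an NVIDIA driver, so it is left commented out):
#   nvidia-smi --query-gpu=index,utilization.gpu,utilization.memory \
#       --format=csv,noheader,nounits | least_loaded_gpu
```

Reading from stdin keeps the ranking logic separate from the query, so a different data source (for instance an AMD tool) could feed the same function.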
I've only been able to do this with NV-CONTROL, for example with their sample here:
That would be fine if we knew that all GPUs were attached to display :0, but we can’t assume that (some deployments attach them to separate X displays.) The purpose here is to divine the appropriate value of |
I thought that once the least-loaded GPU is identified, parsing the command line of the associated X server process would yield the display number. Then, once you know the display number, the NV-CONTROL method can be used to determine which screen(s) of the identified display are on the GPU in question. For example, the following code works for me (relying on the NV-CONTROL sample program, invoked below as ./nv-control-dpy):
# find GPU and PID of least-utilized GPU that has X running
read gpu pid _ < <(nvidia-smi pmon -c 1 | awk '$8 == "X" { print $1,$2,$4 }' \
| sort -n -k3 | head -n1)
# find DISPLAY in command line of X PID
display=$(grep -xz ':[0-9]\+' "/proc/$pid/cmdline" | tr -d '\0')
# use NV-CONTROL to find first SCREEN on GPU and DISPLAY identified
screen=$(DISPLAY=$display ./nv-control-dpy --query-gpus \
| sed -n "s/ *Indices of X screens using GPU $gpu: //p" | cut -d' ' -f1)
# set VGL_DISPLAY
VGL_DISPLAY=${display}.${screen}
This issue is to track an idea for a script-based GPU load balancer that would be invoked by vglrun. Ideally this script would, if VGL_DISPLAY/-d is auto:
1. Use nvidia-smi and amdconfig to build a list of available GPUs for which load information can be obtained, as well as the load of each GPU.
2. nvidia-smi can be used to figure out whether any Xorg processes are attached to the least-loaded GPU (not sure whether amdconfig can do the same thing), and ps can obtain the command line (and thus the X display number) of the Xorg process, but that doesn't tell me which X screen is attached to which GPU (there can be multiple screens per Xorg process).
3. Set VGL_DISPLAY to the X screen of the least-loaded GPU.
This does not depend on #10, but it would need to accommodate that new feature. The idea is that #10 would work the way the old GLP feature used to work on Solaris/SPARC, i.e. it would be activated by specifying a device path rather than an X display in VGL_DISPLAY/-d. Similarly, this feature could be extended by specifying autoegl instead of auto, thus instructing the script to find an EGL device path instead of an X display. If (2) above proves impossible or unwieldy, then this feature might have to rely on EGL.
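The auto behavior described above could be wired up roughly as follows. This is only a sketch under stated assumptions: `resolve_vgl_display` and `pick_least_loaded_screen` are hypothetical names, the picker is a stub standing in for the GPU/display/screen discovery steps, and falling back to `:0` when no screen can be picked is an assumed policy rather than documented VirtualGL behavior.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder: a real picker would combine the discovery steps
# (GPU load, X display, X screen) and print something like ":1.0".
# This stub always fails, which makes the fallback path visible.
pick_least_loaded_screen() { return 1; }

# resolve_vgl_display VALUE
# Print VALUE unchanged unless it is "auto"; in that case print whatever the
# picker returns, or ":0" if the picker fails or prints nothing.
# The ":0" fallback is an assumption made for this sketch.
resolve_vgl_display() {
    local requested=$1 picked
    if [ "$requested" != "auto" ]; then
        printf '%s\n' "$requested"
        return 0
    fi
    if picked=$(pick_least_loaded_screen) && [ -n "$picked" ]; then
        printf '%s\n' "$picked"
    else
        printf ':0\n'
    fi
}

# Example: VGL_DISPLAY=$(resolve_vgl_display "${VGL_DISPLAY:-auto}")
```

Keeping the picker behind a single function means an EGL-based variant (the autoegl idea) could swap in a different picker without touching the dispatch logic.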