List of NVIDIA drivers with issues #41

Closed
maxpain opened this issue Jul 25, 2023 · 25 comments

maxpain commented Jul 25, 2023

Hello. I'm trying to run this container in my home Kubernetes cluster on Talos Linux with an RTX 4090 GPU.
NVIDIA driver: 535.86.05

root@csgo-0:/tmp# cat /home/user/.local/share/xorg/Xorg.0.log
[  3108.301] _XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.
[  3108.301] 
X.Org X Server 1.21.1.3
X Protocol Version 11, Revision 0
[  3108.301] Current Operating System: Linux csgo-0 6.1.35-talos #1 SMP PREEMPT_DYNAMIC Wed Jun 28 13:58:51 UTC 2023 x86_64
[  3108.301] Kernel command line: talos.platform=metal talos.config=none console=ttyS0 console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 mitigations=off cpufreq.default_governor=performance
[  3108.301] xorg-server 2:21.1.3-2ubuntu2.5 (For technical support please see http://www.ubuntu.com/support) 
[  3108.301] Current version of pixman: 0.40.0
[  3108.301]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[  3108.301] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  3108.301] (==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Tue Jul 25 13:36:43 2023
[  3108.301] (==) Using config file: "/etc/X11/xorg.conf"
[  3108.301] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  3108.301] (==) ServerLayout "Layout0"
[  3108.301] (**) |-->Screen "Screen0" (0)
[  3108.301] (**) |   |-->Monitor "Monitor0"
[  3108.301] (**) |   |-->Device "Device0"
[  3108.301] (**) |-->Input Device "Keyboard0"
[  3108.301] (**) |-->Input Device "Mouse0"
[  3108.301] (**) Option "AutoAddGPU" "false"
[  3108.301] (==) Automatically adding devices
[  3108.301] (==) Automatically enabling devices
[  3108.301] (**) Not automatically adding GPU devices
[  3108.301] (==) Automatically binding GPU devices
[  3108.301] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  3108.301] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (==) FontPath set to:
        /usr/share/fonts/X11/misc,
        built-ins
[  3108.301] (==) ModulePath set to "/usr/lib/xorg/modules"
[  3108.301] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  3108.301] (WW) Disabling Keyboard0
[  3108.301] (WW) Disabling Mouse0
[  3108.301] (II) Loader magic: 0x55f31992c020
[  3108.301] (II) Module ABI versions:
[  3108.301]    X.Org ANSI C Emulation: 0.4
[  3108.301]    X.Org Video Driver: 25.2
[  3108.301]    X.Org XInput driver : 24.4
[  3108.301]    X.Org Server Extension : 10.0
[  3108.303] (EE) systemd-logind: failed to get session: Launch helper exited with unknown return code 1
[  3108.303] (II) xfree86: Adding drm device (/dev/dri/card0)
[  3108.303] (II) Platform probe for /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0
[  3108.304] (--) PCI:*(1@0:0:0) 10de:2684:10de:165b rev 161, Mem @ 0x93000000/16777216, 0x4000000000/34359738368, 0x4800000000/33554432, I/O @ 0x00006000/128, BIOS @ 0x????????/524288
[  3108.304] (II) LoadModule: "glx"
[  3108.305] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  3108.305] (II) Module glx: vendor="X.Org Foundation"
[  3108.305]    compiled for 1.21.1.3, module version = 1.0.0
[  3108.305]    ABI class: X.Org Server Extension, version 10.0
[  3108.305] (II) LoadModule: "nvidia"
[  3108.305] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  3108.305] (II) Module nvidia: vendor="NVIDIA Corporation"
[  3108.305]    compiled for 1.6.99.901, module version = 1.0.0
[  3108.305]    Module class: X.Org Video Driver
[  3108.305] (II) NVIDIA dlloader X Driver  535.86.05  Fri Jul 14 20:26:08 UTC 2023
[  3108.305] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  3108.305] (II) Loading sub module "fb"
[  3108.305] (II) LoadModule: "fb"
[  3108.305] (II) Module "fb" already built-in
[  3108.305] (II) Loading sub module "wfb"
[  3108.305] (II) LoadModule: "wfb"
[  3108.305] (II) Loading /usr/lib/xorg/modules/libwfb.so
[  3108.305] (II) Module wfb: vendor="X.Org Foundation"
[  3108.305]    compiled for 1.21.1.3, module version = 1.0.0
[  3108.305]    ABI class: X.Org ANSI C Emulation, version 0.4
[  3108.305] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  3108.305] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[  3108.305] (==) NVIDIA(0): RGB weight 888
[  3108.305] (==) NVIDIA(0): Default visual is TrueColor
[  3108.305] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[  3108.305] (**) NVIDIA(0): Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced"
[  3108.305] (**) NVIDIA(0): Option "ProbeAllGpus" "False"
[  3108.305] (**) NVIDIA(0): Option "BaseMosaic" "False"
[  3108.305] (**) NVIDIA(0): Option "AllowEmptyInitialConfiguration" "True"
[  3108.305] (**) NVIDIA(0): Option "HardDPMS" "False"
[  3108.305] (**) NVIDIA(0): Option "ConnectedMonitor" "DFP"
[  3108.305] (**) NVIDIA(0): Enabling 2D acceleration
[  3108.305] (**) NVIDIA(0): ConnectedMonitor string: "DFP"
[  3108.305] (II) Loading sub module "glxserver_nvidia"
[  3108.305] (II) LoadModule: "glxserver_nvidia"
[  3108.305] (II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
[  3108.309] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[  3108.309]    compiled for 1.6.99.901, module version = 1.0.0
[  3108.309]    Module class: X.Org Server Extension
[  3108.309] (II) NVIDIA GLX Module  535.86.05  Fri Jul 14 20:27:17 UTC 2023
[  3108.309] (II) NVIDIA: The X server supports PRIME Render Offload.
[  3108.322] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:1:0:0
[  3108.322] (--) NVIDIA(0):     DFP-0 (boot)
[  3108.322] (--) NVIDIA(0):     DFP-1
[  3108.322] (--) NVIDIA(0):     DFP-2
[  3108.322] (--) NVIDIA(0):     DFP-3
[  3108.322] (--) NVIDIA(0):     DFP-4
[  3108.322] (--) NVIDIA(0):     DFP-5
[  3108.322] (--) NVIDIA(0):     DFP-6
[  3108.322] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-0".
[  3108.322] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 4090 (AD102-A) at PCI:1:0:0
[  3108.322] (II) NVIDIA(0):     (GPU-0)
[  3108.322] (--) NVIDIA(0): Memory: 25153536 kBytes
[  3108.322] (--) NVIDIA(0): VideoBIOS: 95.02.20.00.01
[  3108.322] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): connected
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): 600.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (**) NVIDIA(GPU-0): Mode Validation Overrides for LNX PiKVM (DFP-0):
[  3108.365] (**) NVIDIA(GPU-0):     NoMaxSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoVirtualSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoMaxPClkCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoEdidMaxPClkCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoHorizSyncCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoVertRefreshCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoExtendedGpuCapabilitiesCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoTotalSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoDualLinkDVICheck
[  3108.365] (**) NVIDIA(GPU-0):     NoDisplayPortBandwidthCheck
[  3108.365] (**) NVIDIA(GPU-0):     AllowNon3DVisionModes
[  3108.365] (**) NVIDIA(GPU-0):     AllowNonEdidModes
[  3108.365] (**) NVIDIA(GPU-0):     AllowNonHDMI3DModes
[  3108.365] (**) NVIDIA(GPU-0):     NoEdidHDMI2Check
[  3108.365] (**) NVIDIA(GPU-0):     AllowDpInterlaced
[  3108.366] (EE) NVIDIA(GPU-0): Unable to add conservative default mode "nvidia-auto-select".
[  3108.366] (EE) NVIDIA(GPU-0): Unable to add "nvidia-auto-select" mode to ModePool.
[  3108.366] (WW) NVIDIA(0): No valid modes for "DFP-0:1920x1080R"; removing.
[  3108.366] (WW) NVIDIA(0): 
[  3108.366] (WW) NVIDIA(0): Unable to validate any modes; falling back to the default mode
[  3108.366] (WW) NVIDIA(0):     "nvidia-auto-select".
[  3108.366] (WW) NVIDIA(0): 
[  3108.366] (WW) NVIDIA(0): No valid modes for "DFP-0:nvidia-auto-select"; removing.
[  3108.366] (EE) NVIDIA(0): Unable to use default mode "nvidia-auto-select".
[  3108.366] (EE) NVIDIA(0): Failing initialization of X screen
[  3108.427] (II) UnloadModule: "nvidia"
[  3108.427] (II) UnloadSubModule: "glxserver_nvidia"
[  3108.427] (II) Unloading glxserver_nvidia
[  3108.427] (II) UnloadSubModule: "wfb"
[  3108.427] (EE) Screen(s) found, but none have a usable configuration.
[  3108.427] (EE) 
Fatal server error:
[  3108.427] (EE) no screens found(EE) 
[  3108.427] (EE) 
Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
[  3108.427] (EE) Please also check the log file at "/home/user/.local/share/xorg/Xorg.0.log" for additional information.
[  3108.427] (EE) 
[  3108.427] (EE) Server terminated with error (1). Closing log file.

maxpain (Author) commented Jul 25, 2023

@ehfd could you help me, please?

ehfd (Member) commented Jul 25, 2023

Upgrade your driver. If you are on the 535 or 550 branch, use the latest minor release of that branch: 535 versions earlier than 535.113.01 and 550 versions earlier than 550.67 have bugs.

maxpain (Author) commented Jul 25, 2023

> VIDEO_PORT to DP-0 perhaps.

I tried DP-0, DP-1, DP-2, and DFP.
Only "none" works.

ehfd (Member) commented Jul 26, 2023

VIDEO_PORT to DP-0 perhaps.

"none" is not optimal. What's your environment?

maxpain (Author) commented Jul 26, 2023

@ehfd Kubernetes cluster, nvidia-container-toolkit, NVIDIA device plugin, Talos Linux, and an RTX 4090 with the 535.86.05 NVIDIA driver.

ehfd (Member) commented Aug 1, 2023

A similar issue occurs with the EGL desktop. Perhaps it is an issue with the 535 driver.

maxpain (Author) commented Aug 1, 2023

@ehfd Hmm, I don't have any issues with EGL desktop on 535.

maxpain (Author) commented Aug 1, 2023

Most likely because the EGL desktop variant uses Xvfb, not Xorg.

ehfd (Member) commented Aug 1, 2023

I can reproduce the error. For now, do NOT upgrade to NVIDIA 535.

ehfd changed the title "no screens found" to "no screens found on NVIDIA 535.86" on Aug 1, 2023
ehfd (Member) commented Aug 1, 2023

In NVIDIA 535.86.05, with Option "ModeDebug" "True" inserted in /etc/X11/xorg.conf for debugging, the key message is "GPU extended capability check failed."

[  2711.450] (II) NVIDIA(GPU-0): --- Building ModePool for DFP-1 ---
[  2711.450] (**) NVIDIA(GPU-0): Mode Validation Overrides for DFP-1:
[  2711.450] (**) NVIDIA(GPU-0):     NoMaxSizeCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoVirtualSizeCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoMaxPClkCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoEdidMaxPClkCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoHorizSyncCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoVertRefreshCheck
[  2711.451] (**) NVIDIA(GPU-0):     NoExtendedGpuCapabilitiesCheck
[  2711.451] (**) NVIDIA(GPU-0):     NoTotalSizeCheck
[  2711.451] (**) NVIDIA(GPU-0):     NoDualLinkDVICheck
[  2711.451] (**) NVIDIA(GPU-0):     NoDisplayPortBandwidthCheck
[  2711.451] (**) NVIDIA(GPU-0):     AllowNon3DVisionModes
[  2711.451] (**) NVIDIA(GPU-0):     AllowNonEdidModes
[  2711.451] (**) NVIDIA(GPU-0):     AllowNonHDMI3DModes
[  2711.451] (**) NVIDIA(GPU-0):     NoEdidHDMI2Check
[  2711.451] (**) NVIDIA(GPU-0):     AllowDpInterlaced
(OMITTED)
[  2711.454] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1080_60":
[  2711.454] (WW) NVIDIA(GPU-0):     Mode Source: X Configuration file ModeLine
[  2711.454] (WW) NVIDIA(GPU-0):     1920 x 1080 @ 60 Hz
[  2711.454] (WW) NVIDIA(GPU-0):       Pixel Clock      : 138.50 MHz
[  2711.454] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 1968
[  2711.454] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2000, 2080
[  2711.454] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1080, 1083
[  2711.454] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1088, 1111
[  2711.454] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.454] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.454] (WW) NVIDIA(GPU-0):     Viewport
[  2711.454] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.454] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.454] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.454] (WW) NVIDIA(GPU-0):     Mode "1920x1080_60" is invalid.
[  2711.470] (WW) NVIDIA(GPU-0):   Validating Mode "1280x800_60":
[  2711.470] (WW) NVIDIA(GPU-0):     Mode Source: X Server
[  2711.470] (WW) NVIDIA(GPU-0):     1280 x 800 @ 60 Hz
[  2711.470] (WW) NVIDIA(GPU-0):       Pixel Clock      : 71.00 MHz
[  2711.470] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1280, 1328
[  2711.470] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 1360, 1440
[  2711.470] (WW) NVIDIA(GPU-0):       VRes, VSyncStart :  800,  803
[  2711.470] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal :  809,  823
[  2711.470] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.470] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.470] (WW) NVIDIA(GPU-0):     Viewport
[  2711.470] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.470] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.470] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.470] (WW) NVIDIA(GPU-0):     Mode "1280x800_60" is invalid.
[  2711.471] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1200_60":
[  2711.471] (WW) NVIDIA(GPU-0):     Mode Source: X Server
[  2711.471] (WW) NVIDIA(GPU-0):     1920 x 1200 @ 60 Hz
[  2711.471] (WW) NVIDIA(GPU-0):       Pixel Clock      : 154.00 MHz
[  2711.471] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 1968
[  2711.471] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2000, 2080
[  2711.471] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1200, 1203
[  2711.471] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1209, 1235
[  2711.471] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.471] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.471] (WW) NVIDIA(GPU-0):     Viewport
[  2711.471] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.471] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.471] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.471] (WW) NVIDIA(GPU-0):     Mode "1920x1200_60" is invalid.
[  2711.472] (WW) NVIDIA(GPU-0):   Validating Mode "800x600_60":
[  2711.472] (WW) NVIDIA(GPU-0):     Mode Source: NVIDIA Predefined
[  2711.472] (WW) NVIDIA(GPU-0):     800 x 600 @ 60 Hz
[  2711.472] (WW) NVIDIA(GPU-0):       Pixel Clock      : 40.00 MHz
[  2711.472] (WW) NVIDIA(GPU-0):       HRes, HSyncStart :  800,  840
[  2711.472] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal :  968, 1056
[  2711.472] (WW) NVIDIA(GPU-0):       VRes, VSyncStart :  600,  601
[  2711.472] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal :  605,  628
[  2711.472] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H +V
[  2711.472] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.472] (WW) NVIDIA(GPU-0):     Viewport
[  2711.472] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.472] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.472] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.472] (WW) NVIDIA(GPU-0):     Mode "800x600_60" is invalid.
[  2711.472] (WW) NVIDIA(GPU-0):
[  2711.472] (EE) NVIDIA(GPU-0): Unable to add conservative default mode "nvidia-auto-select".
[  2711.472] (EE) NVIDIA(GPU-0): Unable to add "nvidia-auto-select" mode to ModePool.
[  2711.472] (WW) NVIDIA(0): No valid modes for "DFP-1:1920x1080R"; removing.
[  2711.472] (WW) NVIDIA(0):
[  2711.472] (WW) NVIDIA(0): Unable to validate any modes; falling back to the default mode
[  2711.472] (WW) NVIDIA(0):     "nvidia-auto-select".
[  2711.472] (WW) NVIDIA(0):
[  2711.472] (WW) NVIDIA(0): No valid modes for "DFP-1:nvidia-auto-select"; removing.
[  2711.472] (EE) NVIDIA(0): Unable to use default mode "nvidia-auto-select".
[  2711.472] (EE) NVIDIA(0): Failing initialization of X screen

Xorg.0.log
xorg.conf.log
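
With ModeDebug enabled, the per-mode verdicts are easy to miss in a long log. Below is a minimal Python sketch, assuming the log format shown above (a `Validating Mode "NAME":` line, some reason lines, then `Mode "NAME" is invalid.`), that lists the modes rejected with "GPU extended capability check failed.". The function name `modes_failing_ext_cap` is hypothetical; this is not an NVIDIA or X.Org tool.

```python
import re
from typing import List, Optional

# Patterns assumed from the ModeDebug output quoted in this thread.
VALIDATING = re.compile(r'Validating Mode "([^"]+)":')
VERDICT = re.compile(r'Mode "([^"]+)" is (valid|invalid)\.')
EXT_CAP_FAIL = "GPU extended capability check failed."


def modes_failing_ext_cap(log_text: str) -> List[str]:
    """Return mode names rejected after the extended-capability message."""
    failing: List[str] = []
    current: Optional[str] = None  # mode currently being validated
    saw_failure = False            # whether EXT_CAP_FAIL appeared for it
    for line in log_text.splitlines():
        start = VALIDATING.search(line)
        if start:
            current, saw_failure = start.group(1), False
            continue
        if current is None:
            continue  # not inside a mode-validation block
        if EXT_CAP_FAIL in line:
            saw_failure = True
            continue
        verdict = VERDICT.search(line)
        if verdict and verdict.group(1) == current:
            if verdict.group(2) == "invalid" and saw_failure:
                failing.append(current)
            current = None  # block finished
    return failing
```

To use it, read the log (e.g. /home/user/.local/share/xorg/Xorg.0.log in the container above) into a string and pass it in; an empty result means no mode was rejected for that reason.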

ehfd (Member) commented Aug 1, 2023

Works up to 530.41.03.

X.Org X Server 1.21.1.4
X Protocol Version 11, Revision 0
Current Operating System: Linux xgl-test 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-153-generic root=UUID=b74b4d9b-e7b1-4dc6-be2e-bf94365e04ed ro maybe-ubiquity
xorg-server 2:21.1.4-2ubuntu1.7~22.04.1 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.40.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Wed Aug  2 04:39:32 2023
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "2560x1600_60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     2560 x 1600 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 268.50 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 2560, 2608
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2640, 2720
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart : 1600, 1603
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1609, 1646
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 2560x1600+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        1
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          1
[746936.617] (II) NVIDIA(GPU-0):     Mode "2560x1600_60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "1280x800d60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     1280 x 800 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 134.25 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 1280, 1304
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 1320, 1360
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart :  800,  801
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal :  804,  823
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[746936.617] (II) NVIDIA(GPU-0):       Extra            : DoubleScan
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 1280x800+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        2
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          2
[746936.617] (II) NVIDIA(GPU-0):     Mode "1280x800d60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "2560x1600_60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     2560 x 1600 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 348.50 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 2560, 2760
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 3032, 3504
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart : 1600, 1603
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1609, 1658
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 2560x1600+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        1
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          1
[746936.617] (II) NVIDIA(GPU-0):     Mode "2560x1600_60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "1280x800d60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     1280 x 800 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 174.25 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 1280, 1380
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 1516, 1752
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart :  800,  801
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal :  804,  829
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[746936.617] (II) NVIDIA(GPU-0):       Extra            : DoubleScan
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 1280x800+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        2
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          2
[746936.617] (II) NVIDIA(GPU-0):     Mode "1280x800d60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.618] (II) NVIDIA(GPU-0): --- Done building ModePool for DFP-2 ---
[746936.618] (II) NVIDIA(GPU-0):
[746936.618] (II) NVIDIA(GPU-0): Frequency information for DFP-2:
[746936.618] (II) NVIDIA(GPU-0):   HorizSync   : 28.000-55.000 kHz
[746936.618] (II) NVIDIA(GPU-0):   VertRefresh : 43.000-72.000 Hz
[746936.618] (II) NVIDIA(GPU-0):     (HorizSync from Conservative Defaults)
[746936.618] (II) NVIDIA(GPU-0):     (VertRefresh from Conservative Defaults)

It also works in 525.60.13.

X.Org X Server 1.21.1.4
X Protocol Version 11, Revision 0
Current Operating System: Linux xgl-test 5.4.0-148-generic #165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-5.4.0-148-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro
xorg-server 2:21.1.4-2ubuntu1.7~22.04.1 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.40.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Wed Aug  2 04:44:04 2023
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"

ehfd (Member) commented Aug 2, 2023

I've emailed the NVIDIA driver team. Waiting for a response.

ehfd (Member) commented Aug 4, 2023

TO OUR USERS:

Please send an email to [email protected] stating that you are a user of https://github.com/selkies-project/docker-nvidia-glx-desktop and that you are also affected by the issue below. This is the only way to accelerate a fix in the drivers; if this issue is not fixed, this repository may not be usable with later drivers.

We have reproduced an issue, faced by all of our users on the 535.86.05 drivers, where the "NoExtendedGpuCapabilitiesCheck" option in "ModeValidation" in xorg.conf is not honored on GeForce GPUs.

This is a new issue that did not exist in 530.xx, 525.xx, or any earlier driver, and it is reproducible for every user running a headless setup on GeForce GPUs (so far, all of the 10xx, 20xx, and 30xx series).

How to reproduce: Set ConnectedMonitor to a port with no monitor connected (e.g. DP-0) to enable XRandR, and use Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced" so that the modes pass validation.

We have also turned on Option "ModeDebug" "True" for debugging.

Result:

[  2711.454] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1080_60":
[  2711.454] (WW) NVIDIA(GPU-0):     Mode Source: X Configuration file ModeLine
[  2711.454] (WW) NVIDIA(GPU-0):     1920 x 1080 @ 60 Hz
[  2711.454] (WW) NVIDIA(GPU-0):       Pixel Clock      : 138.50 MHz
[  2711.454] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 1968
[  2711.454] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2000, 2080
[  2711.454] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1080, 1083
[  2711.454] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1088, 1111
[  2711.454] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.454] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.454] (WW) NVIDIA(GPU-0):     Viewport
[  2711.454] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.454] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.454] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.454] (WW) NVIDIA(GPU-0):     Mode "1920x1080_60" is invalid.
[  2711.454] (WW) NVIDIA(GPU-0):

This behavior contradicts the README documentation and therefore has to be fixed.
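
To make the reproduction setup concrete, here is a small Python sketch that renders an xorg.conf "Device" section with the options named above: ConnectedMonitor on a port with nothing attached, the full ModeValidation list, and ModeDebug. The helper name `device_section`, the identifier "Device0" (borrowed from the first log), and placing every option in the Device section are assumptions for illustration; the configuration the container actually generates may differ.

```python
from typing import List, Tuple

# ModeValidation tokens exactly as listed in the reproduction steps above.
MODE_VALIDATION = [
    "NoMaxPClkCheck", "NoEdidMaxPClkCheck", "NoMaxSizeCheck",
    "NoHorizSyncCheck", "NoVertRefreshCheck", "NoVirtualSizeCheck",
    "NoExtendedGpuCapabilitiesCheck", "NoTotalSizeCheck",
    "NoDualLinkDVICheck", "NoDisplayPortBandwidthCheck",
    "AllowNon3DVisionModes", "AllowNonHDMI3DModes", "AllowNonEdidModes",
    "NoEdidHDMI2Check", "AllowDpInterlaced",
]


def device_section(identifier: str, connected_monitor: str,
                   mode_debug: bool = True) -> str:
    """Render a hypothetical xorg.conf Device section for the repro."""
    options: List[Tuple[str, str]] = [
        ("ConnectedMonitor", connected_monitor),  # port with no monitor attached
        ("ModeValidation", ", ".join(MODE_VALIDATION)),
    ]
    if mode_debug:
        options.append(("ModeDebug", "True"))  # verbose per-mode validation log
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver     "nvidia"',
    ]
    lines += [f'    Option "{name}" "{value}"' for name, value in options]
    lines.append("EndSection")
    return "\n".join(lines)


print(device_section("Device0", "DP-0"))
```

On an affected 535.86.05 GeForce setup, a config like this is where the "GPU extended capability check failed." lines appear despite NoExtendedGpuCapabilitiesCheck being listed.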

------

On a separate note, there is another issue, which is not blocking and existed long before the NVIDIA 535 drivers: HDMI and DVI ports (including the virtual DVI ports on supported Tesla/Datacenter GPUs, where the maximum resolution is stuck at 2560x1600 at 60 Hz) are stuck at a 165.0 MHz maximum pixel clock, and the "NoMaxPClkCheck" option in "ModeValidation", along with related options, is never honored. As a result, headless GPUs with "ConnectedMonitor" set to an HDMI or DVI port cannot use modes above 1920x1200 at 60 Hz.

[2363014.704] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:33:0:0
[2363014.704] (--) NVIDIA(0):     DFP-0
[2363014.704] (--) NVIDIA(0):     DFP-1
[2363014.704] (--) NVIDIA(0):     DFP-2
[2363014.704] (--) NVIDIA(0):     DFP-3
[2363014.704] (--) NVIDIA(0):     DFP-4
[2363014.704] (--) NVIDIA(0):     DFP-5
[2363014.705] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-0".
[2363014.707] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 3090 (GA102-A) at PCI:33:0:0
[2363014.707] (II) NVIDIA(0):     (GPU-0)
[2363014.707] (--) NVIDIA(0): Memory: 25165824 kBytes
[2363014.707] (--) NVIDIA(0): VideoBIOS: 94.02.42.40.34
[2363014.707] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: connected
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[2363014.711] (--) NVIDIA(GPU-0): DFP-0 Name Aliases:
[2363014.711] (--) NVIDIA(GPU-0):   DFP
[2363014.711] (--) NVIDIA(GPU-0):   DFP-0
[2363014.711] (--) NVIDIA(GPU-0):   DPY-0
[2363014.711] (--) NVIDIA(GPU-0):   HDMI-0
[2363014.712] (--) NVIDIA(GPU-0):   HDMI-0
[2363014.712] (--) NVIDIA(GPU-0):   Connector-3
[2363014.712] (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
[2363014.712] (--) NVIDIA(GPU-0):

[2363014.714] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1440_60":
[2363014.714] (WW) NVIDIA(GPU-0):     Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0):     1920 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0):       Pixel Clock      : 234.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 2048
[2363014.714] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2256, 2600
[2363014.714] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[2363014.714] (WW) NVIDIA(GPU-0):     Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0):     mode timings.
[2363014.714] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0):     Mode "1920x1440_60" is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1440_75":
[2363014.714] (WW) NVIDIA(GPU-0):     Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0):     1920 x 1440 @ 75 Hz
[2363014.714] (WW) NVIDIA(GPU-0):       Pixel Clock      : 297.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 2064
[2363014.714] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2288, 2640
[2363014.714] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[2363014.714] (WW) NVIDIA(GPU-0):     Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0):     mode timings.
[2363014.714] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0):     Mode "1920x1440_75" is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0):   Validating Mode "2560x1440_60":
[2363014.714] (WW) NVIDIA(GPU-0):     Mode Source: X Configuration file ModeLine
[2363014.714] (WW) NVIDIA(GPU-0):     2560 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0):       Pixel Clock      : 241.50 MHz
[2363014.714] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 2560, 2608
[2363014.714] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2640, 2720
[2363014.714] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1440, 1443
[2363014.714] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1448, 1481
[2363014.714] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[2363014.714] (WW) NVIDIA(GPU-0):     Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0):     mode timings.
[2363014.714] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0):     Mode "2560x1440_60" is invalid.

This separate note is also inconsistent with the README documentation, and this discrepancy originated long before the 535.xx drivers.

@ehfd ehfd changed the title no screens found on NVIDIA 535.86 NVIDIA 535.86 doesn't support running headless Xorg servers Aug 4, 2023
@ehfd ehfd changed the title NVIDIA 535.86 doesn't support running headless Xorg servers NVIDIA 535.86 doesn't run headless Xorg servers Aug 4, 2023
@ehfd
Member

ehfd commented Aug 23, 2023

@maxpain https://forums.developer.nvidia.com/t/if-you-have-a-problem-please-read-this-first/27131

Could you (and everyone else affected) provide an nvidia-bug-report.log.gz after reproducing the error when running Xorg, either here or in the NVIDIA forum post above?

The more reports, the better.

@ehfd
Member

ehfd commented Aug 31, 2023

NVIDIA has added this issue to their internal tracker.

@ehfd
Member

ehfd commented Sep 4, 2023

From @xhejtman in the Discord:

What is the issue with NVIDIA drivers and no resolutions being available? We just tested the 535 drivers on an A10 GPU, and all resolutions are available. Is that specific to desktop cards?

Perhaps it is, or the newer driver release fixed things. CC @maxpain

@ehfd
Member

ehfd commented Oct 12, 2023

Good news: NVIDIA says they have found the source of the issue and will ship the fix in the next release.
Now we have to hope that all of the issues have actually been fixed.

@bongole

bongole commented Nov 2, 2023

This issue may have been fixed in 535.129.03 and 545.29.02.
I tested the drivers on Ubuntu 22.04 with an RTX 4060 Ti.

@ehfd
Member

ehfd commented Nov 2, 2023

535.129.03 release highlights since 535.113.01:
Fixed a bug that could cause modes to fail validation when Option "ModeValidation" "NoExtendedGpuCapabilitiesCheck" is specified in xorg.conf.
Fixed a bug that could cause GPU memory utilization to be reported incorrectly for Multi-Instance GPU (MIG) partitions on Grace Hopper systems.
Fixed a bug that intermittently caused the display to freeze when resuming from suspend on some Ada GPUs.
Fixed a bug which could cause some DisplayPort monitors to flicker.
Fixed a bug that could cause monitors to flicker when the performance state changes on Turing GPUs.

545.29.02 release highlights since 535.113.01:

Added experimental HDMI 10 bits per component support; enable by loading nvidia-modeset with hdmi_deepcolor=1.
Added support for the CTM, DEGAMMA_LUT, and GAMMA_LUT DRM-KMS CRTC properties. These are used by features such as the “Night Light” feature in GNOME and the “Night Color” feature in KDE, when they are used as Wayland compositors.
Added support for GeForce and Workstation GPUs to the open kernel modules. Please see the “Open Linux Kernel Modules” chapter in the README for details.
Added initial experimental support for runtime D3 (RTD3) power management on Desktop GPUs. Please see the ‘PCI-Express Runtime D3 (RTD3) Power Management’ chapter in the README for more details.
Added support for the EGL_ANDROID_native_fence_sync EGL extension and the VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT and VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT Vulkan external handle types when the nvidia-drm kernel module is loaded with the modeset=1 parameter.
Added experimental support for framebuffer consoles provided by nvidia-drm. On kernels that implement drm_fbdev_generic_setup and drm_aperture_remove_conflicting_pci_framebuffers, nvidia-drm will install a framebuffer console when loaded with both modeset=1 and fbdev=1 kernel module parameters. This will replace the Linux boot console driven by a system framebuffer driver such as efifb or vesafb.
Note that when an nvidia-drm framebuffer console is enabled, unloading nvidia-drm will cause the screen to turn off.
Updated nvidia-installer to allow installing the driver while an existing NVIDIA driver is already loaded.
Added support for virtual reality displays, such as the SteamVR platform, on Wayland compositors that support DRM leasing. Support requires xwayland version 22.1.0 and wayland-protocols version 1.22, or later. Tested on sway, minimum version 1.7 with wlroots version 0.15, and also on Kwin, minimum version 5.24.
Note: Before xwayland 23.2, there is a known issue with HDMI displays where the headset will fail to start a second time after closing SteamVR. This can be worked around by unplugging and replugging in the headset.
Fixed a bug that prevented VRR (Variable Refresh Rate) from working with Wayland.
Added support to the NVIDIA VDPAU driver for running in Xwayland. Please refer to the “Xwayland support in VDPAU” section of the README for further details.
Added libnvidia-gpucomp.so to the driver package. This is a helper library used for GPU shader compilation.
Removed libnvidia-vulkan-producer.so from the driver package. This helper library is no longer needed by the Wayland WSI.
Fixed a bug that intermittently caused the display to freeze when resuming from suspend on some Ada GPUs.
Fixed a bug that could cause monitors to flicker when the performance state changes on Turing GPUs.
Added support for HDR signaling via the HDR_OUTPUT_METADATA and Colorspace per-connector DRM properties when nvidia-drm is loaded with the modeset=1 parameter.
Added support for PRIME render offload to Vulkan Wayland WSI.
Fixed a bug that could cause modes to fail validation when Option "ModeValidation" "NoExtendedGpuCapabilitiesCheck" is specified in xorg.conf.
Fixed a bug which could cause some DisplayPort monitors to flicker.

That seems to be the case, @bongole. I will check whether all edge cases have been addressed.

@ehfd ehfd changed the title NVIDIA 535.86 doesn't run headless Xorg servers NVIDIA 535.86 doesn't run headless Xorg servers (apparently fixed in 535.129.03 and 545.29.02) Nov 2, 2023
@ehfd
Member

ehfd commented Nov 7, 2023

@bongole What's the environment that made it work? Is it this container?

@bongole

bongole commented Nov 9, 2023

@ehfd

I tested the command below on a bare-metal Ubuntu 22.04 server with an RTX 4060 Ti.

docker run --gpus all -it --rm --tmpfs /dev/shm:rw \
  -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 \
  -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc \
  -e BASIC_AUTH_PASSWORD=mypasswd -e ENABLE_HTTPS_WEB=true \
  --network host ghcr.io/selkies-project/nvidia-glx-desktop:latest

OS Info:

$ uname -a
Linux gpu-server 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

$ nvidia-smi
Thu Nov  9 11:15:09 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off | 00000000:01:00.0 Off |                  N/A |
| 32%   29C    P0              29W / 165W |      4MiB / 16380MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@ehfd
Member

ehfd commented Nov 9, 2023

I cannot confirm the fix on 535.129.03 because my test node is currently broken. Reports from anyone running 535.129.03 are appreciated.

@ehfd ehfd pinned this issue Nov 20, 2023
@ehfd ehfd changed the title NVIDIA 535.86 doesn't run headless Xorg servers (apparently fixed in 535.129.03 and 545.29.02) NVIDIA 535.86 doesn't run headless Xorg servers (fixed in 535.129.03 and 545.29.02) Nov 20, 2023
@ehfd
Member

ehfd commented Nov 20, 2023

A kind user has also confirmed the fix on version 535.129.03 for me.
Issue resolved.

Conclusion: if you face this issue, use display driver versions >= 535.129.03 or >= 545.29.02, or <= 530.xx. Don't use the headless drivers, because they lack certain libraries.
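The version rule in this conclusion can be checked mechanically. The following is a minimal Bash sketch, not a tool from this project: the function names `version_ge` and `headless_xorg_status` are hypothetical, the comparison relies on GNU `sort -V`, and it assumes the fixed minimums apply per branch (535.129.03 for the 535 branch, 545.29.02 for the 545 branch). Branches the thread does not mention are reported as `unlisted` rather than guessed.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not part of this project): classify an
# NVIDIA driver version for the headless Xorg bug discussed in this issue.
# Assumption: the fix minimums are per branch (535 -> 535.129.03,
# 545 -> 545.29.02); branches <= 530 are unaffected; others are "unlisted".

# version_ge A B: succeeds when version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# headless_xorg_status VERSION: prints "ok", "affected", or "unlisted".
headless_xorg_status() {
  local version="$1"
  local branch="${version%%.*}"  # major branch, e.g. 535 from 535.86.05
  if [ "$branch" -le 530 ]; then
    echo ok
  elif [ "$branch" -eq 535 ]; then
    if version_ge "$version" 535.129.03; then echo ok; else echo affected; fi
  elif [ "$branch" -eq 545 ]; then
    if version_ge "$version" 545.29.02; then echo ok; else echo affected; fi
  else
    echo unlisted
  fi
}
```

On a machine with the driver installed, the version could be read from `nvidia-smi` and passed in, for example `headless_xorg_status "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"`.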

@ehfd ehfd closed this as completed Nov 20, 2023
@ehfd ehfd changed the title NVIDIA 535.86 doesn't run headless Xorg servers (fixed in 535.129.03 and 545.29.02) List of NVIDIA drivers with issues May 10, 2024
@ehfd
Member

ehfd commented May 10, 2024

NVIDIA 550 drivers <= 550.5x have issues with Vulkan. Use 550.67 or higher.
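The 550-branch note can be expressed as a similar check. This Bash sketch is illustrative only: `vulkan_550_status` is a hypothetical name, it uses GNU `sort -V` for ordering, and it assumes every 550 release below 550.67 is affected, which is how this sketch reads "<= 550.5x".

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): flag 550-branch drivers older than
# 550.67, which this thread reports as having Vulkan issues.
# Assumption: every 550.x release below 550.67 is treated as affected.

# version_ge A B: succeeds when version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# vulkan_550_status VERSION: prints "ok", "affected", or "not-550".
vulkan_550_status() {
  local version="$1"
  case "$version" in
    550.*)
      if version_ge "$version" 550.67; then echo ok; else echo affected; fi
      ;;
    *)
      echo not-550
      ;;
  esac
}
```

Versions outside the 550 branch are reported as `not-550` so that this check does not overlap with the 535/545 guidance above.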
