Access the GPU without going through an X server #10

Closed
dcommander opened this issue Nov 19, 2015 · 118 comments

Comments

@dcommander
Member

dcommander commented Nov 19, 2015

There are spurious rumors that this either already is possible or will be possible soon with the nVidia drivers, by using EGL, but it is unclear exactly how (the Khronos EGL headers still seem to indicate that Xlib is required when using EGL on Un*x.) As soon as it is possible to do this, it would be a great enhancement for VirtualGL, since it would eliminate the need for a running X server on the server machine. I already know basically how to make such a system work in VirtualGL, because Sun used to have a proprietary API (GLP) that allowed us to accomplish the same thing on SPARC. Even as early as 2007, we identified EGL as a possible replacement for GLP, but Linux driver support for it became available only recently, and even where it is available, EGL still seems to be tied to X11 on Un*x systems. It is assumed that, eventually, that will have to change in order to support Wayland.

@tonyhb

tonyhb commented Nov 20, 2015

This would be awesome

@dcommander
Member Author

This functionality is indeed available in the latest nVidia driver, but I don't have it fully working yet. I can access the GPU device through EGL without an X server, create a Pbuffer, and (seemingly) render something to it, but I can't make glReadPixels() function properly, and I'm a little fuzzy on how double buffering and stereo can be implemented, as it seems like EGL doesn't support double buffered or stereo Pbuffers. Emulating double buffering and stereo using multiple single-buffered Pbuffers is certainly possible, but it would greatly increase the complexity of VirtualGL. Waiting for feedback from nVidia.

@dcommander
Member Author

After discussing at length with nVidia, it appears that there are a couple of issues blocking this:

ISSUE:

  • Inability to select a specific GPU device using a device name
    Currently, you can enumerate all of the GPUs in the system (and obtain an EGLDevice handle for each one) by using the EGL_EXT_device_enumeration extension, and you can query each GPU by using the EGL_EXT_device_query extension, and you can obtain an EGLDisplay from one of those EGLDevices by using the EGL_EXT_platform_base and EGL_EXT_platform_device extensions. However, there is no way to obtain any sort of system-wide device name that is unique to a particular EGLDevice. VirtualGL would use this device name in the VGL_DISPLAY environment variable. The idea is that, if EGL is available, VirtualGL will use it by default, and the default value of VGL_DISPLAY will be "{EGL device name for GPU device 0}". Otherwise, if EGL isn't available, then GLX will be the default, and the default value of VGL_DISPLAY will be ":0.0". If the user specifies a particular X display as VGL_DISPLAY, then this would have the effect of forcing the use of GLX mode, even if EGL mode is supported. And setting VGL_DISPLAY to a particular EGL device name would cause all of the 3D rendering to occur on that device.

SOLUTION:

  • nVidia is in the process of implementing the EGL_EXT_device_drm extension, which would allow for selecting a device based on its DRM filename (/dev/dri/card{n}), and since this is a cross-platform extension, it would allow for selecting non-nVidia GPUs as well.
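
The VGL_DISPLAY selection rules described above can be modeled roughly as follows. This is only a sketch of the decision logic in Python, not VirtualGL code (VirtualGL itself is C++). The function name `choose_backend` is hypothetical, and the sketch assumes that an "EGL device name" is a DRM device path such as /dev/dri/card0 and that the list of available devices has already been obtained somehow.

```python
def choose_backend(vgl_display, egl_devices):
    """Return (backend, target) for a VGL_DISPLAY value.

    vgl_display: the value of VGL_DISPLAY, or None if it is unset.
    egl_devices: DRM device paths usable through EGL, in GPU order
                 (empty if EGL is unavailable). How this list is
                 discovered is outside the scope of this sketch.
    """
    if vgl_display is None:
        # No explicit choice: prefer EGL on GPU 0, else fall back to GLX.
        if egl_devices:
            return ("egl", egl_devices[0])
        return ("glx", ":0.0")
    if vgl_display.startswith("/dev/dri/"):
        # A DRM device path selects EGL mode on that specific GPU.
        if vgl_display not in egl_devices:
            raise ValueError("no EGL device named " + vgl_display)
        return ("egl", vgl_display)
    # Anything else is treated as an X display string, which forces GLX
    # mode even when EGL is supported.
    return ("glx", vgl_display)
```

For example, `choose_backend(":1.0", ["/dev/dri/card0"])` selects GLX on display :1.0, whereas leaving VGL_DISPLAY unset on the same machine selects EGL on /dev/dri/card0.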

ISSUE:

  • Lack of support for double-buffered and stereo Pbuffers
    EGL currently supports double buffering only for Window drawables. Pbuffers and Pixmaps are single-buffered. This is a problem for VirtualGL, because as it is currently implemented, VirtualGL assumes a 1:1 mapping between windows on the 2D X server and Pbuffers on the 3D X server. It relies on this 1:1 mapping, because it allows VGL to easily create Pbuffers with visual attributes that match the attributes that the application requests for a particular window. The most straightforward path to EGL support would be if VirtualGL could create EGLSurfaces that inherently support double buffering and quad buffering, so VGL could pass through SwapBuffers() and glDrawBuffer()/glReadBuffer() calls to the EGLSurface corresponding to a particular window. It doesn't appear that this is ever going to be possible with EGL, however.

POSSIBLE SOLUTIONS:

  • Emulating double buffering and stereo using multiple Pbuffers
    At a minimum, this would be extremely difficult and very prone to error. The problem is that this would require that VirtualGL swap contexts behind the scenes, whenever the application switches the drawing or read buffer. There would no longer be a 1:1 correspondence between OpenGL contexts maintained by VirtualGL and GLXContext handles that VirtualGL returns to the application. At first glance, it might seem possible to maintain some sort of internal structure in VirtualGL (VGLContext, for instance), whereby each instance of the structure contains the context handles for all of the Pbuffers in question, and a pointer to this structure could be passed back to the application. Normally, VirtualGL just passes back the GLXContext handle from the 3D X server to the application when the application calls glXCreate[New]Context(), so the application is storing the GLXContext on behalf of VirtualGL. Conceivably, VirtualGL could pass back an opaque VGLContext handle to the application instead. My spidey sense tells me that I've already been down that road, and there were some issues associated with that, but I can't remember what they were (these were things that I dealt with very early on in the development of VirtualGL, so perhaps the issues no longer exist.) That aside, however, the show-stopping issue with this approach is whether VirtualGL could straightforwardly copy the relevant properties of one Pbuffer context to the other Pbuffer context whenever the application changes the drawing/read buffer. There doesn't seem to be an EGL equivalent of glXCopyContext(). We would ideally want to limit the touch points of this feature so as to minimize the risk of regression to the existing GLX-based implementation, and changing the internal meaning of GLXContext concerns me. I'd be much more comfortable simply passing an EGLContext back to the application when it requests a GLXContext.
Furthermore, this proposal would no longer have a 1:1 mapping between Window handles on the 2D X server and GLXDrawable handles on the 3D X server, although I think it might be possible to maintain a mapping of multiple EGLSurfaces to one Window by using the existing VirtualDrawable::OGLDrawable class.
  • Emulating double buffering and stereo using FBOs
    This would allow a single EGLContext to be mapped to a single GLXContext, which eliminates some of the concerns above. We could maintain a 1:1 mapping between Window handles on the 2D X server and GLXDrawable handles on the 3D X server by using a "dummy" 1x1 Pbuffer and attaching FBOs to it. All of that could be abstracted within the existing VirtualDrawable::OGLDrawable class. Swapping buffers could be implemented simply by swapping the FBO attachments. There are issues with this approach, however:
    • Emulation of aux and accumulation buffers. Aux buffers would be easy to emulate using FBOs, since they act as completely independent draw/read buffers, and the application is tasked with transferring pixels into and out of them. Accum buffers would be more difficult. They would require that VirtualGL emulate the glAccum() function, which is likely to be tricky at best. Aux and accum buffers are now an obsolete feature (as of OpenGL 3.1), so it might make sense to say "if you want those features, you have to use the legacy mode-- i.e. GLX mode-- in VirtualGL."
    • Handling stenciling, depth buffers, and other standard OpenGL visual features. These would all have to be maintained as separate FBOs, all attached to the drawable on the 3D X server. The FBOs would have to be created to match the properties of a particular EGLFBConfig (primarily alpha channel, depth bits, and stencil bits.) This isn't impossible, but it does add complexity to the solution, since VirtualGL would have to remember which attributes were used to create a particular EGLFBConfig (within the body of glXChooseVisual() or glXChooseFBConfig()), so that it could create and attach the FBOs appropriately within the body of glXMake[Context]Current().
    • How to handle applications that already do FBO rendering on their own. Care would have to be taken to ensure that the application's FBO attachments do not conflict with the attachments that VirtualGL is making behind the scenes. This would probably require that VirtualGL interpose the functions related to FBO creation and binding so that it could avoid using attachments that the application requests. There is a high likelihood of introducing application-specific compatibility issues here.
    • How to handle the fact that FBOs do not persist across contexts. Although not the common case, there certainly are applications that use multiple OpenGL contexts with the same drawable. VirtualGL would have to detect this case within glXMake[Context]Current() and transfer pixels between the FBOs used by the old context and the FBOs used by the new context, if the two contexts are bound to the same drawable.
  • Emulating double buffering and stereo using the EGL multiview extension
    There exist two extensions (EGL_EXT_multiview_window and EXT_multiview_draw_buffer) that, at least architecturally, would seem to solve the FBO issues described above, since multi-view buffers are attached to a particular drawable, not a particular context, and the multi-view buffers inherit the default visual attributes of the drawable. However, it would be necessary for the driver vendor to implement an equivalent EGL_EXT_multiview_pbuffer extension to allow multiview with Pbuffers and to implement EXT_multiview_draw_buffer using full OpenGL (as opposed to GLES.) I see this approach as being the most straightforward and as providing the lowest possibility of regression, but there is not a good sense that the driver vendors will be willing to implement the necessary extensions just to support us.
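
To make the FBO option more concrete, here is a small Python model of the bookkeeping behind "swapping buffers by swapping the FBO attachments." It is a sketch under stated assumptions, not VirtualGL code: the class name `EmulatedDrawable` and the attachment layout are hypothetical, and no OpenGL calls are made. A real implementation would re-attach textures or renderbuffers (or change which attachment is drawn and read) at the marked point.

```python
class EmulatedDrawable:
    """Model of one window's buffers backed by FBO color attachments."""

    def __init__(self, stereo=False):
        # Map each logical buffer to a color attachment index.
        # The initial layout is an arbitrary choice for this sketch.
        self.attachment = {"front_left": 0, "back_left": 1}
        if stereo:
            self.attachment.update({"front_right": 2, "back_right": 3})

    def swap_buffers(self):
        """Exchange front and back for each eye; no pixels are copied."""
        for eye in ("left", "right"):
            front, back = "front_" + eye, "back_" + eye
            if front in self.attachment:
                # A real back end would update the FBO attachments here.
                a = self.attachment
                a[front], a[back] = a[back], a[front]
```

The point of the model is that a buffer swap changes only a mapping, so its cost does not depend on the window size. It says nothing about the harder problems listed above, such as applications that bind their own FBOs or use several contexts with one drawable.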

@dcommander
Member Author

Simple program to demonstrate OpenGL rendering without an X server:
git clone https://gist.github.com/dcommander/ee1247362201552b2532

@dcommander
Member Author

dcommander commented Dec 14, 2017

Popping the stack on this old thread, because I've started re-investigating how best to accomplish this, and I've been tinkering with some code over the past few days to explore what's now possible, since it's been two years since I last visited it. AFAICT (awaiting nVidia's confirmation), the situation is still the same with respect to EGL, which is that multi-view Pbuffers don't exist. That leaves us with the quandary of how to emulate these GLX features:

  1. Double buffering. The lack of a multi-view Pbuffer EGL extension would require that we:
    1. emulate Pbuffers using FBOs, since double-buffered Pbuffers wouldn't exist. Currently VirtualGL emulates OpenGL windows using Pbuffers, but in the new implementation, it would have to emulate Pbuffers as well. We could probably still create a 1x1 dummy Pbuffer for each OpenGL window, which would at least allow us to maintain the 1:1 relationship between Drawable handles on the 2D X server and GLXDrawable handles on the 3D X server (or EGLSurfaces), but the actual structure of the emulated Pbuffer would be implemented with a "Drawable FBO" (and appropriate RBO attachments to emulate the back, stencil, and depth buffers.) This is problematic, since we'd be attempting to map a lower-level OpenGL feature to a higher-level GLX feature. The requirements would include, but would probably not be limited to:
      1. Interposing glReadBuffer(), glDrawBuffer(), glDrawBuffers(), glNamedFramebufferReadBuffer(), and glNamedFramebufferDrawBuffer() (VGL already interposes glDrawBuffer()) and redirecting GL_FRONT, GL_BACK, GL_FRONT_AND_BACK, etc. to the appropriate GL_COLOR_ATTACHMENTx target (in the case of GL_FRONT_AND_BACK, this would require calling down to glDrawBuffers().) Fortunately it appears as if it is an error to call glDrawBuffer() or glReadBuffer() with a target of GL_BACK/GL_FRONT/etc. whenever an FBO other than 0 is bound, so VirtualGL can similarly trigger an OpenGL error if those targets are used without the Drawable FBO being bound.
      2. Interposing glBindFramebuffer() in order to redirect Buffer 0 to the Drawable FBO.
      3. Interposing glGet*() in order to return values for GL_DOUBLEBUFFER, GL_DRAW_BUFFER, GL_DRAW_BUFFERi, GL_DRAW_FRAMEBUFFER_BINDING, GL_READ_FRAMEBUFFER_BINDING, GL_READ_BUFFER, and GL_RENDERBUFFER_BINDING that make sense from the application's point of view.
    2. emulate GLXFBConfigs somehow, since the GLXFBConfig or EGLConfig of the emulated Pbuffer would not represent its visual properties necessarily. This would likely require that VGL maintain a central table of internal FB configs; perform its own sorting algorithms within the body of glXChooseVisual(), glXChooseFBConfig(), and similar functions; and return its own internal structure pointers to the application when the application requests a GLXFBConfig. This is feasible, but it's difficult and fraught with potential compatibility issues.
    3. emulate GLX_PRESERVED_CONTENTS (Hopefully we don't need to? Otherwise, I have no clue), GLX_MAX_PBUFFER_WIDTH and GLX_MAX_PBUFFER_HEIGHT (could map to GL_MAX_FRAMEBUFFER_WIDTH and GL_MAX_FRAMEBUFFER_HEIGHT), and GLX_LARGEST_PBUFFER.
  2. Quad-buffered stereo. If we have to use FBOs to emulate double-buffered Pbuffers, then this would be an easy addition. Otherwise, I don't mind relegating this feature to the GLX back end only. I'm trying to figure out the industry direction on stereographic 3D rendering in general, because at the moment, it doesn't even appear possible to use quad-buffered stereo in OpenGL without using GLX. Furthermore, the only VGL configuration that supports quad-buffered stereo is the VGL Transport with a Linux client that has stereo capabilities. That configuration is useful for accessing visualization supercomputers remotely across a LAN, so it's definitely something I want to continue supporting, but it doesn't necessarily need to be supported with X-server-less GPU access.
  3. Aux. buffers. If we have to use FBOs to emulate double-buffered Pbuffers, then this would be an easy addition. Otherwise, this feature won't be available with X-server-less GPU access (aux. buffers were obsoleted in OpenGL 3.1 anyhow.)
  4. Accumulation buffers. These can't be emulated with FBOs, so if we have to use FBOs to emulate Pbuffers, then support for accumulation buffers will simply not exist in VirtualGL anymore. Accumulation buffers were also obsoleted in OpenGL 3.1, but why do I have a sinking feeling that there are still some commercial applications out there that use them? I guess such applications would have to be stuck on VGL 2.5.x if the use of FBOs proves necessary.
  5. Floating point pixels and other esoteric Pbuffer configurations that the nVidia drivers support. No idea even where to begin emulating such things using FBOs.
  6. Texture-from-pixmap. There appears to be an EGL extension for this, but no idea whether it supports desktop OpenGL or just OpenGL ES.
  7. Buffer swapping. If we have to emulate Pbuffers using FBOs, hopefully we can get away with simply swapping the color attachments.
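
The draw-buffer redirection from item 1.1.1 can be sketched as a lookup from window-system buffer names to color attachments. The following Python model is illustrative only. The GL enum values are the standard ones, but the function name `redirect_draw_buffer`, the `InvalidOperation` exception (standing in for GL_INVALID_OPERATION), and the attachment layout are assumptions made for this sketch. It covers only the window-system buffer names; GL_COLOR_ATTACHMENTx values requested by the application would pass through unchanged and are not handled here.

```python
GL_NONE = 0
GL_FRONT_LEFT, GL_FRONT_RIGHT = 0x0400, 0x0401
GL_BACK_LEFT, GL_BACK_RIGHT = 0x0402, 0x0403
GL_FRONT, GL_BACK = 0x0404, 0x0405
GL_LEFT, GL_RIGHT = 0x0406, 0x0407
GL_FRONT_AND_BACK = 0x0408
GL_COLOR_ATTACHMENT0 = 0x8CE0

# Hypothetical layout: front-left, back-left, front-right, back-right.
FL, BL, FR, BR = (GL_COLOR_ATTACHMENT0 + i for i in range(4))

TARGETS = {
    GL_FRONT_LEFT: [FL], GL_FRONT_RIGHT: [FR],
    GL_BACK_LEFT: [BL], GL_BACK_RIGHT: [BR],
    GL_FRONT: [FL, FR], GL_BACK: [BL, BR],
    GL_LEFT: [FL, BL], GL_RIGHT: [FR, BR],
    GL_FRONT_AND_BACK: [FL, BL, FR, BR],
}


class InvalidOperation(Exception):
    """Stands in for generating GL_INVALID_OPERATION."""


def redirect_draw_buffer(mode, drawable_fbo_bound,
                         double_buffered=True, stereo=False):
    """Return the attachments to pass to the real glDrawBuffers()."""
    if mode == GL_NONE:
        return []
    if mode not in TARGETS:
        raise ValueError("not a window-system buffer name")
    if not drawable_fbo_bound:
        # The application has its own FBO bound, so GL_FRONT, GL_BACK,
        # etc. are errors, just as with a real default framebuffer.
        raise InvalidOperation()
    available = {FL}
    if double_buffered:
        available.add(BL)
    if stereo:
        available.add(FR)
        if double_buffered:
            available.add(BR)
    selected = [t for t in TARGETS[mode] if t in available]
    if not selected:
        # For example, GL_BACK_RIGHT on a mono drawable.
        raise InvalidOperation()
    return selected
```

With this layout, GL_FRONT_AND_BACK on a mono double-buffered drawable yields two attachments, which is why that case has to be forwarded to glDrawBuffers() rather than glDrawBuffer().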

Features that will likely have to be relegated to the legacy GLX back end only:

  • glXSelectEvent(). If we have to use FBOs to emulate Pbuffers, then I'm not sure how to emulate this at all. (Bueller? Bueller?)
  • GLX_EXT_import_context and indirect contexts in general. EGL has no concept of indirect contexts.
  • GLX_NV_swap_group. If we have to use FBOs to emulate Pbuffers, then this extension may not be possible to emulate at all.

As you can see, this is already a potential compatibility minefield. It at least becomes a manageable minefield if we are able to retain the existing GLX Pbuffer back end and simply add an EGL Pbuffer back end to it (i.e. if a multi-view EGL Pbuffer extension is available.) That would leave open the possibility of reverting to the GLX Pbuffer back end if certain applications don't work with the EGL Pbuffer back end. However, since I can think of no sane way to use FBOs for the EGL back end without also using them for the GLX back end, if we're forced to use FBOs, essentially everything we currently know about VirtualGL's compatibility with commercial applications would have to be thrown out the window. Emulating Pbuffers with FBOs is so potentially disruptive to application compatibility that I would even entertain the notion of introducing a new interposer library just for the EGL back end, and retaining the existing interposers until the new back end can be shown to be as compatible (these new interposers could be selected in vglrun based on the value of VGL_DISPLAY.)

Maybe I'm being too paranoid, but in the 13 years I've been maintaining this project, I've literally seen every inadvisable thing that an application can possibly do with OpenGL or GLX. A lot of commercial OpenGL ISVs seem to have the philosophy that, as long as their application works on the specific platforms they support, it doesn't matter if the code is brittle, non-future-proof, or if it only works by accident because the display is local and the GPU is fast. Hence my general desire to not introduce potential compatibility problems into VirtualGL. The more we try to interpose the OpenGL API, the more problems we will potentially encounter, since that API changes a lot more frequently than GLX. There is unfortunately no inexpensive way to test a GLX/OpenGL implementation for conformance problems (accessing the Khronos conformance suites requires a $30,000 fee), and whereas some of the companies reselling VirtualGL in their own products have access to a variety of commercial applications for testing, I have no such access personally.

@dcommander
Member Author

Relabeling as "funding needed", since there is no way to pay for this project with the General Fund unless a multi-view Pbuffer extension for EGL materializes.

@nimbixler

I'm thinking about funding this specific project. How do I do that? I'm happy to discuss offline, including the specifics around amount needed, etc. No corporate agenda other than interest in this feature and willingness to fund it (the OpenGL offload without X server).
Thanks!
Leo Reiter
CTO, Nimbix, Inc.

@dcommander
Member Author

@nimbixler please contact me offline: https://virtualgl.org/About/Contact. At the moment, it doesn't appear that nVidia is going to be able to come up with a multibuffer EGL extension, so this project is definitely doable but is likely to be costly. However, I really do think it's going to be necessary in order to move VGL forward, and this year would be a perfect time to do it.

@dcommander dcommander removed this from the 2.6 milestone Apr 5, 2018
@dcommander
Member Author

Pushed to a later release of VirtualGL, since 2.6 beta will land this month and there is no funding currently secured for this project.

@dcommander
Member Author

Re-tagging as "funding needed." I've completed the groundwork (Phase 1), which is now in the dev branch (with relevant changes that affect the stable branch placed in master.) However, due to budgetary constraints with the primary company that is sponsoring this, it appears that I'm going to need to split cost on the project across multiple companies in order to make it land in 2019.


Phase 1

  • Cleaning up some issues in the 2.6.x stable code in order to provide a solid baseline against which to test for regressions (using servertest, which invokes frameut and fakerut with various permutations of VirtualGL settings)
    • Overhauling the autotest mechanism that fakerut uses to communicate with the faker. The faker was previously sending back autotest information to fakerut using the environment, but that is not thread-safe, and it was causing sporadic crashes in fakerut's multithreaded test. The faker now exposes special functions that fakerut can load via dlsym() to obtain the autotest data, and the faker now stores that data internally using thread-local variables.
    • Fixing minor issues flagged by valgrind in the faker and unit tests
    • Fixing a deadlock in frameut that sometimes occurred when a particular test completed
    • Making the visual-to-FB config matching tests in fakerut generally more robust across different OpenGL stacks (I am personally able to test against the nVidia proprietary driver, the old fglrx/Catalyst AMD proprietary driver, and the VMWare open source driver.)
    • Fixing issues that caused fakerut to fail with the fglrx driver (basically legitimate oversights in the faker and fakerut code that the nVidia driver allowed but the fglrx driver didn't)
    • Adding an option to fakerut to work around an issue whereby the fglrx driver creates all Pixmaps as single-buffered despite claiming that double-buffered Pixmap-friendly FB configs are available
    • Fixing minor issues that prevented fakerut and other unit tests from completing successfully when run on a 2D X server screen other than 0
  • Eliminating DisplayHash and VisualHash and attaching, using the XExtData mechanism,
    • the display's excluded status to the Display structure for a given 2D X server connection
    • the visual's most recent matching FB config to the Visual structure for a given 2D X server connection and visual
    • the visual attribute table (from glxvisual.cpp) to the Screen structure for a given 2D X server connection and screen
  • Removing support for transparent overlay visuals. Refer to the change log and Git log for the complete explanation, but in a nutshell, that feature had long since passed the point of obsolescence and uselessness, and it needed to go in order to make room for this feature. Removing transparent overlay visual support eliminated ReverseConfigHash.
  • Eliminating ConfigHash and creating a FB config attribute table similar to the aforementioned visual attribute table. This table maps GLXFBConfig ids on the 3D X server to matching 2D X server visuals. As with the visual attribute table, it is attached to the Screen structure for a given 2D X server connection and screen, and it is created on first use. This table will be necessary for implementing an FBO back end, since FBOs will necessitate breaking the current 1:1 relationship between GLXFBConfigs used internally by VirtualGL and GLXFBConfigs returned to the application. This also sped up the visual-to-FB config matching tests in fakerut
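
The autotest change in Phase 1 (replacing environment variables with thread-local storage behind accessor functions) follows a general pattern that can be illustrated in a few lines. This Python sketch only demonstrates the pattern. The function names are hypothetical and do not correspond to the actual faker symbols that fakerut loads via dlsym().

```python
import threading

# One independent copy of the autotest data per thread. The process
# environment, by contrast, is a single table shared by all threads.
_autotest = threading.local()


def record_autotest_color(color):
    """Faker side: remember the last color rendered by this thread."""
    _autotest.color = color


def get_autotest_color():
    """Test side: return this thread's value, or -1 if none was recorded."""
    return getattr(_autotest, "color", -1)
```

Because each thread sees only its own value, concurrent test threads cannot overwrite one another's results, which is the failure mode that a shared environment variable invites.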

Phase 2

Implementing the EGL back end

  • Adding a new mode of operation to the faker that is triggered by specifying a DRM device (e.g. /dev/dri/card0) in VGL_DISPLAY rather than an X display, or by setting VGL_DISPLAY to egl
  • Modifying vglserver_config so that it can be used to configure only the EGL mode of operation, for those who would rather not use a 3D X server
    • I think (but am not sure) that the steps that vglserver_config already takes in order to modify the framebuffer device permissions will apply to EGL
  • Abstracting the implementation of multi-buffer visual features that EGL Pbuffers don't support-- namely double buffering and stereo-- by attaching FBOs to the EGL Pbuffers
    • Aux. and accumulation buffers (both of which are obsolete and which were also removed in OpenGL 3.1 ten years ago) will not be supported in EGL mode.
    • Hopefully most of this can be encapsulated within VirtualDrawable::OGLDrawable, but there will be some other touch points, including interposing some new OpenGL functions in order to prevent applications from clobbering our FBOs and to use the FBO attachments when applications want to render to the back or right buffers.
  • Extending the faker's loader/wrapper for "real" OpenGL/GLX functions to handle libEGL functions.
  • Abstracting, where necessary, the back-end calls that the faker makes to "real" GLX functions by emulating the same functionality using EGL.
  • Documenting the EGL mode of operation
  • Lots of testing
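
The interposing needed to keep applications from clobbering the faker's FBOs can be pictured as a thin translation layer around framebuffer binding. The Python model below is a sketch, not the planned implementation: the class name `FramebufferRedirect` is hypothetical, the FBO name would really come from glGenFramebuffers(), and the separate draw and read binding points are collapsed into one for brevity.

```python
class FramebufferRedirect:
    """Tracks what the application believes is bound vs. what really is."""

    def __init__(self, drawable_fbo):
        # Name of the FBO that stands in for the window's default
        # framebuffer (framebuffer 0 from the application's viewpoint).
        self.drawable_fbo = drawable_fbo
        self.real_binding = drawable_fbo

    def bind_framebuffer(self, requested):
        """Return the name to pass to the real glBindFramebuffer()."""
        if requested == 0:
            self.real_binding = self.drawable_fbo
        else:
            self.real_binding = requested
        return self.real_binding

    def get_binding(self):
        """Return the value the application should see from glGet*()."""
        if self.real_binding == self.drawable_fbo:
            return 0
        return self.real_binding
```

For instance, with a drawable FBO named 7, an application that binds framebuffer 0 actually gets framebuffer 7 bound, yet a later query of the binding still reports 0.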

@dcommander
Member Author

@nimbixler did you get my e-mail? We could use any funding help you can muster on this.

@VirtualGL VirtualGL deleted a comment from nimbixler Mar 2, 2019
@al3x609

al3x609 commented Apr 14, 2019

This is an amazing first step. OpenGL direct rendering without an X server is an essential feature for the HPC world. Let me explain: I'm working with VirtualGL/TurboVNC/noVNC to deploy a remote visualization service in an HPC cluster, using a single node for remote viz, because the other nodes are used in compute mode with CUDA and other tools. What does that mean?

If we need to run an Xorg instance on a GPU for remote viz, that GPU cannot be shared between compute mode and the X Window System. (The user should be aware of certain limitations with handling both activities simultaneously on a single GPU. If no consideration is given to managing both sets of tasks simultaneously, the system may experience disturbances and hangs in the X Window System, leading to an interruption of processing X-related tasks, such as display updates and rendering.)

So the HPC world needs a separate cluster, running a 3D X server on every node, for this service. This isn't a good approach, because the hardware requirements are very large. Being able to share the same GPU between 3D rendering and GPGPU compute mode would let us merge both clusters into a single layer. In situ visualization needs this approach for good performance, and sharing resources across the cluster will minimize costs.

EGL remote hardware rendering and a future WebAssembly service with H.264 encoding are a good combination.

I'm sorry for my poor English. :)

@dcommander
Member Author

I have been looking at WebAssembly in the context of designing an in-browser TurboVNC viewer. So far, it seems to be not fully baked. I've gotten as far as building libjpeg-turbo (which requires disabling its SIMD extensions, since WASM doesn't support SIMD instructions yet) and LibVNCClient into WebAssembly code and running one of the LibVNCClient examples in a browser, but the WebAssembly sockets-to-WebSockets emulation layer doesn't work properly, and the program locks up the browser.

@al3x609

al3x609 commented Apr 15, 2019

There is a GitHub project that tries to resolve this issue: a SIMD proposal based on SIMD.js, I think. 🤔

@dcommander
Member Author

NOTE: VirtualBox also doesn't work with the EGL back end yet (3D applications running in the guest display a black window.) I do notice that there are some named framebuffer functions in OpenGL 4.6 that I will probably have to interpose at some point, since some of those functions are allowed to operate on the default framebuffer (in which case it will be necessary for VGL to redirect the operation to the default FBO.) However, apitrace doesn't reveal that any of the failing applications are using those functions.

@dcommander
Member Author

All Qt5 demos are now working, as is VirtualBox, so the only remaining known issues are:

Closing this issue. Please feel free to add comments confirming that a particular application works with the EGL back end, but if something doesn't work, please open a new issue. Thanks.

@muratmaga

@ffeldhaus /etc/modprobe.d/virtualgl.conf sets the permissions for /dev/nvidia* at boot time. I honestly don't know whether that's necessary for the EGL back end or not. Note that, unless you also pass +f to vglserver_config, the devices will be restricted to the vglusers group only, so that may be why nvidia-container-cli is complaining.

I am suffering from the "nvidia-container-cli: initialization error: nvml error: insufficient permissions" error, so is the current suggested solution not to restrict the permissions to the vglusers group?

@dcommander
Member Author

No, it should still work if the permissions are restricted, as long as your user account is a member of vglusers and you logged out and back in to allow the group membership to take effect.
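
The "log out and back in" caveat reflects the fact that a process's supplementary groups are fixed when the session starts, so a change to the group database does not affect sessions that are already running. A minimal Python sketch for checking this from inside a session follows. The helper names are hypothetical, and the check looks only at group ownership, not at the mode bits or any ACLs on the device node.

```python
import os


def process_has_group(gid):
    """True if the current process carries this group ID."""
    return gid == os.getegid() or gid in os.getgroups()


def device_group_ok(path):
    """True if the process belongs to the group that owns `path`."""
    return process_has_group(os.stat(path).st_gid)
```

On a server configured as described, `device_group_ok("/dev/nvidia0")` returning False for a user who was just added to vglusers would suggest that the session predates the group change.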

@muratmaga

No, it didn't work for me, even though I was a member of vglusers and can run everything VGL-related with no problem. This is an access-restricted system, so removing the permissions is not a deal breaker for me. I should also note that this is with the stable 2.6.3 release, not with the development branch.

@dcommander
Member Author

@muratmaga It works fine for me. Also, this isn't the right place to discuss any issues that are unrelated to the EGL back end. This thread is closed, so any further problems should be filed as a new issue.

# ls -l /dev/nvidia*
crw-rw----. 1 root vglusers 195,   0 Sep 28 15:27 /dev/nvidia0
crw-rw----. 1 root vglusers 195, 255 Sep 28 15:27 /dev/nvidiactl
crw-rw----. 1 root vglusers 195, 254 Sep 28 15:28 /dev/nvidia-modeset
# nvidia-smi
Mon Sep 28 15:28:42 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K5000        Off  | 00000000:03:00.0  On |                  Off |
| 30%   34C    P8    14W / 137W |    116MiB /  4036MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1259      G   /usr/bin/X                        113MiB |
+-----------------------------------------------------------------------------+

@ehfd

ehfd commented Nov 13, 2020

https://github.com/ehfd/docker-nvidia-egl-desktop

MATE Desktop container for NVIDIA GPUs without using an X Server, directly accessing the GPU with EGL to emulate GLX using VirtualGL and TurboVNC. Does not require /tmp/.X11-unix host sockets.

@dcommander
Member Author

@ehfd Yes, I know. Please refer to my examples here:
https://github.com/dcommander/virtualgl_docker_examples
The only reason why I pass /tmp/.X11-unix to the Docker container in one of the EGL examples is because it's sharing the host-side TurboVNC display.

@dcommander
Member Author

The EGL back end should now mostly work with the AMDGPU and nouveau drivers. The only remaining failures with those drivers are in fakerut's multi-threaded tests. Since those same tests work fine with nVidia's EGL implementation, I'm not yet convinced that the failures with Mesa-based drivers are due to a bug in the EGL back end. They may simply be due to stricter behavior on the part of Mesa. When I have time, I'll investigate that issue more thoroughly.

@dcommander
Member Author

Correction: 59093fa fixed the fakerut multithreaded test failures with Mesa-based drivers. nouveau seems to have thread safety issues in general, and those manifest with the GLX back end as well, but the EGL back end now fully works with AMDGPU and fully works with nouveau minus those thread safety issues, which are out of our control.

@pvmilk

pvmilk commented May 31, 2021

Do you think this functionality would work on an embedded system like the NVIDIA Jetson Nano?

I have been using VirtualGL on the device through an X server, but having it work with EGL would be nice.
Apparently, it might use a different interface than normal GPUs (REF).

I don't know much about how it differs, but I can help find out in the Jetson forums if I know what is required to make it work.

P.S. Thanks for the wonderful work on VirtualGL.

@dcommander
Member Author

Do you think this functionality would work on an embedded system like the NVIDIA Jetson Nano?

No idea. I have no experience with that hardware. But the fundamental purpose of VirtualGL is to remotely display 3D applications from high-spec servers in the machine room or the cloud to low-spec clients, not the other way around.

@peci1

peci1 commented May 31, 2021

I just happen to have a Jetson running, so give me a while and I'll tell you how far I get. Apparently, there are no arm64 binary builds, so I'm building libjpeg-turbo, and once I have it, I'll try VirtualGL.

@peci1

peci1 commented May 31, 2021

It works (at least for simple stuff like glxgears):

$ xvfb-run -a vglrun +v -d /dev/dri/card0 /usr/bin/glxgears -info
[VGL] Shared memory segment ID for vglconfig: 1376260
[VGL] VirtualGL v2.6.80 64-bit (Build 20210531)
[VGL] Opening EGL device /dev/dri/card0
libGL error: MESA-LOADER: failed to open swrast (search paths /usr/lib/aarch64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri)
libGL error: failed to load driver: swrast
[VGL] WARNING: Could not set WM_DELETE_WINDOW on window 0x00200002
GL_RENDERER   = NVIDIA Tegra Xavier (nvgpu)/integrated
GL_VERSION    = 4.6.0 NVIDIA 32.4.4
GL_VENDOR     = NVIDIA Corporation
GL_EXTENSIONS = GL_AMD_multi_draw_indirect GL_AMD_seamless_cubemap_per_texture GL_AMD_vertex_shader_viewport_index GL_AMD_vertex_shader_layer GL_ARB_arrays_of_arrays GL_ARB_base_instance GL_ARB_bindless_texture GL_ARB_blend_func_extended GL_ARB_buffer_storage GL_ARB_clear_buffer_object GL_ARB_clear_texture GL_ARB_clip_control GL_ARB_color_buffer_float GL_ARB_compatibility GL_ARB_compressed_texture_pixel_storage GL_ARB_conservative_depth GL_ARB_compute_shader GL_ARB_compute_variable_group_size GL_ARB_conditional_render_inverted GL_ARB_copy_buffer GL_ARB_copy_image GL_ARB_cull_distance GL_ARB_debug_output GL_ARB_depth_buffer_float GL_ARB_depth_clamp GL_ARB_depth_texture GL_ARB_derivative_control GL_ARB_direct_state_access GL_ARB_draw_buffers GL_ARB_draw_buffers_blend GL_ARB_draw_indirect GL_ARB_draw_elements_base_vertex GL_ARB_draw_instanced GL_ARB_enhanced_layouts GL_ARB_ES2_compatibility GL_ARB_ES3_compatibility GL_ARB_ES3_1_compatibility GL_ARB_ES3_2_compatibility GL_ARB_explicit_attrib_location GL_ARB_explicit_uniform_location GL_ARB_fragment_coord_conventions GL_ARB_fragment_layer_viewport GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_fragment_shader_interlock GL_ARB_framebuffer_no_attachments GL_ARB_framebuffer_object GL_ARB_framebuffer_sRGB GL_ARB_geometry_shader4 GL_ARB_get_program_binary GL_ARB_get_texture_sub_image GL_ARB_gl_spirv GL_ARB_gpu_shader5 GL_ARB_gpu_shader_fp64 GL_ARB_gpu_shader_int64 GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_imaging GL_ARB_indirect_parameters GL_ARB_instanced_arrays GL_ARB_internalformat_query GL_ARB_internalformat_query2 GL_ARB_invalidate_subdata GL_ARB_map_buffer_alignment GL_ARB_map_buffer_range GL_ARB_multi_bind GL_ARB_multi_draw_indirect GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_occlusion_query2 GL_ARB_parallel_shader_compile GL_ARB_pipeline_statistics_query GL_ARB_pixel_buffer_object GL_ARB_point_parameters GL_ARB_point_sprite 
GL_ARB_polygon_offset_clamp GL_ARB_post_depth_coverage GL_ARB_program_interface_query GL_ARB_provoking_vertex GL_ARB_query_buffer_object GL_ARB_robust_buffer_access_behavior GL_ARB_robustness GL_ARB_sample_locations GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_seamless_cube_map GL_ARB_seamless_cubemap_per_texture GL_ARB_separate_shader_objects GL_ARB_shader_atomic_counter_ops GL_ARB_shader_atomic_counters GL_ARB_shader_ballot GL_ARB_shader_bit_encoding GL_ARB_shader_clock GL_ARB_shader_draw_parameters GL_ARB_shader_group_vote GL_ARB_shader_image_load_store GL_ARB_shader_image_size GL_ARB_shader_objects GL_ARB_shader_precision GL_ARB_shader_storage_buffer_object GL_ARB_shader_subroutine GL_ARB_shader_texture_image_samples GL_ARB_shader_texture_lod GL_ARB_shading_language_100 GL_ARB_shader_viewport_layer_array GL_ARB_shading_language_420pack GL_ARB_shading_language_include GL_ARB_shading_language_packing GL_ARB_shadow GL_ARB_sparse_buffer GL_ARB_sparse_texture GL_ARB_sparse_texture2 GL_ARB_sparse_texture_clamp GL_ARB_spirv_extensions GL_ARB_stencil_texturing GL_ARB_sync GL_ARB_tessellation_shader GL_ARB_texture_barrier GL_ARB_texture_border_clamp GL_ARB_texture_buffer_object GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_buffer_range GL_ARB_texture_compression GL_ARB_texture_compression_bptc GL_ARB_texture_compression_rgtc GL_ARB_texture_cube_map GL_ARB_texture_cube_map_array GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_filter_anisotropic GL_ARB_texture_filter_minmax GL_ARB_texture_float GL_ARB_texture_gather GL_ARB_texture_mirror_clamp_to_edge GL_ARB_texture_mirrored_repeat GL_ARB_texture_multisample GL_ARB_texture_non_power_of_two GL_ARB_texture_query_levels GL_ARB_texture_query_lod GL_ARB_texture_rectangle GL_ARB_texture_rg GL_ARB_texture_rgb10_a2ui GL_ARB_texture_stencil8 GL_ARB_texture_storage GL_ARB_texture_storage_multisample GL_ARB_texture_swizzle GL_ARB_texture_view 
GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_transform_feedback_instanced GL_ARB_transform_feedback_overflow_query GL_ARB_transpose_matrix GL_ARB_uniform_buffer_object GL_ARB_vertex_array_bgra GL_ARB_vertex_array_object GL_ARB_vertex_attrib_64bit GL_ARB_vertex_attrib_binding GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_vertex_type_10f_11f_11f_rev GL_ARB_vertex_type_2_10_10_10_rev GL_ARB_viewport_array GL_ARB_window_pos GL_ATI_draw_buffers GL_ATI_texture_float GL_ATI_texture_mirror_once GL_S3_s3tc GL_EXT_texture_env_add GL_EXT_abgr GL_EXT_bgra GL_EXT_bindable_uniform GL_EXT_blend_color GL_EXT_blend_equation_separate GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_compiled_vertex_array GL_EXT_Cg_shader GL_EXT_depth_bounds_test GL_EXT_direct_state_access GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_draw_range_elements GL_EXT_EGL_image_storage GL_EXT_fog_coord GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXTX_framebuffer_mixed_formats GL_EXT_framebuffer_multisample_blit_scaled GL_EXT_framebuffer_object GL_EXT_framebuffer_sRGB GL_EXT_geometry_shader4 GL_EXT_gpu_program_parameters GL_EXT_gpu_shader4 GL_EXT_multi_draw_arrays GL_EXT_packed_depth_stencil GL_EXT_packed_float GL_EXT_packed_pixels GL_EXT_pixel_buffer_object GL_EXT_point_parameters GL_EXT_polygon_offset_clamp GL_EXT_post_depth_coverage GL_EXT_provoking_vertex GL_EXT_raster_multisample GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_shader_objects GL_EXT_separate_specular_color GL_EXT_shader_image_load_formatted GL_EXT_shader_image_load_store GL_EXT_shader_integer_mix GL_EXT_shadow_funcs GL_EXT_sparse_texture2 GL_EXT_stencil_two_side GL_EXT_stencil_wrap GL_EXT_texture3D GL_EXT_texture_array GL_EXT_texture_buffer_object GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_latc GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map 
GL_EXT_texture_edge_clamp GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic GL_EXT_texture_filter_minmax GL_EXT_texture_integer GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_shared_exponent GL_EXT_texture_sRGB GL_EXT_texture_sRGB_R8 GL_EXT_texture_sRGB_decode GL_EXT_texture_storage GL_EXT_texture_swizzle GL_EXT_timer_query GL_EXT_transform_feedback2 GL_EXT_vertex_array GL_EXT_vertex_array_bgra GL_EXT_vertex_attrib_64bit GL_EXT_window_rectangles GL_EXT_import_sync_object GL_NV_robustness_video_memory_purge GL_IBM_rasterpos_clip GL_IBM_texture_mirrored_repeat GL_KHR_context_flush_control GL_KHR_debug GL_EXT_memory_object GL_EXT_memory_object_fd GL_KHR_parallel_shader_compile GL_KHR_no_error GL_KHR_robust_buffer_access_behavior GL_KHR_robustness GL_EXT_semaphore GL_EXT_semaphore_fd GL_KHR_texture_compression_astc_ldr GL_KHR_texture_compression_astc_sliced_3d GL_KTX_buffer_region GL_NV_alpha_to_coverage_dither_control GL_NV_bindless_multi_draw_indirect GL_NV_bindless_multi_draw_indirect_count GL_NV_bindless_texture GL_NV_blend_equation_advanced GL_NV_blend_equation_advanced_coherent GL_NVX_blend_equation_advanced_multi_draw_buffers GL_NV_blend_minmax_factor GL_NV_blend_square GL_NV_clip_space_w_scaling GL_NV_command_list GL_NV_compute_program5 GL_NV_conditional_render GL_NV_conservative_raster GL_NV_conservative_raster_dilate GL_NV_conservative_raster_pre_snap GL_NV_conservative_raster_pre_snap_triangles GL_NV_conservative_raster_underestimation GL_NV_copy_depth_to_color GL_NV_copy_image GL_NV_depth_buffer_float GL_NV_depth_clamp GL_NV_draw_texture GL_NV_draw_vulkan_image GL_NV_ES1_1_compatibility GL_NV_ES3_1_compatibility GL_NV_explicit_multisample GL_NV_feature_query GL_NV_fence GL_NV_fill_rectangle GL_NV_float_buffer GL_NV_fog_distance GL_NV_fragment_coverage_to_color GL_NV_fragment_program GL_NV_fragment_program_option GL_NV_fragment_program2 
GL_NV_fragment_shader_interlock GL_NV_framebuffer_mixed_samples GL_NV_framebuffer_multisample_coverage GL_NV_geometry_shader4 GL_NV_geometry_shader_passthrough GL_NV_gpu_program4 GL_NV_internalformat_sample_query GL_NV_gpu_program4_1 GL_NV_gpu_program5 GL_NV_gpu_program5_mem_extended GL_NV_gpu_program_fp64 GL_NV_gpu_shader5 GL_NV_half_float GL_NV_light_max_exponent GL_NV_multisample_coverage GL_NV_multisample_filter_hint GL_NV_occlusion_query GL_NV_packed_depth_stencil GL_NV_parameter_buffer_object GL_NV_parameter_buffer_object2 GL_NV_path_rendering GL_NV_path_rendering_shared_edge GL_NV_point_sprite GL_NV_primitive_restart GL_NV_query_resource GL_NV_query_resource_tag GL_NV_register_combiners GL_NV_register_combiners2 GL_NV_sample_locations GL_NV_sample_mask_override_coverage GL_NV_shader_atomic_counters GL_NV_shader_atomic_float GL_NV_shader_atomic_float64 GL_NV_shader_atomic_fp16_vector GL_NV_shader_atomic_int64 GL_NV_shader_buffer_load GL_NV_shader_storage_buffer_object GL_NV_stereo_view_rendering GL_NV_texgen_reflection GL_NV_texture_barrier GL_NV_texture_compression_vtc GL_NV_texture_env_combine4 GL_NV_texture_multisample GL_NV_texture_rectangle GL_NV_texture_rectangle_compressed GL_NV_texture_shader GL_NV_texture_shader2 GL_NV_texture_shader3 GL_NV_transform_feedback GL_NV_transform_feedback2 GL_NV_uniform_buffer_unified_memory GL_NV_vertex_attrib_integer_64bit GL_NV_vertex_buffer_unified_memory GL_NV_vertex_program GL_NV_vertex_program1_1 GL_NV_vertex_program2 GL_NV_vertex_program2_option GL_NV_vertex_program3 GL_NV_viewport_array2 GL_NV_viewport_swizzle GL_NVX_conditional_render GL_NV_shader_thread_group GL_NV_shader_thread_shuffle GL_KHR_blend_equation_advanced GL_KHR_blend_equation_advanced_coherent GL_OVR_multiview GL_OVR_multiview2 GL_SGIS_generate_mipmap GL_SGIS_texture_lod GL_SGIX_depth_texture GL_SGIX_shadow GL_SUN_slice_accum
VisualID 65, 0x41
[VGL] Using pixel buffer objects for readback (BGR --> BGRA)
2451 frames in 5.0 seconds = 490.180 FPS
2766 frames in 5.0 seconds = 553.133 FPS

Here are debs built for L4T 32.4:
virtualgl-arm64.zip

I tested it on Xavier NX, but it should be very similar to Nano.

To build the packages, no special configuration was needed.

@peci1

peci1 commented May 31, 2021

The /dev/dri/card0 device is not there by default. You have to call

sudo modprobe tegra-udrm modeset=1

to get it. This should be configurable permanently from /etc/modprobe.d.
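
One way the persistent setup might look, as a sketch only. It assumes a systemd-based L4T image where /etc/modprobe.d holds module options and /etc/modules-load.d lists modules to load at boot. The file names and the write_tegra_udrm_conf helper are made up for this example, and it has not been verified on a Jetson.

```shell
#!/bin/sh
# Sketch: make "modprobe tegra-udrm modeset=1" persist across reboots.
# write_tegra_udrm_conf is a hypothetical helper. Its argument is a root
# prefix: pass "" to write to the real /etc (requires root), or pass a
# scratch directory to inspect the generated files first.
write_tegra_udrm_conf() {
    root="${1:-}"
    mkdir -p "$root/etc/modprobe.d" "$root/etc/modules-load.d" || return 1
    # Option applied whenever the tegra-udrm module is loaded.
    echo "options tegra-udrm modeset=1" \
        > "$root/etc/modprobe.d/tegra-udrm.conf" || return 1
    # Assumption: systemd-modules-load reads this directory at boot.
    echo "tegra-udrm" > "$root/etc/modules-load.d/tegra-udrm.conf"
}

# Dry run into a scratch directory so nothing on the system is touched.
scratch="$(mktemp -d)"
write_tegra_udrm_conf "$scratch"
cat "$scratch/etc/modprobe.d/tegra-udrm.conf"
```

After checking the dry-run output, the same function could be run as root with an empty prefix. /dev/dri/card0 should then appear after the next reboot if the assumptions above hold.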

@pvmilk

pvmilk commented Jun 1, 2021

Wow, thanks a lot for testing it, and for the debs and the modprobe trick.

I tested it on the Jetson Nano (jetson-nano-jp451-sd-card-image), and it works!!
The performance is much better with both glxgears and glxspheres64.

glxgears

[through X]: $ vglrun +v glxgears
199 frames in 5.0 seconds = 39.663 FPS
190 frames in 5.0 seconds = 37.943 FPS

[through EGL]: $ vglrun +v -d /dev/dri/card0 glxgears
2543 frames in 5.0 seconds = 508.404 FPS
2558 frames in 5.0 seconds = 511.551 FPS

glxspheres64

[through X]: $ vglrun +v /opt/VirtualGL/bin/glxspheres64
26.305231 frames/sec - 29.356637 Mpixels/sec
25.921894 frames/sec - 28.928834 Mpixels/sec

[through EGL]: $ vglrun +v -d /dev/dri/card0 /opt/VirtualGL/bin/glxspheres64
52.206648 frames/sec - 58.262619 Mpixels/sec
61.089091 frames/sec - 68.175426 Mpixels/sec

Btw, which branch or commit was the build made from?
I tried my own deb, built from the current master (6ce0bf1), and it didn't work.
[Update: never mind. Commit 552ef96 in the dev branch works fine.]
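
To put a number on "much better", the two samples reported for each mode can be averaged and divided. This is only a rough sketch: the speedup helper is a name invented here, and two five-second samples per mode are not a rigorous benchmark.

```shell
#!/bin/sh
# speedup X1 X2 EGL1 EGL2: ratio of the mean EGL rate to the mean X rate.
# Hypothetical helper; the inputs are the frame rates quoted in this comment.
speedup() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" \
        'BEGIN { printf "%.1f\n", ((b1 + b2) / 2) / ((a1 + a2) / 2) }'
}

echo "glxgears:     $(speedup 39.663 37.943 508.404 511.551)x"
echo "glxspheres64: $(speedup 26.305231 25.921894 52.206648 61.089091)x"
```

With the figures above, this works out to roughly 13x for glxgears and roughly 2x for glxspheres64.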

@peci1

peci1 commented Jun 1, 2021

It's the tip of the dev branch, as EGL only works there (it has not been merged into master yet).

@duongnb09

@peci1 Hi, do you know why xvfb-run is needed in your glxgears example above? Without Xvfb, I kept getting a "Could not open display" error message. I thought VirtualGL could run with EGL without any open display.

@dcommander
Member Author

@duongnb09 EGL eliminates the need for a 3D X server. You still need a 2D X server. In the example above, Xvfb is used as a headless 2D X server solely for testing purposes. Normally the 2D X server would be an X proxy, such as TurboVNC, that transmits its display to the client. (The 2D X server could also be on the client, if you are using the VGL Transport rather than an X proxy.) Refer to the User's Guide for definitions of all of the terms I just used.

@duongnb09

@dcommander Thanks for the explanation! I have a follow-up question about this. I want to use VirtualGL in headless mode with EGL. Can we just set up the TurboVNC Server to create a display and then use VirtualGL with vglrun -d /dev/dri/card0? How do we make VirtualGL use the display from TurboVNC?

@dcommander
Member Author

@duongnb09 Yes, you can use the TurboVNC Server as a 2D X server. That is, in fact, the recommended combination. (TurboVNC and VirtualGL were designed as companion products.) VirtualGL will automatically use the display specified in DISPLAY as the 2D X server, so vglrun -d /dev/dri/card0 will work automatically in a TurboVNC session.
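
As a rough illustration of the point about DISPLAY, a small wrapper could check the two prerequisites before handing off to vglrun. The run_egl function and the VGLRUN override are hypothetical names made up for this sketch; only vglrun -d and the DISPLAY variable come from the discussion above.

```shell
#!/bin/sh
# run_egl DEVICE COMMAND [ARGS...]
# Hypothetical wrapper: refuses to start unless a 2D X server is named in
# DISPLAY and the EGL device node exists, then runs COMMAND under vglrun.
# VGLRUN is an invented override so the wrapper can be tried without VirtualGL.
run_egl() {
    dev="$1"; shift
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; start a 2D X server (for example, a TurboVNC session) first" >&2
        return 2
    fi
    if [ ! -e "$dev" ]; then
        echo "EGL device $dev not found" >&2
        return 3
    fi
    "${VGLRUN:-vglrun}" -d "$dev" "$@"
}

# Inside a TurboVNC session, DISPLAY already points at the session, so:
#   run_egl /dev/dri/card0 glxgears
```

The wrapper never sets DISPLAY itself. It relies on the session (TurboVNC, or Xvfb for testing) to provide it, which mirrors how vglrun picks up the 2D X server.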
