WIP: add an OpenCL ICD loader extension to shut down and free memory #224

bashbaug · 2023-08-21T15:29:36Z

This is a proposed solution to address #157. It adds an ICD loader extension that can be explicitly called by the host application (or possibly a layer?) to shut down and free memory.

I'm calling this WIP for now as I would like feedback in a few areas:

1. How thorough do we want the shutdown to be? Specifically, do we want to unload the ICDs by closing the library handles, or do we only want to free ICD loader memory? Do we want options to do both?
2. Do we want to add any sort of "stub layer" that causes subsequent OpenCL API calls to return an error code as part of the shutdown? This seems to be working in the current POC, though note it will only be possible to do this for core OpenCL APIs and it will not be possible to stub OpenCL extension APIs.
3. How thread safe does this need to be? Specifically, do we need to handle the case where the ICD loader shuts down while other OpenCL calls are being made from other threads?
4. Should we also provide a mechanism to "re-initialize" the ICD loader after shutting down or do we only need to support a single shutdown?
5. How do we want to test this?
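The "stub layer" idea in the second question can be pictured as swapping the active dispatch table for one whose entries only return an error. The following is a minimal sketch under stated assumptions: the types, the `sd_` names, and the error code are hypothetical stand-ins, not the real OpenCL headers or the code in this PR.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real OpenCL types. */
#define SD_SUCCESS 0
#define SD_LOADER_SHUT_DOWN (-1) /* placeholder error code (assumption) */

struct sd_dispatch {
    int (*get_platform_count)(unsigned *count);
    int (*finish)(void *queue);
};

/* "Real" entry points, standing in for calls forwarded to a vendor. */
static int sd_real_get_platform_count(unsigned *count)
{
    if (count) *count = 1;
    return SD_SUCCESS;
}
static int sd_real_finish(void *queue) { (void)queue; return SD_SUCCESS; }

/* Stub entry points: every call fails after shutdown. */
static int sd_stub_get_platform_count(unsigned *count)
{
    (void)count;
    return SD_LOADER_SHUT_DOWN;
}
static int sd_stub_finish(void *queue) { (void)queue; return SD_LOADER_SHUT_DOWN; }

static const struct sd_dispatch sd_real_table = {
    sd_real_get_platform_count, sd_real_finish
};
static const struct sd_dispatch sd_shutdown_table = {
    sd_stub_get_platform_count, sd_stub_finish
};

/* The table that core API entry points dispatch through. */
static const struct sd_dispatch *sd_active = &sd_real_table;

int sd_get_platform_count(unsigned *count)
{
    return sd_active->get_platform_count(count);
}
int sd_finish(void *queue) { return sd_active->finish(queue); }

/* Shutdown swaps in the error-only table. The swap is not atomic here,
   so this sketch says nothing about the thread-safety question. */
void sd_shutdown(void) { sd_active = &sd_shutdown_table; }
```

Only entry points that go through the table can be stubbed this way, which matches the caveat above about extension APIs.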

I'll write up an "extension spec" for this when it's a bit further along but I'd like to work through some of these details first. Thanks!

Kerilk · 2023-08-21T18:03:42Z

Thanks for implementing this @bashbaug, and for raising the very good questions.
I think maybe a good way to look at it is through usage scenarios. I see two uses for freeing loader resources:

1. Application linked against libOpenCL: leaks are reported by a tool (valgrind or similar). Though these may be better served by suppression files (as leaks on exit are of no real consequence), this approach would work. Since the dynamic loader related leaks are already suppressed one way or another, I don't think trying to unload vendor libraries is useful here. There is always the risk that a library would still be using OpenCL symbols, so the layer trick is good, though it doesn't guarantee anything, as you rightly noticed. Trying to prevent race conditions would require a big lock around each OpenCL call, so I don't think it is feasible (especially with extensions). Re-initializing would serve no purpose in this scenario.
2. Application that dynamically loads libOpenCL.so, or a library linked against it, several times throughout the life of the application, and potentially in parallel. In this scenario we would have a real memory leak (every time the library is unloaded), and it couldn't be addressed by the above extension since we have no way (as far as I know) of reference counting the library. In this case I think we would be better served by https://github.com/KhronosGroup/OpenCL-ICD-Loader/pull/217/files but in addition vendor libraries should be unloaded. This is what we were worried about implementing previously due to potential race conditions. But I see that the Vulkan loader does indeed implement this here https://github.com/KhronosGroup/Vulkan-Loader/blob/dd8332d253cfaf3a9be306af4194e4dffc2e3b3c/loader/loader_windows.c#L98-L114 and there https://github.com/KhronosGroup/Vulkan-Loader/blob/dd8332d253cfaf3a9be306af4194e4dffc2e3b3c/loader/loader.c#L1897-L1901, and it unloads vendor libraries as well.

I am increasingly convinced we should address this problem properly through a proper library destructor. This may require vendors to update their implementations if unloading them is an issue, and some libraries could break if they have a bad de-initialization order and would need to be updated. We may also need to add an explicit destruction API to the layers. The fact that Vulkan is successfully implementing this strategy is a good indicator we should be able to as well. And it seems (to me) to be the only long-term strategy that would reliably fix the issue.
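To make the library destructor idea concrete, here is a minimal sketch, assuming a GCC- or Clang-compatible compiler and its `destructor` function attribute. The vendor list and all `fin_` names are hypothetical and are not the loader's actual data structures. On Windows the equivalent hook would be `DllMain` with `DLL_PROCESS_DETACH`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a loader-owned vendor list entry. */
struct fin_vendor {
    struct fin_vendor *next;
    char *suffix;
};

static struct fin_vendor *fin_vendors = NULL;

/* Prepend an entry; returns 0 on success, -1 on allocation failure. */
int fin_add_vendor(const char *suffix)
{
    struct fin_vendor *v = malloc(sizeof *v);
    if (!v) return -1;
    v->suffix = malloc(strlen(suffix) + 1);
    if (!v->suffix) { free(v); return -1; }
    strcpy(v->suffix, suffix);
    v->next = fin_vendors;
    fin_vendors = v;
    return 0;
}

int fin_vendor_count(void)
{
    int n = 0;
    for (struct fin_vendor *v = fin_vendors; v; v = v->next) n++;
    return n;
}

/* Free loader-owned memory only; no library handle is closed here.
   Safe to call more than once. */
void fin_free_loader_memory(void)
{
    while (fin_vendors) {
        struct fin_vendor *next = fin_vendors->next;
        free(fin_vendors->suffix);
        free(fin_vendors);
        fin_vendors = next;
    }
}

#if defined(__GNUC__)
/* Runs when the shared library is unloaded or the process exits. */
__attribute__((destructor))
static void fin_on_unload(void)
{
    fin_free_loader_memory();
}
#endif
```

Because the cleanup function is idempotent, an explicit shutdown extension and the destructor could coexist, which fits the point that the extension could later become a no-op.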

Nonetheless, I don't think that implementing this extension today would be problematic down the road, since we can always make it a noop if we decide to implement the library destructor strategy.

bashbaug · 2023-08-22T00:24:53Z

I'm not comfortable saying it's the right thing to do in all cases, but there are some interesting things that fall out if we only free memory and do not close library handles or otherwise unload the ICDs themselves. Referring back to my original list:

We don't need to install a stub layer because all OpenCL handles remain valid and it should continue to be safe to make OpenCL calls using these handles. The main thing that changes after shutdown is that it would no longer be possible to enumerate platforms via clGetPlatformIDs.
I think we would only need a handful of locks to be thread safe. Basically, we would only need locks in the functions that create or destroy the vendor list. All other OpenCL calls could and should remain lock-free.
We could add a function to re-initialize the ICD loader if desired, though this would need a lock since it would be re-creating the vendor list.
Testing would be challenging, but maybe not too bad if we had a re-initialization function?
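The "handful of locks" idea can be sketched as one mutex that guards only the functions creating, destroying, or enumerating the vendor list. This is an illustration under assumptions: it uses POSIX threads, and the `vl_` names and the integer array standing in for the vendor list are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* One lock guards creation, destruction, and enumeration of the list. */
static pthread_mutex_t vl_lock = PTHREAD_MUTEX_INITIALIZER;
static int *vl_vendors = NULL; /* hypothetical stand-in for the vendor list */
static size_t vl_count = 0;

/* Create (or re-create) the vendor list; returns 0 on success. */
int vl_init(size_t n)
{
    int rc = 0;
    pthread_mutex_lock(&vl_lock);
    if (!vl_vendors) {
        vl_vendors = calloc(n, sizeof *vl_vendors);
        if (vl_vendors) vl_count = n;
        else rc = -1;
    }
    pthread_mutex_unlock(&vl_lock);
    return rc;
}

/* Free the vendor list; later enumeration reports zero platforms. */
void vl_shutdown(void)
{
    pthread_mutex_lock(&vl_lock);
    free(vl_vendors);
    vl_vendors = NULL;
    vl_count = 0;
    pthread_mutex_unlock(&vl_lock);
}

/* Enumeration reads the list, so it takes the lock as well. */
size_t vl_platform_count(void)
{
    pthread_mutex_lock(&vl_lock);
    size_t n = vl_count;
    pthread_mutex_unlock(&vl_lock);
    return n;
}
```

Calls that dispatch through an existing handle never touch the list in this sketch, so they would stay lock-free, as described above.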

Based on this analysis I think I've also convinced myself doing the shutdown as part of a library destructor should be safe too, as long as it doesn't close the library handles and unload the ICDs themselves. Does this make sense?

Kerilk · 2023-08-22T15:48:36Z

I see a problem with 2 & 3 in a scenario with layers. When freeing the layer memory, the dispatch tables that were passed to the layers are freed, so they are potentially corrupted afterwards. See:

OpenCL-ICD-Loader/loader/icd.h

Lines 109 to 123 in 229410f

    struct KHRLayer
    {
        // the loaded library object (true type varies on Linux versus Windows)
        void *library;
        // the dispatch table of the layer
        struct _cl_icd_dispatch dispatch;
        // The next layer in the chain
        struct KHRLayer *next;
    #ifdef CL_LAYER_INFO
        // The layer library name
        char *libraryName;
        // the pointer to the clGetLayerInfo funciton
        void *p_clGetLayerInfo;
    #endif
    };

OpenCL-ICD-Loader/test/layer/icd_print_layer.c

Line 26 in 229410f

const struct _cl_icd_dispatch *tdispatch;

OpenCL-ICD-Loader/test/layer/icd_print_layer.c

Line 85 in 229410f

tdispatch = target_dispatch;

Thus, any command already in flight may fail. This is why, in order to be thread safe you would need to be sure no new calls enter layers, and wait for all calls to be out of the layers, before de-allocating them and preventing any new entry. A read/write lock would work but I don't think we want to add that to every call.

Arguably, we could require layers to make a copy of the table, but that would be an update to the layer API.
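As a rough illustration of that alternative, a layer could store the target dispatch table by value instead of keeping the pointer it was handed. The types and `cp_` names below are hypothetical and much smaller than `struct _cl_icd_dispatch`.

```c
#include <stdlib.h>

/* Hypothetical, tiny stand-in for a dispatch table. */
struct cp_dispatch {
    int (*add_one)(int);
};

/* Stands in for the next entry point in the chain. */
int cp_target_add_one(int x) { return x + 1; }

/* The layer's private copy of the table (a copy, not a pointer). */
static struct cp_dispatch cp_table;

void cp_layer_init(const struct cp_dispatch *target)
{
    cp_table = *target; /* struct assignment copies every entry */
}

int cp_layer_add_one(int x) { return cp_table.add_one(x); }

/* Demo: the loader-owned table is freed, yet the layer still works. */
int cp_demo(void)
{
    struct cp_dispatch *loader_table = malloc(sizeof *loader_table);
    if (!loader_table) return -1;
    loader_table->add_one = cp_target_add_one;
    cp_layer_init(loader_table);
    free(loader_table);          /* loader releases its memory */
    return cp_layer_add_one(41); /* dispatches through the copy */
}
```

The copy only protects against the table's memory being freed. The function pointers inside it stay valid only while the code they point to remains loaded.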

Also, layers may have created/wrapped objects and these may not be dispatch-able through the loader.

I agree that we should be able to start by not releasing vendor library handles, and release everything as part of the library destructor. In a well-behaved environment, no call should be made to a library once its destructor is called, since any program or library using the library should hold a reference to it. Both the destruction function and the re-initialization function also become unneeded in this scenario, as the library would be reinitialized if it is reloaded.

WRT the current implementation proposal, I would reverse the order of de-initialization, releasing the layers first and then the vendor list to mirror the order of initialization. I would also make sure the release order of each resource (layer or vendor list entry) is done in reverse order of their initialization. This will be helpful if and when we add a de-initialization method to layers (and the destructor would be a perfect place to call it) and if at some point we decide to release vendor library handles.

bashbaug · 2023-08-22T20:02:54Z

> Thus, any command already in flight may fail.

Ah, crud, good point. Yes, calls inside layers would have a problem if we're freeing dispatch tables out from under them.

> WRT the current implementation proposal, I would reverse the order of de-initialization, releasing the layers first and then the vendor list to mirror the order of initialization. I would also make sure the release order of each resource (layer or vendor list entry) is done in reverse order of their initialization.

Sure, I can do this. To be clear, to release in reverse order I would:

1. Free layers first, from the front of the list to the back.
2. Free vendors second, from the back of the list to the front.

Kerilk · 2023-08-22T20:05:58Z

> Sure, I can do this. To be clear, to release in reverse order I would:
> 1. Free layers first, from the front of the list to the back.
> 2. Free vendors second, from the back of the list to the front.

This is my understanding as well.
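The agreed order can be sketched with two singly linked lists. This assumes, as an inference from the steps above, that vendors are appended at the back of their list and layers are pushed at the front of theirs, so "front to back" for layers and "back to front" for vendors are both reverse initialization order. All `ro_` names are hypothetical.

```c
#include <stdlib.h>

struct ro_node {
    int id; /* initialization sequence number */
    struct ro_node *next;
};

/* Push at the front (assumed insertion order for layers). */
struct ro_node *ro_push_front(struct ro_node *head, int id)
{
    struct ro_node *n = malloc(sizeof *n);
    if (!n) return head;
    n->id = id;
    n->next = head;
    return n;
}

/* Append at the back (assumed insertion order for vendors). */
struct ro_node *ro_push_back(struct ro_node *head, int id)
{
    struct ro_node *n = malloc(sizeof *n);
    if (!n) return head;
    n->id = id;
    n->next = NULL;
    if (!head) return n;
    struct ro_node *tail = head;
    while (tail->next) tail = tail->next;
    tail->next = n;
    return head;
}

/* Free front to back, recording each freed id in log[] from index n. */
size_t ro_free_front_to_back(struct ro_node *head, int *log, size_t n)
{
    while (head) {
        struct ro_node *next = head->next;
        log[n++] = head->id;
        free(head);
        head = next;
    }
    return n;
}

/* Free back to front: recurse to the tail, then free on the way out. */
size_t ro_free_back_to_front(struct ro_node *head, int *log, size_t n)
{
    if (!head) return n;
    n = ro_free_back_to_front(head->next, log, n);
    log[n] = head->id;
    free(head);
    return n + 1;
}

/* Vendors 1..3 are initialized first, then layers 4..6; shutdown frees
   layers first, then vendors. Returns the number of freed entries. */
size_t ro_demo(int *log)
{
    struct ro_node *vendors = NULL, *layers = NULL;
    for (int id = 1; id <= 3; id++) vendors = ro_push_back(vendors, id);
    for (int id = 4; id <= 6; id++) layers = ro_push_front(layers, id);
    size_t n = ro_free_front_to_back(layers, log, 0);
    return ro_free_back_to_front(vendors, log, n);
}
```

With these assumptions the recorded release order is the exact reverse of the initialization order.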

Kerilk · 2023-08-23T18:55:27Z

@bashbaug I thought a bit more about the layer issues. Even if we do not release the layer library handle, there is no guarantee a layer could be reinitialized without issues (leaks or incorrect state). I prepared a PR to address this issue: KhronosGroup/OpenCL-Docs#962. This would allow a layer to be de-initialized and then later reinitialized (we should also be able to release the library handle in this case). We may also leverage this functionality for testing the proper behavior of the initialization and de-initialization mechanism of the loader itself.

I envision two test cases corresponding to the two usages described above:

1. Application linked against libOpenCL.so
2. Application that uses libOpenCL.so in a loop through dlopen, dlsym, and dlclose

One or more test layers could validate the good behavior and their output could be used as a source of truth. The simplest one would be a layer that counts the OpenCL calls and outputs this count at termination.
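A call-counting layer of that kind might look roughly like the sketch below. It uses C11 atomics so the count stays correct when calls come from several threads. The one-entry dispatch table and all `cnt_` names are hypothetical, and the sketch does not use the real layer entry points or the de-initialization API proposed in KhronosGroup/OpenCL-Docs#962.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical one-entry dispatch table. */
struct cnt_dispatch {
    int (*finish)(void *queue);
};

static atomic_ulong cnt_calls;         /* zero-initialized call counter */
static struct cnt_dispatch cnt_target; /* next table in the chain */

/* Stands in for the vendor's implementation of the call. */
int cnt_target_finish(void *queue) { (void)queue; return 0; }

/* Layer entry point: count the call, then forward it. */
static int cnt_layer_finish(void *queue)
{
    atomic_fetch_add(&cnt_calls, 1);
    return cnt_target.finish(queue);
}

/* Layer initialization: remember the target, expose the wrappers. */
void cnt_layer_init(const struct cnt_dispatch *target,
                    struct cnt_dispatch *layer_out)
{
    cnt_target = *target;
    layer_out->finish = cnt_layer_finish;
}

unsigned long cnt_total(void) { return atomic_load(&cnt_calls); }

/* To be called at termination, wherever the layer is torn down. */
void cnt_report(void)
{
    printf("OpenCL calls seen by layer: %lu\n", cnt_total());
}
```

A test could compare the reported count against the number of calls the application is known to make in each of the two scenarios.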

When you feel the time is right, and if you want, I can make a PR to your branch implementing the termination call, tests, and layers.

bashbaug · 2023-08-29T14:29:35Z

I switched the shutdown order so the objects are freed in the reverse order they are initialized.

Do we think the stub layer is valuable (item (2) in my original list)? If it isn't, I can take it out of this PR, and it will become significantly smaller.

Kerilk · 2023-08-29T15:00:05Z

I think it depends on where you want to take it. If you want to enable the extension function, as in your original plan, then it may alleviate some failures, but so would just removing it. If we go with the library destructor, no call to the library should be made once the destructor is entered, so having the layer could allow diagnostics (especially if it were full of assert(0)) and identify problematic behavior, but I don't think it should be included for releases.

So I think I still see value in having the stub layer.

bashbaug added 7 commits August 19, 2023 10:45

add a cmake target for ICD loader code generation (5859d59)
improve template whitespace and comments (a0990e0)
add a shutdown dispatch table that only returns errors (c2bfd0d)
switch to a more inclusive name for the actual dispatch table (fcfa89e)
fix some bugs in the shutdown dispatch table (af17b6d)
basic functionality appears to be working (e6670a6)
fix unused variable warnings (debd85d)
free the vendor suffix also (a74c4e5)

Kerilk mentioned this pull request Aug 23, 2023

Add new version of layer API for deinitialization. KhronosGroup/OpenCL-Docs#962

Open

bashbaug added 2 commits August 28, 2023 17:39

clean up layers first (9e61e24)
free vendors from back to front (d36b2e6)
