Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add an OpenCL ICD loader extension to shut down and free memory #224

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

bashbaug
Copy link
Contributor

This is a proposed solution to address #157. It adds an ICD loader extension that can be explicitly called by the host application (or possibly a layer?) to shut down and free memory.

I'm calling this WIP for now as I would like feedback in a few areas:

  1. How thorough do we want the shutdown to be? Specifically, do we want to unload the ICDs by closing the library handles, or do we only want to free ICD loader memory? Do we want options to do both?
  2. Do we want to add any sort of "stub layer" that causes subsequent OpenCL API calls to return an error code as part of the shutdown? This seems to be working in the current POC, though note it will only be possible to do this for core OpenCL APIs and it will not be possible to stub OpenCL extension APIs.
  3. How thread safe does this need to be? Specifically, do we need to handle the case where the ICD loader shuts down while other OpenCL calls are being made from other threads?
  4. Should we also provide a mechanism to "re-initialize" the ICD loader after shutting down or do we only need to support a single shutdown?
  5. How do we want to test this?

I'll write up an "extension spec" for this when it's a bit further along but I'd like to work through some of these details first. Thanks!

@Kerilk
Copy link
Contributor

Kerilk commented Aug 21, 2023

Thanks for implementing this @bashbaug, and for raising the very good questions.
I think maybe a good way to look at it is through usage scenarios. I see two uses for freeing loader resources:

  • Application linked against libOpenCL: leaks are reported by a tool (valgrind or similar). Though these may be better served by suppression files (as leaks on exit are of no real consequences), this approach would work. Since the dynamic loader related leaks are already suppressed one way or another, I don't think trying to unload vendor libraries is useful here. There is always the risk a library would still be using OpenCL symbols, so the layer trick is good, though it doesn't guaranty anything as you rightly noticed. Trying to prevent race conditions would require a big lock around each OpenCL call, so I don't think it is feasible (especially with extensions). Re-initializing would serve no purpose in this scenario.
  • Application that dynamically load libOpenCL.so or a library linked against it several time throughout the life of the applications, and potentially in parallel. In this scenario we would have a real memory leaks (every time the library is unloaded), and it couldn't be addressed by the above extension since we have no way (as far as I know) of reference counting the library. In this case I think we would be better served by https://github.com/KhronosGroup/OpenCL-ICD-Loader/pull/217/files but in addition vendor libraries should be unloaded. This is what we were worried about implementing previously due to potential race conditions. But I see that the Vulkan loader does indeed implement this here https://github.com/KhronosGroup/Vulkan-Loader/blob/dd8332d253cfaf3a9be306af4194e4dffc2e3b3c/loader/loader_windows.c#L98-L114 and there https://github.com/KhronosGroup/Vulkan-Loader/blob/dd8332d253cfaf3a9be306af4194e4dffc2e3b3c/loader/loader.c#L1897-L1901, and it unloads vendor libraries as well.

I am increasingly convinced we should address this problem properly through a proper library destructor. This may require vendors to update their implementation if unloading them is an issues, and some libraries could break if they have a bad de-initialization order and would need to be updated. We may also need to add an explicit destruction API to the layers. The fact that Vulkan is successfully implementing this strategy is a good indicator we should be able to as well. And it seems (to me) to be the only long term strategy that would reliably fix the issue.

Nonetheless, I don't think that implementing this extension today would be problematic down the road, since we can always make it a noop if we decide to implement the library destructor strategy.

@bashbaug
Copy link
Contributor Author

I'm not comfortable saying it's the right thing to do in all cases, but there are some interesting things that fall out if we only free memory and do not close library handles or otherwise unload the ICDs themselves. Referring back to my original list:

  1. We don't need to install a stub layer because all OpenCL handles remain valid and it should continue to be safe to make OpenCL calls using these handles. The main thing that changes after shutdown is that it would no longer be possible to enumerate platforms via clGetPlatformIDs.
  2. I think we would only need a handful of locks to be thread safe. Basically, we would only need locks in the functions that create or destroy the vendor list. All other OpenCL calls could and should remain lock-free.
  3. We could add a function to re-initialize the ICD loader if desired, though this would need a lock since it would be re-creating the vendor list.
  4. Testing would be challenging, but maybe not too bad if we had a re-initialization function?

Based on this analysis I think I've also convinced myself doing the shutdown as part of a library destructor should be safe too, as long as it doesn't close the library handles and unload the ICDs themselves. Does this make sense?

@Kerilk
Copy link
Contributor

Kerilk commented Aug 22, 2023

I see a problem with 2 & 3 in a scenario with layers. When freeing the layer memory, the dispatch tables that were passed to the layers are freed, so potentially corrupted afterwards see:

struct KHRLayer
{
// the loaded library object (true type varies on Linux versus Windows)
void *library;
// the dispatch table of the layer
struct _cl_icd_dispatch dispatch;
// The next layer in the chain
struct KHRLayer *next;
#ifdef CL_LAYER_INFO
// The layer library name
char *libraryName;
// the pointer to the clGetLayerInfo funciton
void *p_clGetLayerInfo;
#endif
};

const struct _cl_icd_dispatch *tdispatch;

tdispatch = target_dispatch;

Thus, any command already in flight may fail. This is why, in order to be thread safe you would need to be sure no new calls enter layers, and wait for all calls to be out of the layers, before de-allocating them and preventing any new entry. A read/write lock would work but I don't think we want to add that to every call.

Arguably, we could require layers to make a copy of the table, but that would be an update to the layer API.

Also, layers may have created/wrapped objects and these may not be dispatch-able through the loader.

I agree that we should be able to start by not releasing vendor library handle, and release everything as part of the library destructor. In a well behaved environment, no call should be made to a library once it's destructor is called since any program or library using the library should hold a reference to it. Also the destruction function but also the re-initialization function become unneeded in this scenario, as the library would be reinitialized if it becomes reloaded.

WRT the current implementation proposal, I would reverse the order of de-initialization, releasing the layers first and then the vendor list to mirror the order of initialization. I would also make sure the release order of each resource (layer or vendor list entry) is done in reverse order of their initialization. This will be helpful if when we add a de-initialization method to layers (and the destructor would be a perfect place to call it) and if at some point we decide to release vendor library handles.

@bashbaug
Copy link
Contributor Author

Thus, any command already in flight may fail.

Ah, crud, good point. Yes, calls inside layers would have a problem if we're freeing dispatch tables out from under them.

WRT the current implementation proposal, I would reverse the order of de-initialization, releasing the layers first and then the vendor list to mirror the order of initialization. I would also make sure the release order of each resource (layer or vendor list entry) is done in reverse order of their initialization.

Sure, I can do this. To be clear, to release in reverse order I would:

  1. Free layers first, from the front of the list to the back.
  2. Free vendors second, from the back of the list to the front.

@Kerilk
Copy link
Contributor

Kerilk commented Aug 22, 2023

Sure, I can do this. To be clear, to release in reverse order I would:

1. Free layers first, from the front of the list to the back.

2. Free vendors second, from the back of the list to the front.

This is my understanding as well.

@Kerilk
Copy link
Contributor

Kerilk commented Aug 23, 2023

@bashbaug I though a bit more about the layer issues. Even if we do not release the layer library handle, there is no guaranty a layer could be reinitialized without issues (leaks or incorrect state). I prepared a PR to address this issue: KhronosGroup/OpenCL-Docs#962. This would allow a layer to be de-initialized and then later reinitialized (we should also be able to release the library handle in this case). We may also leverage this functionality for testing the proper behavior of the initialization de-initialization mechanism of the loader itself:

I envision two test cases corresponding to the two usages described above:

  • Application linked against libOpenCL.so
  • Application that uses lisOpencl.so in a loop through dlopen, dlsym, and dlclose

One or more test layers could validate the good behavior and their output could be used as a source of truth. The simplest one would be a layer that counts the OpenCL calls and outputs this count at termination.

When you feel the time is right, and if you want, I can make a PR to your branch implementing the termination call, tests, and layers.

@bashbaug
Copy link
Contributor Author

I switched the shutdown order so the objects are freed in the reverse order they are initialized.

Do we think the stub layer is valuable (item (2) in my original list)? If it isn't, I can take it our of this PR, and it will become significantly smaller.

@Kerilk
Copy link
Contributor

Kerilk commented Aug 29, 2023

I think it depends where you want to take it. If you want to enable the extension function, as in your original plan, then it may alleviate some failures, but so would just removing it. If we go with the library destructor, no call to the library should be made once the destructor is entered, so having the layer could allow diagnostics (especially is it was full of assert(0)) and identify problematic behavior, but I don't think it should be included for releases.

So I think I still see value in having the stub layer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants