WIP: add an OpenCL ICD loader extension to shut down and free memory #224
Thanks for implementing this @bashbaug, and for raising the very good questions.
I am increasingly convinced we should address this problem properly, through a library destructor. This may require vendors to update their implementations if unloading them is an issue, and some libraries with a bad de-initialization order could break and would need to be updated. We may also need to add an explicit destruction API to the layers. The fact that Vulkan successfully implements this strategy is a good indicator that we should be able to as well, and it seems (to me) to be the only long-term strategy that would reliably fix the issue. Nonetheless, I don't think implementing this extension today would be problematic down the road, since we can always make it a no-op if we decide to implement the library destructor strategy.
I'm not comfortable saying it's the right thing to do in all cases, but there are some interesting things that fall out if we only free memory and do not close library handles or otherwise unload the ICDs themselves. Referring back to my original list:
Based on this analysis, I think I've also convinced myself that doing the shutdown as part of a library destructor should be safe too, as long as it doesn't close the library handles and unload the ICDs themselves. Does this make sense?
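A minimal sketch of what "free memory in a destructor, but keep the ICDs loaded" could look like. All names (`struct vendor_icd`, `g_vendors`, `loader_add_vendor`, `loader_free_memory`, `loader_destructor`) are hypothetical and not the loader's actual code. `__attribute__((destructor))` is a GCC/Clang extension; a Windows build would need a different hook, such as `DllMain` handling `DLL_PROCESS_DETACH`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical loader bookkeeping: one node per discovered ICD vendor. */
struct vendor_icd {
    void *lib_handle;          /* dlopen()/LoadLibrary() handle: intentionally NOT closed */
    char *name;                /* loader-owned heap memory */
    struct vendor_icd *next;
};

static struct vendor_icd *g_vendors = NULL;

/* Hypothetical helper used during enumeration; returns 0 on success. */
static int loader_add_vendor(void *lib_handle, const char *name)
{
    struct vendor_icd *v = malloc(sizeof *v);
    if (v == NULL)
        return -1;
    size_t len = strlen(name) + 1;
    v->name = malloc(len);
    if (v->name == NULL) {
        free(v);
        return -1;
    }
    memcpy(v->name, name, len);
    v->lib_handle = lib_handle;
    v->next = g_vendors;
    g_vendors = v;
    return 0;
}

/* Frees loader-owned memory only. Library handles are left open, so ICD
 * code stays mapped in case anything still calls into it. Returns the
 * number of entries freed; a second call frees nothing. */
static int loader_free_memory(void)
{
    int freed = 0;
    while (g_vendors != NULL) {
        struct vendor_icd *v = g_vendors;
        g_vendors = v->next;
        free(v->name);
        free(v);               /* v->lib_handle is deliberately not closed */
        ++freed;
    }
    assert(g_vendors == NULL);
    return freed;
}

#if defined(__GNUC__) || defined(__clang__)
/* GCC/Clang extension: runs when the loader is unloaded or at process exit. */
__attribute__((destructor))
static void loader_destructor(void)
{
    (void)loader_free_memory();
}
#endif
```

Because `loader_free_memory` is idempotent, an explicit extension call and the destructor can coexist: whichever runs second simply finds an empty list.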
I see a problem with items 2 and 3 in a scenario with layers. When the layer memory is freed, the dispatch tables that were passed to the layers are freed too, so they may be corrupted afterwards; see OpenCL-ICD-Loader/loader/icd.h, lines 109 to 123 (commit 229410f).
Thus, any command already in flight may fail. This is why, to be thread safe, you would need to ensure that no new calls enter the layers and wait for all in-flight calls to leave the layers before de-allocating them. A read/write lock would work, but I don't think we want to add that to every call. Arguably, we could require layers to make a copy of the table, but that would be an update to the layer API. Also, layers may have created or wrapped objects, and these may not be dispatchable through the loader.

I agree that we should be able to start by not releasing vendor library handles, and release everything else as part of the library destructor. In a well-behaved environment, no call should be made to a library once its destructor is called, since any program or library using the library should hold a reference to it. Also, both the destruction function and the re-initialization function become unneeded in this scenario, as the library would be reinitialized if it is reloaded.

WRT the current implementation proposal, I would reverse the order of de-initialization, releasing the layers first and then the vendor list, to mirror the order of initialization. I would also make sure each resource (layer or vendor list entry) is released in the reverse order of its initialization. This will be helpful when we add a de-initialization method to layers (and the destructor would be a perfect place to call it), and if at some point we decide to release vendor library handles.
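A minimal sketch of the "stop new calls, then wait for in-flight calls to drain" idea, using C11 atomics instead of a read/write lock. `struct call_gate` and the `gate_*` functions are hypothetical. Every dispatched call would have to bracket itself with `gate_enter`/`gate_exit`, so this still adds two atomic operations to each call, which is exactly the per-call cost the comment is wary of.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate guarding the layer dispatch tables. */
struct call_gate {
    atomic_int  in_flight;   /* calls currently inside the layers */
    atomic_bool closing;     /* set once teardown has started */
};

static void gate_init(struct call_gate *g)
{
    atomic_init(&g->in_flight, 0);
    atomic_init(&g->closing, false);
}

/* Called at the start of every dispatched API call. Returns false if
 * teardown has started and the call must not touch the tables. */
static bool gate_enter(struct call_gate *g)
{
    atomic_fetch_add(&g->in_flight, 1);
    if (atomic_load(&g->closing)) {       /* teardown started: back out */
        atomic_fetch_sub(&g->in_flight, 1);
        return false;
    }
    return true;
}

/* Called when a successfully entered call leaves the layers. */
static void gate_exit(struct call_gate *g)
{
    int prev = atomic_fetch_sub(&g->in_flight, 1);
    assert(prev > 0);
    (void)prev;
}

/* Blocks new calls; returns true once no call is in flight, i.e. once it
 * is safe to free the dispatch tables. Teardown retries until true. */
static bool gate_try_close(struct call_gate *g)
{
    atomic_store(&g->closing, true);
    return atomic_load(&g->in_flight) == 0;
}
```

Because both sides use the default sequentially consistent atomics, a racing `gate_enter` either sees `closing` and backs out, or `gate_try_close` sees its increment and reports that a call is still in flight.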
Ah, crud, good point. Yes, calls inside layers would have a problem if we're freeing dispatch tables out from under them.
Sure, I can do this. To be clear, to release in reverse order I would:
This is my understanding as well.
@bashbaug I thought a bit more about the layer issues. Even if we do not release the layer library handle, there is no guarantee a layer could be reinitialized without issues (leaks or incorrect state). I prepared a PR to address this issue: KhronosGroup/OpenCL-Docs#962. This would allow a layer to be de-initialized and later reinitialized (we should also be able to release the library handle in this case). We may also leverage this functionality to test the initialization/de-initialization mechanism of the loader itself. I envision two test cases corresponding to the two usages described above:
One or more test layers could validate the correct behavior, and their output could be used as a source of truth. The simplest one would be a layer that counts the OpenCL calls and outputs this count at termination. When you feel the time is right, and if you want, I can make a PR to your branch implementing the termination call, tests, and layers.
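A minimal sketch of the call-counting test layer, reduced to plain C so it compiles without the OpenCL layer headers. The function names are hypothetical: the init/de-init pair only stands in for whatever entry points KhronosGroup/OpenCL-Docs#962 ends up defining, and a real layer would forward each call to the next dispatch table after counting it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counting layer state. */
static unsigned long g_call_count = 0;
static int g_initialized = 0;

/* Returns 0 on success, -1 on double initialization (a bug to catch). */
static int counting_layer_init(void)
{
    if (g_initialized)
        return -1;
    g_call_count = 0;
    g_initialized = 1;
    return 0;
}

/* Stand-in for one wrapped OpenCL entry point. */
static void counting_layer_on_call(void)
{
    assert(g_initialized);    /* a call after de-init indicates a loader bug */
    ++g_call_count;
}

/* Reports and resets the count so the layer can be initialized again
 * without leaking state from the previous session. */
static unsigned long counting_layer_deinit(void)
{
    unsigned long count = g_call_count;
    fprintf(stderr, "counting layer: %lu OpenCL calls\n", count);
    g_initialized = 0;
    g_call_count = 0;
    return count;
}
```

A test would then compare the reported count against the number of calls the test application knows it made, across an init, de-init, re-init cycle.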
I switched the shutdown order so the objects are freed in the reverse order of their initialization. Do we think the stub layer is valuable (item (2) in my original list)? If it isn't, I can take it out of this PR, and it will become significantly smaller.
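A minimal sketch of the reversed teardown, assuming (as the discussion implies) that vendors are initialized before layers. `struct loader_state`, `release`, and `loader_shutdown` are hypothetical; fixed-size arrays stand in for the loader's real lists, and the teardown log exists only so the order can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical record of one initialized resource (a layer or an ICD). */
struct entry { const char *name; };

struct loader_state {
    struct entry layers[MAX_ENTRIES];  size_t num_layers;
    struct entry vendors[MAX_ENTRIES]; size_t num_vendors;
    const char *teardown_log[2 * MAX_ENTRIES]; size_t log_len;
};

static void release(struct loader_state *s, const struct entry *e)
{
    /* A real loader would free the entry's memory here (and later perhaps
     * call a layer de-init hook or close the library handle). */
    assert(s->log_len < 2 * MAX_ENTRIES);
    s->teardown_log[s->log_len++] = e->name;
}

/* Vendors were initialized first and layers on top of them, so teardown
 * releases layers first; each list is walked back to front. */
static void loader_shutdown(struct loader_state *s)
{
    while (s->num_layers > 0)
        release(s, &s->layers[--s->num_layers]);
    while (s->num_vendors > 0)
        release(s, &s->vendors[--s->num_vendors]);
}
```

With vendors `icdA`, `icdB` and layers `layer1`, `layer2` registered in that order, the teardown order is `layer2`, `layer1`, `icdB`, `icdA`.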
I think it depends on where you want to take it. If you want to enable the extension function, as in your original plan, then it may alleviate some failures, but so would just removing it. If we go with the library destructor, no call to the library should be made once the destructor is entered, so having the layer could allow diagnostics (especially if it was full of So I think I still see value in having the stub layer.
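A minimal sketch of how a stub layer could turn late calls into diagnostics: at shutdown, the dispatch pointer is swapped to a stub that records and reports the call instead of touching freed state. This is an assumption about the stub layer's role, not the PR's implementation. The entry point is shaped loosely after `clGetPlatformIDs` but uses plain C types so it compiles without OpenCL headers, and `STUB_ERR_AFTER_SHUTDOWN` is a placeholder, not a real OpenCL error code.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder error code; a real stub would return an OpenCL error. */
#define STUB_ERR_AFTER_SHUTDOWN (-1)

typedef int (*get_platforms_fn)(unsigned num_entries, void **platforms,
                                unsigned *num_platforms);

static int g_late_calls = 0;   /* diagnostics: calls that arrived too late */

/* Stand-in for the normal dispatch path. */
static int real_get_platforms(unsigned n, void **p, unsigned *num)
{
    (void)n; (void)p;
    if (num) *num = 1;
    return 0;
}

/* Stub installed at shutdown: reports the late call and fails cleanly. */
static int stub_get_platforms(unsigned n, void **p, unsigned *num)
{
    (void)n; (void)p;
    ++g_late_calls;
    fprintf(stderr, "warning: OpenCL call after ICD loader shutdown\n");
    if (num) *num = 0;
    return STUB_ERR_AFTER_SHUTDOWN;
}

static get_platforms_fn g_dispatch = real_get_platforms;

/* Not thread safe: a concurrent version would need an atomic pointer. */
static void install_stub_layer(void) { g_dispatch = stub_get_platforms; }
```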
This is a proposed solution to address #157. It adds an ICD loader extension that can be explicitly called by the host application (or possibly a layer?) to shut down and free memory.
I'm calling this WIP for now as I would like feedback in a few areas:
I'll write up an "extension spec" for this when it's a bit further along but I'd like to work through some of these details first. Thanks!