We've tackled similar issues in the past by leaking the resource and allowing the OS to reclaim the objects when the process terminates (see e.g. rapidsai/rmm#1375). That seems to be what you're suggesting by removing the explicit close, but it doesn't resolve the other issue if cuFile is also executing past `main` due to the static storage duration of the object it creates.
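The leak-on-purpose approach can be sketched roughly as follows. This is a hypothetical illustration, not the code from rapidsai/rmm#1375: `driver_handle` and `get_driver_handle` are made-up names. The object is allocated on first use and never deleted, so no destructor runs during static destruction after `main` returns.

```cpp
#include <cstdio>

// Hypothetical stand-in for a handle whose teardown would call into a runtime
// (such as CUDA) that may already be shut down after main() returns.
class driver_handle {
 public:
  driver_handle() { std::puts("driver opened"); }
  // Never runs for the leaked instance below.
  ~driver_handle() { std::puts("driver closed"); }
  int id() const { return 42; }
};

// Created on first use and intentionally never deleted. Because only the
// pointer has static storage duration, nothing is destroyed at exit; the OS
// reclaims the memory when the process terminates.
driver_handle& get_driver_handle()
{
  static driver_handle* instance = new driver_handle{};
  return *instance;
}
```

As noted in the comment above, this only avoids teardown in code we own; it does not stop a library such as cuFile from running its own cleanup after `main`.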
This PR makes small improvements to the I/O code. Specifically:
- Place a type constraint on a template class so that it accepts only rvalue arguments. In addition, replace `std::move` with `std::forward` to make the code more *apparently* consistent with the convention, i.e. use `std::move()` on rvalue references and `std::forward` on forwarding references (*Effective Modern C++*, Item 25).
- Alleviate (but not completely resolve) an existing cuFile driver close issue by removing the explicit driver close call. See #17121.
- Minor typo fix (`struct` → `class`).
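A minimal C++17 sketch of the first bullet, assuming a hypothetical class template `owning_wrapper` (not the actual cuDF class). The constructor template is disabled for lvalue arguments. Because `U&&` is still a forwarding reference by syntax, `std::forward` is the conventional spelling, even though under this constraint it behaves exactly like `std::move`.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical wrapper that takes ownership of a value. The enable_if
// constraint rejects lvalue arguments, so callers must pass a temporary
// or an explicitly moved object.
template <typename T>
class owning_wrapper {
 public:
  template <typename U,
            typename = std::enable_if_t<!std::is_lvalue_reference_v<U> &&
                                        std::is_constructible_v<T, U>>>
  explicit owning_wrapper(U&& value)
    // U&& is a forwarding reference; the constraint guarantees it binds only
    // to rvalues, so std::forward<U> casts to an rvalue here, like std::move.
    : value_{std::forward<U>(value)}
  {
  }

  T const& get() const { return value_; }

 private:
  T value_;
};
```

With this constraint, `owning_wrapper<std::string> w{std::move(s)}` compiles, while passing a named `std::string` lvalue is rejected at compile time.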
Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Vukasin Milovanovic (https://github.com/vuule)
URL: #17105
Describe the bug
cuDF accesses the cuFile API through the `cufile_shim` object, which has static storage duration, meaning its destructor is called after the `main` function returns. `cuFileDriverClose()` internally calls the CUDA API, which at that point results in undefined behavior (usually manifested as a segfault), so it should not be called there. However, even without `cuFileDriverClose()`, cuFile implicitly closes the driver, and some CUDA calls are still made during that process, likely causing a segfault. The best way to clean up the resources needs to be revisited in the future.

For the time being, the segfault cannot be completely avoided on the GDS I/O path, but at least `cuFileDriverClose()` should be removed from the destructor of `cufile_shim`.

Related issues from KvikIO:
rapidsai/kvikio#497
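The proposed change can be illustrated with a simplified, hypothetical shim. `file_shim`, `fake_driver_open`, and `fake_driver_close` are placeholder names standing in for `cufile_shim`, `cuFileDriverOpen()`, and `cuFileDriverClose()`; the real shim in cuDF differs. The only point shown is that the destructor of the static instance no longer closes the driver.

```cpp
// Hypothetical stand-ins for cuFileDriverOpen()/cuFileDriverClose().
// The real functions live in libcufile; these only count calls.
inline int g_open_calls  = 0;
inline int g_close_calls = 0;
inline void fake_driver_open() { ++g_open_calls; }
inline void fake_driver_close() { ++g_close_calls; }  // intentionally never called

// Simplified, hypothetical shim modeled on the description of cufile_shim.
class file_shim {
 public:
  file_shim() { fake_driver_open(); }

  // Deliberately does NOT close the driver. The instance below has static
  // storage duration, so this destructor runs after main() returns, when
  // calling into CUDA is no longer safe.
  ~file_shim() = default;

  static file_shim const* instance()
  {
    static file_shim shim{};  // constructed on first use, destroyed after main()
    return &shim;
  }
};
```

As the issue notes, this does not prevent cuFile from performing its own implicit driver cleanup at process exit, so the segfault is alleviated rather than eliminated.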
Steps/Code to reproduce bug
Run any program using GDS I/O.
Expected behavior
No segmentation fault.
Environment overview (please complete the following information)
N/A
Environment details
N/A
Additional context
N/A