-
Notifications
You must be signed in to change notification settings - Fork 534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor memory ownership svm #6073
base: branch-24.10
Are you sure you want to change the base?
refactor memory ownership svm #6073
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the C and Python APIs need a copy and ultimately, this is because rmm::device_buffer
in this structure as well as the model is owned by the struct/class. Instead, if we use std::weak_ptr<rmm::device_buffer>>
or plain rmm::device_buffer&
then we can avoid copies and ownership issues by letting the C and Python API own the data but still allowing the C++ API to resize it.
The C API can also release the shared_ptr/device_buffer to transfer ownership to the user, thus removing the need for a copy there.
The Python API is owned by us and so, we can create shared_ptr/device_buffer in the Cython layer and store them as class members.
cpp/src/svm/svc_impl.cuh
Outdated
rmm::device_uvector<math_t> unique_labels(0, stream); | ||
model.n_classes = raft::label::getUniquelabels(unique_labels, labels, n_rows, stream); | ||
rmm::device_async_resource_ref rmm_alloc = rmm::mr::get_current_device_resource(); | ||
model.unique_labels = (math_t*)rmm_alloc.allocate_async( | ||
model.n_classes * sizeof(math_t), rmm::CUDA_ALLOCATION_ALIGNMENT, stream); | ||
raft::copy(model.unique_labels, unique_labels.data(), model.n_classes, stream); | ||
model.unique_labels.resize(model.n_classes * sizeof(math_t), stream); | ||
raft::copy((math_t*)model.unique_labels.data(), unique_labels.data(), model.n_classes, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is unique_labels
being re-used elsewhere or can this copy be avoided by directly passing modeul.unique_labels
to getUniquelabels
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is only used locally. However, I cannot pass a plain device_buffer
into raft::label::getUniquelabels
as it only takes device_uvector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use unique_labels.release()
to get an r-value device_buffer
and avoid the copy https://github.com/rapidsai/rmm/blob/b51447393c523cc929608d84850c70a3eae08af3/include/rmm/device_uvector.hpp#L414
cpp/src/svm/svc_impl.cuh
Outdated
|
||
// Unfortunately we need runtime support for both types | ||
raft::device_matrix_view<math_t, int, raft::layout_stride> dense_support_matrix_view; | ||
if (is_dense_support) { | ||
dense_support_matrix_view = | ||
raft::make_device_strided_matrix_view<math_t, int, raft::layout_f_contiguous>( | ||
model.support_matrix.data, model.n_support, n_cols, 0); | ||
(math_t*)model.support_matrix.data.data(), model.n_support, n_cols, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret_cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have switched to reinterpret_cast in many places now. Here in this context I cannot easily switch as the model
is const
. To my understanding resolving this here would require subsequent const_casts
or worse.
cpp/src/svm/svc_impl.cuh
Outdated
n_cols, | ||
model.support_matrix.nnz) | ||
? raft::make_device_compressed_structure_view<int, int, int>( | ||
(int*)model.support_matrix.indptr.data(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret_cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
cpp/src/svm/svc_impl.cuh
Outdated
? raft::make_device_csr_matrix_view<math_t, int, int, int>(model.support_matrix.data, | ||
csr_structure_view) | ||
? raft::make_device_csr_matrix_view<math_t, int, int, int>( | ||
(math_t*)model.support_matrix.data.data(), csr_structure_view) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret_cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
cpp/src/svm/svc_impl.cuh
Outdated
@@ -278,7 +277,7 @@ void svcPredictX(const raft::handle_t& handle, | |||
&one, | |||
K.data(), | |||
transpose_kernel ? model.n_support : n_batch, | |||
model.dual_coefs, | |||
(math_t*)model.dual_coefs.data(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret_cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
cpp/src/svm/svc_impl.cuh
Outdated
@@ -287,7 +286,7 @@ void svcPredictX(const raft::handle_t& handle, | |||
|
|||
} // end of loop | |||
|
|||
math_t* labels = model.unique_labels; | |||
math_t* labels = (math_t*)model.unique_labels.data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret_cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
cpp/src/svm/svc_impl.cuh
Outdated
} | ||
cudaStream_t stream = handle.get_stream(); | ||
|
||
// Note that the underlying allocations are not *freed* but rather reset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this is the case with resize()
. Why not deallocate_async()
instead to free memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deallocate_async()
is a private method. I added additional calls to shrink_to_fit()
to ensure memory is freed right away.
Thanks @divyegala for the rewiew. I have adressed or commented on your immediate suggestions.
I don't quite understand how this can be accomplished with the given For the python API we retrieve data as |
Let me write an example for Python API and how to handle ownership, assuming the C++ API does not own the memory and instead holds onto references of the form Python API: As seen in the
|
@divyegala , thanks for your suggestions. I have refactored the SvmModel to work with |
model.unique_labels->resize(model.n_classes * sizeof(math_t), stream); | ||
raft::copy((math_t*)model.unique_labels->data(), unique_labels.data(), model.n_classes, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.unique_labels->resize(model.n_classes * sizeof(math_t), stream); | |
raft::copy((math_t*)model.unique_labels->data(), unique_labels.data(), model.n_classes, stream); | |
model.unique_labels = unique_labels.release() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.unique_labels
is just a ptr to the buffer (e.g. owned by the python code), so nobody would be responsible to dealloc whatever we extract from the device_uvector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it
*dual_coefs = (double*)rmm_alloc.allocate_async( | ||
model.dual_coefs->size(), rmm::CUDA_ALLOCATION_ALIGNMENT, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user does not know the size of the arrays in advance, so we would need to change the C-API to rely on resizable objects (just like the C++ API utilizes the device_buffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm that is a tricky situation. @harrism any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a device_buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is a C API?! My solution would be to not have a C API! (Why is a C API in a .cpp
file?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C API source code is written in .cpp
and we just export the API in .h
header via extern C
int* indptr | ||
int* indices | ||
math_t* data | ||
device_buffer* indptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these raw pointers? Why doesn't the class own the buffers?
For that matter, why doesn't the class have a constructor and a destructor (especially if it contains raw pointers!)
As discussed in #6057 this is a PR to refactor the raw memory allocations in the svm model which are passed back to the caller.
I have removed all raw allocation pointers from the SVM model and replaced them by device_buffers. Since ownership is more restrictive now we do have additional data copies when:
CC @tfeher, @divyegala