Fix ann-bench multithreading #2021

achirkin · 2023-11-22T10:50:34Z

In the current state, ann-benchmarks running in the --throughput mode (multi-threaded) share ANN wrappers among CPU threads. This is not thread-safe and may result in incorrectly measured time (e.g. sharing cuda events among CPU threads) or various exceptions and segfaults (e.g. doing state-changing cublas calls from multiple CPU threads).

This PR makes the search benchmarks copy ANN wrappers in each thread. The copies of the wrappers then selectively:

share thread-safe resources (e.g. rmm memory pool) and large objects that are not expected to change during search (e.g. index data);
duplicate the resources that are not thread-safe or carry the thread-specific state (e.g. cublas handles, CUDA events and streams).

Alongside, the PR adds a few small changes, including:

enables ann-bench NVTX annotations for the non-common-executable mode (shows benchmark labels and iterations in nsys timeline);
fixes compile errors for the common-executable mode.

…cing the raft stream to CPU in all raft algos

…fe resources

achirkin · 2023-11-24T14:48:53Z

Progress notes:

At the moment, a new raft::resources handle is created for each thread, as the handle seems to be not thread-safe. This can be quickly fixed by modifying the copy-constructor of the configured_raft_resources once we're sure raft::resources is thread-safe.
The PR needs more testing!

…ceeds

wphicks

Just one substantive question, but this looks great. Love the design of configured_raft_resources.

wphicks · 2023-11-30T14:48:13Z

cpp/bench/ann/src/faiss/faiss_gpu_wrapper.h

+   *
+   * To avoid copying the index (database) in each thread, we make both the index and
+   * the gpu_resource shared.
+   * This means faiss GPU streams are possibly shared among the CPU threads;


Do we have an issue tracking this?

cpp/bench/ann/src/faiss/faiss_gpu_wrapper.h

cjnolet · 2023-12-03T23:20:04Z

@wphicks this PR isn't quite ready to be merged yet. We have some further discussions before this is ready.

cjnolet

Making sure we discuss this before merging.

harrism

@cjnolet pointed me to this PR because I was asking about uses of MR comparisons. Looking at the call to is_equal used in this PR I had a few other suggestions to improve the surrounding code.

harrism · 2023-12-07T01:37:59Z

cpp/bench/ann/src/raft/raft_ann_bench_utils.h

+  configured_raft_resources()
+    : configured_raft_resources{
+        {[]() {
+           auto* mr = new rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource>{
+             rmm::mr::get_current_device_resource(), 1024 * 1024 * 1024ull};
+           rmm::mr::set_current_device_resource(mr);
+           return mr;
+         }(),
+         [](rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource>* mr) {
+           if (rmm::mr::get_current_device_resource()->is_equal(*mr)) {
+             rmm::mr::set_current_device_resource(mr->get_upstream());
+           }
+           delete mr;
+         }}}


suggestion: Initialize the members of the class by name, not as a struct.
suggestion: don't bury the re-setting of the previous device resource inside an unnamed deleter lambda. Just reset it in your class destructor.
question: does the mr_ really need to be shared?
suggestion: use a unique_ptr.

Suggested change

configured_raft_resources()

: configured_raft_resources{

{[]() {

auto* mr = new rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource>{

rmm::mr::get_current_device_resource(), 1024 * 1024 * 1024ull};

rmm::mr::set_current_device_resource(mr);

return mr;

}(),

[](rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource>* mr) {

if (rmm::mr::get_current_device_resource()->is_equal(*mr)) {

rmm::mr::set_current_device_resource(mr->get_upstream());

}

delete mr;

}}}

configured_raft_resources()

: mr_{make_unique<rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource>>(

rmm::mr::get_current_device_resource(), 1024 * 1024 * 1024ull)}

{

}

~configured_raft_resources()

{

// reset the previous current_device_resource before deleting our pool.

rmm::mr::set_current_device_resource(mr_->get_upstream());

}

Note that this also removes the call to is_equal() -- is there a reason in this benchmark that you are worried about the possibility of the current resource changing during the lifetime of this class?

In the future, please use operator== on MRs, because is_equal is likely to be removed.

Hi Mark, thanks for having a look at this!

First of all, yes, we must use the shared pointer here. We create a new configured_raft_resources at most once per benchmark case, and let multiple concurrent threads spawned by gbench copy it. We do this because we need some of the resources not be shared due to thread safety (the raft handle and the event). However, we must have exactly one rmm pool resource shared by everything, which is dictated by rmm current_device_resource semantics. Hence we need the pool resource be initialized with the original configured_raft_resources and destructed after the last copy of configured_raft_resources is destructed.

The same applies to setting/unsetting the rmm current_device_resource. Hence I cannot just reset it my class destructor and not bury it inside an unnamed deleter lambda. An alternative would be to create a new class with the destructor that does the same and put it in the shared pointer, but why bother?

Also, I'm just curious as I'm not that experienced with the latest C++ best practices. What are the downsides of using curly braces to initialize the members "as a struct" as I did in this code?

The downsides are that you aren't initializing them by name, so it is less readable, and if the order of the class members changes this initialization will be invalid.

My suggestions were aimed at improving the readability and maintainability of the code, which currently I find obfuscated and error prone. But really this is not a library I have to read or maintain so it's up to you and the RAFT reviewers.

The one thing that I ask (and the thing that brought me here) is that you remove the call to is_equal() for the resources, as this is probably going away. If you must compare the resources, use ==.

See rapidsai/rmm#1402

Thanks for pointing out to the rmm issue, I've updated the code and added a bit more comments alongside to make the intent clear.

I'm genuinely willing to improve the readability of the code, but I'm still not sure I understand what you mean regarding the initializing-as-a-struct issue. To me the pattern seems pretty similar to your rmm::cuda_stream, which I don't find obfuscated or unreadable. Could you please adjust your suggestion to not change the semantics of the code (lifetime of the pool)?

@achirkin the big difference between the rmm example you've pointed out and the example here is that the rmm initializers are still naming the members so if the names should change or he rearranged, maintenance becomes much more straightforward.

I do agree with Mark here, and to be quite honest, I find the use of the pattern here (unnamed parameter initialization through lambdas) to be quite hard to read myself. I also prefer that we at least name the parameters, especially for a list that includes multiple different members. It seems like not much additional work in proportion to the significant gain in readability.

Ok, now I really feel dumb staring at the code and not seeing what you both mean :) I'm just calling here another constructor with a single argument that takes the pool, am I not? If you're referring to the aggregate initialization, it's not even possible for this class, because it has two user-declared constructors.

…or readability

cjnolet

LGTM!

cjnolet · 2023-12-13T16:30:30Z

/merge

With the changes introduced by #2021, the copied FAISS benchmark wrapper contains a cuda event that is used for synchronizing between streams during search. The lifetime of the event is the same as of the wrapper, but the event handle itself is copied between the wrappers; this leads to illegal memory accesses and crashes. This PR fixes the bug by creating a new cuda event on each wrapper copy, so that the wrappers do not share their synchronization events. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #2062

In the current state, ann-benchmarks running in the `--throughput` mode (multi-threaded) share ANN wrappers among CPU threads. This is not thread-safe and may result in incorrectly measured time (e.g. sharing cuda events among CPU threads) or various exceptions and segfaults (e.g. doing state-changing cublas calls from multiple CPU threads). This PR makes the search benchmarks copy ANN wrappers in each thread. The copies of the wrappers then selectively: - share thread-safe resources (e.g. rmm memory pool) and large objects that are not expected to change during search (e.g. index data); - duplicate the resources that are not thread-safe or carry the thread-specific state (e.g. cublas handles, CUDA events and streams). Alongside, the PR adds a few small changes, including: - enables ann-bench NVTX annotations for the non-common-executable mode (shows benchmark labels and iterations in nsys timeline); - fixes compile errors for the common-executable mode. Authors: - Artem M. Chirkin (https://github.com/achirkin) - William Hicks (https://github.com/wphicks) Approvers: - William Hicks (https://github.com/wphicks) - Mark Harris (https://github.com/harrism) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#2021

…apidsai#2062) With the changes introduced by rapidsai#2021, the copied FAISS benchmark wrapper contains a cuda event that is used for synchronizing between streams during search. The lifetime of the event is the same as of the wrapper, but the event handle itself is copied between the wrappers; this leads to illegal memory accesses and crashes. This PR fixes the bug by creating a new cuda event on each wrapper copy, so that the wrappers do not share their synchronization events. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#2062

achirkin added 2 commits November 22, 2023 10:45

Cleanup the workaround code from past changes

6273248

Apply the same pattern of waiting by the bench stream rather than syn…

19cb314

…cing the raft stream to CPU in all raft algos

github-actions bot added cpp CMake labels Nov 22, 2023

Copy benchmark wrappers to avoid concurrently accessing not thread-sa…

9359e89

…fe resources

achirkin added bug Something isn't working non-breaking Non-breaking change 2 - In Progress Currenty a work in progress labels Nov 23, 2023

cjnolet marked this pull request as ready for review November 23, 2023 16:14

cjnolet requested review from a team as code owners November 23, 2023 16:14

achirkin and others added 4 commits November 24, 2023 11:30

Add ANN.copy support to faiss wrappers

8c798c9

Simplify and fix the nvtx3 checks

9223dca

Merge branch 'branch-24.02' into fix-ann-bench-multithreading

9804dcd

Add ANN.copy support to cagra_hnswlib wrapper

8855197

achirkin self-assigned this Nov 24, 2023

achirkin changed the title ~~[WIP] Fix ann-bench multithreading~~ Fix ann-bench multithreading Nov 24, 2023

Check the NVTX headers explicitly to make sure the cpu-only build suc…

2d66d61

…ceeds

achirkin added 3 - Ready for Review and removed 2 - In Progress Currenty a work in progress labels Nov 27, 2023

Merge branch 'branch-24.02' into fix-ann-bench-multithreading

1e89b3e

wphicks approved these changes Nov 30, 2023

View reviewed changes

Merge branch 'branch-24.02' into fix-ann-bench-multithreading

75a4928

rapidsai deleted a comment from wphicks Dec 3, 2023

cjnolet requested changes Dec 3, 2023

View reviewed changes

harrism suggested changes Dec 7, 2023

View reviewed changes

achirkin and others added 2 commits December 11, 2023 10:22

Merge branch 'branch-24.02' into fix-ann-bench-multithreading

16d9743

Avoid using rmm's equals function and add more comments to the code f…

75bf35a

…or readability

harrism approved these changes Dec 12, 2023

View reviewed changes

Add raft::raft dependency to ANN_BENCH

0719beb

cjnolet approved these changes Dec 13, 2023

View reviewed changes

rapids-bot bot merged commit d9a7290 into rapidsai:branch-24.02 Dec 13, 2023
61 checks passed

achirkin mentioned this pull request Dec 13, 2023

Fix a crash in FAISS benchmark wrapper introduced in #2021 #2062

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix ann-bench multithreading #2021

Fix ann-bench multithreading #2021

achirkin commented Nov 22, 2023 •

edited

Loading

achirkin commented Nov 24, 2023 •

edited

Loading

wphicks left a comment

wphicks Nov 30, 2023

cjnolet commented Dec 3, 2023

cjnolet left a comment

harrism left a comment

harrism Dec 7, 2023 •

edited

Loading

achirkin Dec 9, 2023 •

edited

Loading

harrism Dec 10, 2023 •

edited

Loading

harrism Dec 10, 2023

achirkin Dec 11, 2023

cjnolet Dec 12, 2023

achirkin Dec 12, 2023 •

edited

Loading

cjnolet left a comment

cjnolet commented Dec 13, 2023

Fix ann-bench multithreading #2021

Fix ann-bench multithreading #2021

Conversation

achirkin commented Nov 22, 2023 • edited Loading

achirkin commented Nov 24, 2023 • edited Loading

wphicks left a comment

Choose a reason for hiding this comment

wphicks Nov 30, 2023

Choose a reason for hiding this comment

cjnolet commented Dec 3, 2023

cjnolet left a comment

Choose a reason for hiding this comment

harrism left a comment

Choose a reason for hiding this comment

harrism Dec 7, 2023 • edited Loading

Choose a reason for hiding this comment

achirkin Dec 9, 2023 • edited Loading

Choose a reason for hiding this comment

harrism Dec 10, 2023 • edited Loading

Choose a reason for hiding this comment

harrism Dec 10, 2023

Choose a reason for hiding this comment

achirkin Dec 11, 2023

Choose a reason for hiding this comment

cjnolet Dec 12, 2023

Choose a reason for hiding this comment

achirkin Dec 12, 2023 • edited Loading

Choose a reason for hiding this comment

cjnolet left a comment

Choose a reason for hiding this comment

cjnolet commented Dec 13, 2023

achirkin commented Nov 22, 2023 •

edited

Loading

achirkin commented Nov 24, 2023 •

edited

Loading

harrism Dec 7, 2023 •

edited

Loading

achirkin Dec 9, 2023 •

edited

Loading

harrism Dec 10, 2023 •

edited

Loading

achirkin Dec 12, 2023 •

edited

Loading