[BugFix] fix compression context pool slow down after long running #53172
Conversation
Signed-off-by: luohaha <[email protected]>
Signed-off-by: luohaha <[email protected]>
@@ -114,14 +114,19 @@ class CompressionContextPool {

private:
    void add(InternalRef ptr) {
        // Use explicit producer token to avoid the overhead of too many sub-queues
        static thread_local std::unique_ptr<::moodycamel::ProducerToken> producer_token;
This doesn't look right: the number of producer_tokens equals the number of threads used for compression. That depends on how the compression context is used by the caller. If compression runs in a fixed-size thread pool, that can still be OK, but if it is a dynamic thread pool where threads are created and destroyed, there can still be a lot of producer tokens.
Not sure if I understand this correctly.
You can check the implementation of ExplicitProducer; a ProducerToken can reuse an existing producer queue:
ProducerToken::ProducerToken(ConcurrentQueue<T, Traits>& queue)
: producer(queue.recycle_or_create_producer(true))
{
if (producer != nullptr) {
producer->token = this;
}
}
ProducerBase* recycle_or_create_producer(bool isExplicit, bool& recycled)
{
#ifdef MCDBGQ_NOLOCKFREE_IMPLICITPRODHASH
debug::DebugLock lock(implicitProdMutex);
#endif
// Try to re-use one first
for (auto ptr = producerListTail.load(std::memory_order_acquire); ptr != nullptr; ptr = ptr->next_prod()) {
if (ptr->inactive.load(std::memory_order_relaxed) && ptr->isExplicit == isExplicit) {
bool expected = true;
if (ptr->inactive.compare_exchange_strong(expected, /* desired */ false, std::memory_order_acquire, std::memory_order_relaxed)) {
// We caught one! It's been marked as activated, the caller can have it
recycled = true;
return ptr;
}
}
}
recycled = false;
return add_producer(isExplicit ? static_cast<ProducerBase*>(create<ExplicitProducer>(this)) : create<ImplicitProducer>(this));
}
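For reference (not from this PR or the quoted library code), a minimal sketch of how an explicit producer token is created against a moodycamel::ConcurrentQueue, used for enqueueing, and then left behind as an inactive producer that a later token can recycle:

#include <cstdio>
#include "concurrentqueue.h"

int main() {
    moodycamel::ConcurrentQueue<int> q;
    {
        moodycamel::ProducerToken tok(q);  // creates (or recycles) an explicit producer
        q.enqueue(tok, 1);                 // enqueue through that producer's sub-queue
    }                                      // tok destroyed: its producer is only marked inactive
    moodycamel::ProducerToken tok2(q);     // recycles the now-inactive explicit producer
    q.enqueue(tok2, 2);

    int v;
    while (q.try_dequeue(v)) {
        std::printf("dequeued %d\n", v);
    }
    return 0;
}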
I mean, if threads are created and destroyed frequently, this thread-local producer token will be created and destroyed as frequently as the thread. How can the inner queue be reused by the producer token in such a case?
So the upper bound on the number of producer queues is max(flush threads) + max(compaction threads).
Instead of depending on the caller's thread count, how about the context pool itself controls the total number of producer tokens, and uses a thread-id hash or round robin to assign callers to a certain producer token (slot)? A rough sketch of this idea is below.
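A rough sketch of what that alternative could look like; the class, slot count, and locking scheme are hypothetical illustrations of the suggestion, not code from this PR:

#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include "concurrentqueue.h"

// Hypothetical: the pool owns a fixed number of ProducerToken slots and assigns
// callers to a slot by thread-id hash. Each slot needs a lock because a
// ProducerToken must not be used by two threads at the same time.
template <typename T, size_t kSlots = 8>
class TokenSlots {
public:
    explicit TokenSlots(moodycamel::ConcurrentQueue<T>& q) : _q(&q) {
        for (auto& slot : _slots) slot.token.emplace(q);
    }
    void enqueue(T item) {
        size_t idx = std::hash<std::thread::id>{}(std::this_thread::get_id()) % kSlots;
        std::lock_guard<std::mutex> guard(_slots[idx].lock);
        _q->enqueue(*_slots[idx].token, std::move(item));
    }
private:
    struct Slot {
        std::mutex lock;
        std::optional<::moodycamel::ProducerToken> token;
    };
    moodycamel::ConcurrentQueue<T>* _q;
    std::array<Slot, kSlots> _slots;
};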
Or is it overkill to use a multi-producer concurrent queue just for a memory pool? Is a single producer good enough for this usage?
A producer token can't be shared by two threads at the same time, so we will still need max(flush threads) + max(compaction threads) tokens.
The next thing is: what if a thread lives longer than this _ctx_resources? The destruction of a thread-local producer_token would then cause an invalid access to _ctx_resources.
_ctx_resources is part of CompressionContextPool, and all CompressionContextPool instances are static and global. CompressionContextPool will always live longer than Ref.
Signed-off-by: luohaha <[email protected]>
        if (producer_token == nullptr) {
            producer_token = std::make_unique<::moodycamel::ProducerToken>(_ctx_resources);
        }
It's not thread-safe? Could change it to the following:
static thread_local ::moodycamel::ProducerToken producer_token(_ctx_resources);
Why is it not thread-safe? producer_token is thread-local.
Oh, it's thread-safe; this is just a refactor.
done
Signed-off-by: luohaha <[email protected]>
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass : 2 / 2 (100.00%) file detail
@Mergifyio backport branch-3.4
@Mergifyio backport branch-3.3
@Mergifyio backport branch-3.2
✅ Backports have been created
@Mergifyio backport branch-3.1
✅ Backports have been created
✅ Backports have been created
✅ Backports have been created
…53172) Signed-off-by: luohaha <[email protected]> (cherry picked from commit b141be8)
…53172) Signed-off-by: luohaha <[email protected]> (cherry picked from commit b141be8)
…53172) Signed-off-by: luohaha <[email protected]> (cherry picked from commit b141be8)
…53172) Signed-off-by: luohaha <[email protected]> (cherry picked from commit b141be8)
…ackport #53172) (#53233) Co-authored-by: Yixin Luo <[email protected]>
…ackport #53172) (#53232) Co-authored-by: Yixin Luo <[email protected]>
…ackport #53172) (#53230) Co-authored-by: Yixin Luo <[email protected]>
…ackport #53172) (#53231) Co-authored-by: Yixin Luo <[email protected]>
Why I'm doing:
In the current implementation, we use moodycamel::concurrentqueue to reuse compression contexts: each time we start compressing a block, we try to dequeue a ctx from the pool, and we return the ctx to the pool after compression finishes. Right now we use the implicit enqueue method, which causes an automatically-allocated thread-local producer sub-queue to be allocated, and that sub-queue is not destroyed after the thread finishes.
So after long running, the sub-queues keep growing without bound and slow down the consumer.
And in the docs (https://github.com/cameron314/concurrentqueue?tab=readme-ov-file#basic-use), the author recommends using explicit producer tokens instead.
What I'm doing:
Use explicit producer tokens to avoid the overhead of too many sub-queues. A minimal sketch of the enqueue-path change is below.
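The sketch below contrasts the implicit enqueue path with the explicit, thread-local token used by this fix; Ctx and CtxQueue are placeholders for the actual pool internals:

#include <memory>
#include "concurrentqueue.h"

struct Ctx {};  // stand-in for a compression context
using CtxQueue = moodycamel::ConcurrentQueue<std::unique_ptr<Ctx>>;

// Before: implicit enqueue allocates a thread-local producer sub-queue inside
// the queue itself, and that sub-queue is not destroyed when the thread exits.
void add_implicit(CtxQueue& q, std::unique_ptr<Ctx> ctx) {
    q.enqueue(std::move(ctx));
}

// After: an explicit, thread-local ProducerToken. A destroyed token only marks
// its producer slot inactive, so a token created later can recycle it, which
// bounds the number of sub-queues.
void add_explicit(CtxQueue& q, std::unique_ptr<Ctx> ctx) {
    static thread_local std::unique_ptr<::moodycamel::ProducerToken> token;
    if (token == nullptr) {
        token = std::make_unique<::moodycamel::ProducerToken>(q);
    }
    q.enqueue(*token, std::move(ctx));
}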
This pull request introduces improvements to the compression context pool and adds a new test to verify the robustness of multi-threaded context retrieval. The most important changes are the addition of a producer token to optimize the context pool and the introduction of a new multi-threaded test.
Improvements to the compression context pool:
be/src/util/compression/compression_context_pool.h: Added a thread-local ProducerToken to the add method to reduce the overhead of multiple sub-queues when enqueuing contexts.
Enhancements to testing:
be/test/util/block_compression_test.cpp: Included the compression_context_pool_singletons.h header to support the new test.
be/test/util/block_compression_test.cpp: Added a new test, test_multi_thread_get_ctx, to verify the behavior of multi-threaded context retrieval from the LZ4F context pool; a rough sketch of such an exercise follows.
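For illustration, a hedged sketch of what a multi-threaded get/return exercise could look like; Pool, get_ctx(), and add() are placeholder names, not the actual pool API or the test code added in this PR:

#include <memory>
#include <thread>
#include <vector>
#include "concurrentqueue.h"

struct Ctx {};  // stand-in for an LZ4F compression context

struct Pool {
    moodycamel::ConcurrentQueue<Ctx*> queue;

    Ctx* get_ctx() {
        Ctx* ctx = nullptr;
        if (!queue.try_dequeue(ctx)) ctx = new Ctx();  // allocate on a pool miss
        return ctx;
    }

    void add(Ctx* ctx) {
        // Same pattern as the fix: one explicit producer token per thread.
        static thread_local std::unique_ptr<::moodycamel::ProducerToken> token;
        if (token == nullptr) token = std::make_unique<::moodycamel::ProducerToken>(queue);
        queue.enqueue(*token, ctx);
    }
};

int main() {
    Pool pool;
    std::vector<std::thread> threads;
    for (int t = 0; t < 8; ++t) {
        threads.emplace_back([&pool] {
            for (int i = 0; i < 1000; ++i) {
                Ctx* ctx = pool.get_ctx();  // take a context from the pool
                pool.add(ctx);              // give it back after "use"
            }
        });
    }
    for (auto& th : threads) th.join();

    // Drain whatever is left in the pool.
    Ctx* leftover = nullptr;
    while (pool.queue.try_dequeue(leftover)) delete leftover;
    return 0;
}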
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: