
[BugFix] fix compression context pool slow down after long running #53172

Merged
merged 4 commits into StarRocks:main on Nov 27, 2024

Conversation

luohaha (Contributor) commented Nov 25, 2024

Why I'm doing:

In the current implementation, we use moodycamel::ConcurrentQueue to reuse compression contexts: each time we start compressing a block, we try to dequeue a ctx from the pool, and we return the ctx to the pool after compression finishes.

We currently use the implicit enqueue method, which causes an automatically-allocated thread-local producer sub-queue to be created, and it is not destroyed after the thread finishes:

void add(InternalRef ptr) {
    DCHECK(ptr);
    Status status = _resetter(ptr.get());
    // if reset fail, then delete this context
    if (!status.ok()) {
        return;
    }

    _ctx_resources.enqueue(std::move(ptr));
}

So after the process has been running for a long time, the set of sub-queues keeps growing without bound, and dequeuing slows down because the consumer has to scan every producer sub-queue.

And in the docs (https://github.com/cameron314/concurrentqueue?tab=readme-ov-file#basic-use), the author recommends using explicit producer tokens instead.
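
For reference, here is a minimal sketch (mine, not from the PR) of the two enqueue styles in moodycamel::ConcurrentQueue, assuming the library's single-header concurrentqueue.h. The implicit form lazily allocates a per-thread implicit producer sub-queue; the explicit form routes items through the token's producer, and a destroyed token's producer can later be recycled:

#include "concurrentqueue.h"  // moodycamel single-header library

void enqueue_styles() {
    moodycamel::ConcurrentQueue<int> q;

    // Implicit enqueue: the first call from a given thread lazily allocates an
    // implicit producer sub-queue keyed to that thread.
    q.enqueue(1);

    // Explicit enqueue: items go through the producer owned by this token; when
    // the token is destroyed, its producer can later be recycled by a new token.
    moodycamel::ProducerToken token(q);
    q.enqueue(token, 2);

    // Dequeue scans the list of producer sub-queues, which is why an
    // ever-growing producer list slows consumers down.
    int value;
    q.try_dequeue(value);
}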

What I'm doing:

Use explicit producer tokens to avoid the overhead of too many sub-queues.

This pull request improves the compression context pool and adds new tests to ensure the robustness of multi-threaded context retrieval. The most important changes are the addition of a producer token to optimize the context pool and the introduction of a new multi-threaded test.

Improvements to compression context pool:
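
A sketch of the updated add(), mirroring the hunk shown in the review below; the final enqueue call that passes the token is my assumed completion of that hunk, so the exact merged code may differ slightly:

void add(InternalRef ptr) {
    DCHECK(ptr);
    Status status = _resetter(ptr.get());
    // if reset fails, drop this context instead of returning it to the pool
    if (!status.ok()) {
        return;
    }

    // Use an explicit producer token (one per thread) so repeated calls do not
    // keep allocating implicit per-thread sub-queues.
    static thread_local std::unique_ptr<::moodycamel::ProducerToken> producer_token;
    if (producer_token == nullptr) {
        producer_token = std::make_unique<::moodycamel::ProducerToken>(_ctx_resources);
    }
    _ctx_resources.enqueue(*producer_token, std::move(ptr));
}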

Enhancements to testing:

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

Signed-off-by: luohaha <[email protected]>
@@ -114,14 +114,19 @@ class CompressionContextPool {

private:
void add(InternalRef ptr) {
// Use explicit producer token to avoid the overhead of too many sub-queues
static thread_local std::unique_ptr<::moodycamel::ProducerToken> producer_token;
Contributor

This doesn't look right: the number of producer_tokens equals the number of threads used for compression. That depends on how the compression context is used by the caller. If compression runs in a fixed-size thread pool, this can still be OK, but with a dynamic thread pool where threads are created and destroyed, there can still be a lot of producer tokens.

Not sure if I understand this correctly.

Contributor Author

You can check the implementation of ExplicitProducer; a ProducerToken can reuse an existing (inactive) producer sub-queue:

ProducerToken::ProducerToken(ConcurrentQueue<T, Traits>& queue)
	: producer(queue.recycle_or_create_producer(true))
{
	if (producer != nullptr) {
		producer->token = this;
	}
}

Contributor Author

ProducerBase* recycle_or_create_producer(bool isExplicit, bool& recycled)
	{
#ifdef MCDBGQ_NOLOCKFREE_IMPLICITPRODHASH
		debug::DebugLock lock(implicitProdMutex);
#endif
		// Try to re-use one first
		for (auto ptr = producerListTail.load(std::memory_order_acquire); ptr != nullptr; ptr = ptr->next_prod()) {
			if (ptr->inactive.load(std::memory_order_relaxed) && ptr->isExplicit == isExplicit) {
				bool expected = true;
				if (ptr->inactive.compare_exchange_strong(expected, /* desired */ false, std::memory_order_acquire, std::memory_order_relaxed)) {
					// We caught one! It's been marked as activated, the caller can have it
					recycled = true;
					return ptr;
				}
			}
		}
		
		recycled = false;
		return add_producer(isExplicit ? static_cast<ProducerBase*>(create<ExplicitProducer>(this)) : create<ImplicitProducer>(this));
	}
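
A standalone sketch (mine, not from the PR) of why token churn stays bounded: destroying a ProducerToken marks its producer slot inactive, and the recycling loop above lets the next token re-acquire that slot, so the number of explicit producers is bounded by the peak number of concurrently live tokens rather than by the total number of threads ever created. It assumes the single-header concurrentqueue.h:

#include <thread>
#include <vector>

#include "concurrentqueue.h"  // moodycamel single-header library

int main() {
    moodycamel::ConcurrentQueue<int> q;

    // Simulate threads being created and destroyed repeatedly; each thread
    // builds its own explicit ProducerToken, enqueues, and then exits.
    for (int round = 0; round < 100; ++round) {
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t) {
            threads.emplace_back([&q] {
                moodycamel::ProducerToken token(q);  // recycle_or_create_producer(true)
                q.enqueue(token, 42);
                // token destroyed here -> its producer slot is marked inactive
            });
        }
        for (auto& th : threads) {
            th.join();
        }
        int value;
        while (q.try_dequeue(value)) {
        }  // drain
    }
    // 400 tokens were constructed, but at most 4 were ever live at once, so the
    // queue accumulates at most ~4 explicit producer slots instead of 400.
    return 0;
}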

Contributor

I mean, if threads are created and destroyed frequently, this thread-local producer token will be created and destroyed just as frequently. How can the inner queue be reused by the producer token in that case?

Contributor Author

So the upper bound on the number of producer queues is max(flush threads) + max(compaction threads).

Contributor

Instead of depending on the caller's thread count, how about having the context pool itself control the total number of producer tokens, and using a thread-id hash or round robin to assign each thread to a particular producer token (slot)?

Contributor

Or is it overkill to use a multi-producer concurrent queue just for a memory pool? Would a single producer be good enough for this usage?

Contributor Author

A producer token can't be shared by two threads at the same time, so we would still need max(flush threads) + max(compaction threads) tokens.

Contributor

The next thing is: what if a thread lives longer than this _ctx_resources? The destruction of a thread-local producer_token would then cause an invalid access to _ctx_resources.

Contributor Author (luohaha, Nov 25, 2024)

_ctx_resources is part of CompressionContextPool, and all CompressionContextPool instances are static and global, so CompressionContextPool will always live longer than Ref.

Signed-off-by: luohaha <[email protected]>
kevincai previously approved these changes Nov 25, 2024
luohaha requested a review from wyb November 26, 2024 01:35
Comment on lines 119 to 121
if (producer_token == nullptr) {
producer_token = std::make_unique<::moodycamel::ProducerToken>(_ctx_resources);
}
Contributor

It's not thread-safe? You could change it to the following:
static thread_local ::moodycamel::ProducerToken producer_token(_ctx_resources);

Contributor Author

Why is it not thread-safe? producer_token is thread-local.

Contributor

Oh, it's thread-safe, just a refactor.

Contributor Author

done

Signed-off-by: luohaha <[email protected]>

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


[BE Incremental Coverage Report]

pass : 2 / 2 (100.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/util/compression/compression_context_pool.h 2 2 100.00% []

wyb merged commit b141be8 into StarRocks:main Nov 27, 2024
51 checks passed

@Mergifyio backport branch-3.4

github-actions bot removed the 3.4 label Nov 27, 2024

@Mergifyio backport branch-3.3

github-actions bot removed the 3.3 label Nov 27, 2024

@Mergifyio backport branch-3.2


mergify bot commented Nov 27, 2024

backport branch-3.4

✅ Backports have been created

github-actions bot removed the 3.2 label Nov 27, 2024

@Mergifyio backport branch-3.1

github-actions bot removed the 3.1 label Nov 27, 2024

mergify bot commented Nov 27, 2024

backport branch-3.3

✅ Backports have been created


mergify bot commented Nov 27, 2024

backport branch-3.2

✅ Backports have been created


mergify bot commented Nov 27, 2024

backport branch-3.1

✅ Backports have been created

mergify bot pushed 4 commits that referenced this pull request Nov 27, 2024
wanpengfei-git pushed 4 commits that referenced this pull request Nov 27, 2024