[Enhancement] Skip rebalancing scan ranges for hdfs backend selector by default when using datacache. #51996

Merged
merged 1 commit into from
Oct 18, 2024

GavinMar
Contributor

@GavinMar GavinMar commented Oct 16, 2024

Why I'm doing:

Currently we use a consistent hashing algorithm to select backends for HDFS scan ranges, which cannot guarantee that the scan ranges are evenly distributed across all backends. So, we rebalance scan ranges from one backend to another if the data assigned to the former exceeds the average bytes by more than 10%.
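
To make the rebalancing step concrete, here is a minimal Java sketch. It is not the code in `HDFSBackendSelector`: the class `RebalanceSketch`, the `ScanRange` type, and the move strategy (shift the last range of an overloaded backend to the least-loaded backend) are all hypothetical, and the threshold is read as "more than 10% above the average bytes".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RebalanceSketch {
    /** Hypothetical stand-in for an HDFS scan range: an identifier plus its size. */
    static final class ScanRange {
        final String id;
        final long bytes;

        ScanRange(String id, long bytes) {
            this.id = id;
            this.bytes = bytes;
        }
    }

    /** Total bytes currently assigned to one backend. */
    static long load(List<ScanRange> ranges) {
        long sum = 0;
        for (ScanRange r : ranges) {
            sum += r.bytes;
        }
        return sum;
    }

    /**
     * Moves scan ranges away from every backend whose load exceeds
     * average * (1 + tolerance). Each moved range goes to the backend
     * that currently holds the fewest bytes.
     */
    static void rebalance(Map<String, List<ScanRange>> assignment, double tolerance) {
        if (assignment.isEmpty()) {
            return;
        }
        long total = 0;
        for (List<ScanRange> ranges : assignment.values()) {
            total += load(ranges);
        }
        double limit = (double) total / assignment.size() * (1.0 + tolerance);

        for (List<ScanRange> ranges : assignment.values()) {
            while (load(ranges) > limit && ranges.size() > 1) {
                // Pick the currently least-loaded backend as the target.
                List<ScanRange> target = null;
                for (List<ScanRange> candidate : assignment.values()) {
                    if (target == null || load(candidate) < load(target)) {
                        target = candidate;
                    }
                }
                ScanRange last = ranges.get(ranges.size() - 1);
                // Stop if the move would not reduce the imbalance.
                if (target == ranges || load(target) + last.bytes >= load(ranges)) {
                    break;
                }
                ranges.remove(ranges.size() - 1);
                target.add(last);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<ScanRange>> assignment = new TreeMap<>();
        assignment.put("be1", new ArrayList<>(Arrays.asList(
                new ScanRange("a", 100), new ScanRange("b", 100), new ScanRange("c", 100))));
        assignment.put("be2", new ArrayList<>(Arrays.asList(new ScanRange("d", 100))));
        rebalance(assignment, 0.10);
        for (Map.Entry<String, List<ScanRange>> e : assignment.entrySet()) {
            System.out.println(e.getKey() + " -> " + load(e.getValue()) + " bytes");
        }
    }
}
```

In this sketch the range that gets moved depends on the loads of all backends in the current query, so the same scan range can end up on different backends across runs. That is the property that conflicts with cache affinity.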

However, this can cause random cache misses, because the same scan range may be rebalanced to a different backend. So, even if the same query is executed multiple times, it is not guaranteed to fully hit the cache each time. This leads to significant performance degradation in many scenarios.

What I'm doing:

Given the large number of virtual nodes, consistent hashing usually does not produce significant deviations in data distribution. So, we skip rebalancing scan ranges by default when datacache is used.
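
For readers unfamiliar with virtual nodes, the idea can be sketched in Java as below. This is an illustration only, not the StarRocks implementation: the class `ConsistentHashSketch`, the MD5-based hash, the key format, and the virtual node counts are all assumptions. Each backend is placed on the ring many times, and a scan range is assigned to the first virtual node at or after its hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashSketch {
    // Ring position -> backend name. Each backend appears virtualNodes times.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashSketch(Iterable<String> backends, int virtualNodes) {
        for (String backend : backends) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(backend + "#" + i), backend);
            }
        }
        if (ring.isEmpty()) {
            throw new IllegalArgumentException("at least one backend and one virtual node required");
        }
    }

    /** Returns the backend owning the first virtual node at or after the key's hash. */
    String select(String scanRangeKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(scanRangeKey));
        // Wrap around to the start of the ring if no node follows the key.
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of the MD5 digest, interpreted as a long. */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> backends = Arrays.asList("be1", "be2", "be3");
        for (int virtualNodes : new int[] {1, 16, 256}) {
            ConsistentHashSketch selector = new ConsistentHashSketch(backends, virtualNodes);
            Map<String, Integer> counts = new TreeMap<>();
            for (int i = 0; i < 3000; i++) {
                counts.merge(selector.select("file-" + i + ":0"), 1, Integer::sum);
            }
            System.out.println(virtualNodes + " virtual nodes per backend: " + counts);
        }
    }
}
```

Running `main` prints how many of 3000 hypothetical scan range keys land on each backend for different virtual node counts. Because the mapping depends only on the key and the set of backends, the same scan range always goes to the same backend, which is what keeps the cache warm across repeated queries.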

We also add a session variable to change this default behavior in special cases.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

…by default when using datacache.

Signed-off-by: GavinMar <[email protected]>

sonarcloud bot commented Oct 16, 2024

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

[FE Incremental Coverage Report]

pass : 8 / 10 (80.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/qe/SessionVariable.java | 2 | 4 | 50.00% | [2736, 2737] |
| 🔵 com/starrocks/qe/HDFSBackendSelector.java | 6 | 6 | 100.00% | [] |

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

boolean enableDataCache = ConnectContext.get() != null ? ConnectContext.get().getSessionVariable().
        isEnableScanDataCache() : false;
// If force-rebalancing is not specified and cache is used, skip the rebalancing directly.
if (!forceReBalance && enableDataCache) {
Contributor

Maybe we need to add a note to the user guide that forceReBalance only takes effect when the cache is enabled.

Contributor Author

> Maybe we need to add a note to the user guide that forceReBalance only takes effect when the cache is enabled.

OK, it is set to invisible for now. If it becomes necessary to expose it to users, we will add documentation to explain it.
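
The condition discussed in this thread can be restated as a small truth table. The sketch below is a hypothetical helper (`RebalanceDecisionSketch.shouldRebalance` does not exist in the PR); it only negates the guard `!forceReBalance && enableDataCache` from the snippet above to show when rebalancing still runs.

```java
class RebalanceDecisionSketch {
    /**
     * Rebalancing is skipped only when the cache is enabled and force-rebalancing
     * is not requested, so it runs in every other case.
     */
    static boolean shouldRebalance(boolean forceReBalance, boolean enableDataCache) {
        return forceReBalance || !enableDataCache;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean force : flags) {
            for (boolean cache : flags) {
                System.out.println("forceReBalance=" + force
                        + ", enableDataCache=" + cache
                        + " -> rebalance=" + shouldRebalance(force, cache));
            }
        }
    }
}
```

As the reviewer points out, `forceReBalance` changes the outcome only when the cache is enabled. With the cache disabled, rebalancing runs regardless of the flag.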

@Youngwb Youngwb merged commit fe00c0b into StarRocks:main Oct 18, 2024
69 of 70 checks passed
@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Oct 18, 2024
Contributor

mergify bot commented Oct 18, 2024

backport branch-3.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Oct 18, 2024
…by default when using datacache. (#51996)

Signed-off-by: GavinMar <[email protected]>
(cherry picked from commit fe00c0b)
wanpengfei-git pushed a commit that referenced this pull request Oct 18, 2024
…by default when using datacache. (backport #51996) (#52072)

Co-authored-by: Gavin <[email protected]>