Upsert small segment merger task in minions #14477

Open: wants to merge 1 commit into base: master
@tibrewalpratik17 (Contributor) commented Nov 18, 2024

PR related to the PEP request: #14305

This PR adds a new minion task that merges small segments in an upsert table. More implementation details are in the design doc of the linked issue.

Test plan: Enabled this on one of the infinite-retention tables at Uber. The table had ~35k segments initially, and after running this task for ~2 days it was down to ~2k segments. The curve flattens after reaching ~2k segments. We are using the default configs of this task, and the table generates ~500 segments daily. See attached screenshot.

[Attached image: Screenshot 2024-11-21 at 4 28 22 PM]

A few details:

  • Unlike upsert compaction, this task requires enableSnapshot to be enabled. This makes sense from a correctness perspective.
  • Example of a new segment name: compactmerged__table__5__1731976847331__0
  • The creation time of the new segment in ZK is the max ZK creation time of the merged segments. Note that the segment name contains the task run time, not the index creation time.
  • The code merges both LLC segments and UploadedRealtime segments, which is desired.
  • Right now, we process one merging subtask per partition per task run. We will explore allowing multiple merging subtasks per partition per task run later.
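
To make the naming and creation-time rules above concrete, here is a minimal Java sketch. It is not the code in this PR: the class and method names are hypothetical, and the meaning of the individual name fields (table name, partition ID, task run time in millis, trailing suffix) is an assumption read off the example segment name and the note about the task run time.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the PR's actual classes or methods.
final class CompactMergeSketch {
  private CompactMergeSketch() {
  }

  // Builds a name shaped like the example compactmerged__table__5__1731976847331__0.
  // Assumption: the fields are table name, partition ID, task run time (ms) and a suffix.
  static String mergedSegmentName(String tableName, int partitionId, long taskRunTimeMs, int suffix) {
    return String.join("__", "compactmerged", tableName, String.valueOf(partitionId),
        String.valueOf(taskRunTimeMs), String.valueOf(suffix));
  }

  // The merged segment's ZK creation time is the max creation time of the merged segments.
  static long mergedCreationTimeMs(List<Long> mergedSegmentCreationTimesMs) {
    if (mergedSegmentCreationTimesMs.isEmpty()) {
      throw new IllegalArgumentException("At least one segment is required for a merge");
    }
    return Collections.max(mergedSegmentCreationTimesMs);
  }
}
```

Under these assumptions, the name carries the task run time while the creation time stored in ZK is derived separately from the input segments, which is why the two values can differ.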

@tibrewalpratik17 added the feature, release-notes, upsert, minion, and documentation labels Nov 18, 2024
@codecov-commenter commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 11.37931% with 257 lines in your changes missing coverage. Please review.

Project coverage is 63.67%. Comparing base (59551e4) to head (64cd7d6).
Report is 1358 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...tcompactmerge/UpsertCompactMergeTaskGenerator.java | 12.95% | 164 Missing and 4 partials ⚠️ |
| ...rtcompactmerge/UpsertCompactMergeTaskExecutor.java | 0.00% | 75 Missing ⚠️ |
| ...ctmerge/UpsertCompactMergeTaskExecutorFactory.java | 0.00% | 6 Missing ⚠️ |
| ...t/processing/framework/SegmentProcessorConfig.java | 44.44% | 4 Missing and 1 partial ⚠️ |
| ...UpsertCompactMergeTaskProgressObserverFactory.java | 0.00% | 2 Missing ⚠️ |
| .../org/apache/pinot/core/common/MinionConstants.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14477      +/-   ##
============================================
+ Coverage     61.75%   63.67%   +1.92%     
- Complexity      207     1577    +1370     
============================================
  Files          2436     2671     +235     
  Lines        133233   146730   +13497     
  Branches      20636    22508    +1872     
============================================
+ Hits          82274    93434   +11160     
- Misses        44911    46376    +1465     
- Partials       6048     6920     +872     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.65% <11.37%> (+1.94%) ⬆️ |
| java-21 | 63.53% <11.37%> (+1.91%) ⬆️ |
| skip-bytebuffers-false | 63.67% <11.37%> (+1.92%) ⬆️ |
| skip-bytebuffers-true | 63.51% <11.37%> (+35.78%) ⬆️ |
| temurin | 63.67% <11.37%> (+1.92%) ⬆️ |
| unittests | 63.67% <11.37%> (+1.92%) ⬆️ |
| unittests1 | 55.56% <57.14%> (+8.67%) ⬆️ |
| unittests2 | 34.01% <8.62%> (+6.28%) ⬆️ |

Flags with carried forward coverage won't be shown.

@tibrewalpratik17 force-pushed the upsert_compact_merge branch 2 times, most recently from 09ecdfe to dc1df38, November 19, 2024 12:53
@tibrewalpratik17 (Contributor, Author) commented

Marking it ready for review for early feedback!
