Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] avoid sync listPartitionNames when query iceberg & mv #53168

Merged
merged 20 commits into from
Nov 28, 2024

Conversation

dirtysalt
Copy link
Contributor

@dirtysalt dirtysalt commented Nov 25, 2024

Why I'm doing:

listPartitionNames is a heavy operation on iceberg table. And in normal query process, we don't call that.

If we have MV, we will call that function in CBO stage. But we don't want waiting if there is no cache.

We can just return a null(to tell CBO don't do MV rewrite) if there is no cache, and return cache value if cached. So there is no sync operation during query.

What I'm doing:

This PR is to add:

  • queryMVRerwrite in connector metadata request context and connector traits
  • set queryMVRerwrite to true when doing query MV rewrite
  • during listPartitionNames, if query mv rewrite is true and there is no cache value, return null.
  • add a session varibale ENABLE_CONNECTOR_ASYNC_LIST_PARTITIONS to enable this feature(off by default)

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@dirtysalt dirtysalt requested a review from a team as a code owner November 25, 2024 06:04
@github-actions github-actions bot added the 3.4 label Nov 25, 2024
Signed-off-by: yanz <[email protected]>
@dirtysalt dirtysalt requested a review from a team as a code owner November 25, 2024 06:08
@dirtysalt dirtysalt enabled auto-merge (squash) November 25, 2024 07:12
Youngwb
Youngwb previously approved these changes Nov 25, 2024
Signed-off-by: yanz <[email protected]>
@dirtysalt dirtysalt requested review from a team as code owners November 25, 2024 12:42
@eshishki
Copy link
Contributor

users would expect their mv to work
we build mv for a reason, to accelerate slow queries

this patch will make performance and response time inconsistent
why you think it is better to execute original slow query than to wait for catalog response?

@dirtysalt
Copy link
Contributor Author

users would expect their mv to work we build mv for a reason, to accelerate slow queries

this patch will make performance and response time inconsistent why you think it is better to execute original slow query than to wait for catalog response?

@eshishki

Because we have seen so many cases that

  • iceberg table is huge and takes a very longe time to get partitions.
  • iceberg update speed is much faster than mv refresh speed.

So everytime there is a mv & iceberg query, at CBO stage, it takes time to build partitions for mv rewrite, which adds latency. Remember it fails to rewrite query to use MV, it still takes time to build partitions.

So there is a tradeoff between
a. wait for catalog response for partitions(T0) + MV execution time(T1)
b. wait for catalog response partitions(T0) + iceberg execution time(T2)
c. iceberg execution time(T2)

And we have seen many cases fall into (b) case.

Seaven
Seaven previously approved these changes Nov 27, 2024
stephen-shelby
stephen-shelby previously approved these changes Nov 27, 2024
Youngwb
Youngwb previously approved these changes Nov 27, 2024
@eshishki
Copy link
Contributor

eshishki commented Nov 27, 2024

@dirtysalt i understand that it might be useful in some cases, but can we do a session variable for this or better yet configurable for mv?

that way i can make this tradeoff when i need to

edit: ok, i see enable_connector_async_list_partitions now
but please think about per mv option

@dirtysalt
Copy link
Contributor Author

@eshishki thanks for your proposal. And after second thought, I think you are right.

And I've added a session variable to control this behaviour

add a session varibale enable_connector_async_list_partitions to enable this feature(off by default)

So by default it's false, and it's behaves like before. And if it's set to true, then it won't wait if parttion names not cached.

@dirtysalt i understand that it might be useful in some cases, but can we do a session variable for this or better yet configurable for mv?

that way i can make this tradeoff when i need to

@dirtysalt
Copy link
Contributor Author

@eshishki what you mean per mv option? I don't quite understand how mv option can affect this process?

@eshishki
Copy link
Contributor

@dirtysalt regular analyst user or some tableau dashboard might not know about this nuances, mv, and other optimisations and can't change session variable before each query

mv are usually created by dba or etl engineer who are best suited to make this tradeoff
i see now that in ALTER MATERIALIZED VIEW i can put session property and that is exactly what i would need

Copy link

sonarcloud bot commented Nov 27, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Copy link

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

[FE Incremental Coverage Report]

pass : 90 / 98 (91.84%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/common/proc/HMSTablePartitionsProcDir.java 0 1 00.00% [59]
🔵 com/starrocks/planner/PaimonScanNode.java 0 1 00.00% [327]
🔵 com/starrocks/connector/iceberg/IcebergCatalog.java 13 16 81.25% [187, 198, 238]
🔵 com/starrocks/catalog/IcebergTable.java 11 13 84.62% [298, 301]
🔵 com/starrocks/connector/iceberg/IcebergMetadata.java 7 8 87.50% [329]
🔵 com/starrocks/catalog/MaterializedView.java 3 3 100.00% []
🔵 com/starrocks/qe/SessionVariable.java 4 4 100.00% []
🔵 com/starrocks/connector/iceberg/CachingIcebergCatalog.java 29 29 100.00% []
🔵 com/starrocks/server/IcebergTableFactory.java 2 2 100.00% []
🔵 com/starrocks/connector/partitiontraits/DefaultTraits.java 4 4 100.00% []
🔵 com/starrocks/qe/ConnectContext.java 2 2 100.00% []
🔵 com/starrocks/connector/iceberg/IcebergApiConverter.java 2 2 100.00% []
🔵 com/starrocks/connector/ConnectorMetadatRequestContext.java 5 5 100.00% []
🔵 com/starrocks/connector/partitiontraits/IcebergPartitionTraits.java 1 1 100.00% []
🔵 com/starrocks/sql/optimizer/rewrite/OptExternalPartitionPruner.java 1 1 100.00% []
🔵 com/starrocks/sql/analyzer/AnalyzeStmtAnalyzer.java 1 1 100.00% []
🔵 com/starrocks/planner/DeltaLakeScanNode.java 1 1 100.00% []
🔵 com/starrocks/connector/ConnectorPartitionTraits.java 4 4 100.00% []

Copy link

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@kangkaisen kangkaisen merged commit ca1c066 into StarRocks:main Nov 28, 2024
45 of 47 checks passed
Copy link

@Mergifyio backport branch-3.4

@github-actions github-actions bot removed the 3.4 label Nov 28, 2024
Copy link
Contributor

mergify bot commented Nov 28, 2024

backport branch-3.4

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Nov 28, 2024
@dirtysalt dirtysalt deleted the mv-iceberg-opt-2 branch November 28, 2024 01:54
@dirtysalt
Copy link
Contributor Author

@eshishki

I've merged this PR in advance. But I agree with your point. this flag is better to be put as a mv property not a session variable.

wanpengfei-git pushed a commit that referenced this pull request Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants