[Enhancement] avoid sync listPartitionNames when query iceberg & mv #53168
Signed-off-by: yanz <[email protected]>
fe/fe-core/src/main/java/com/starrocks/connector/ConnectorMetadatRequestContext.java
fe/fe-core/src/main/java/com/starrocks/connector/ConnectorPartitionTraits.java
fe/fe-core/src/main/java/com/starrocks/connector/iceberg/CachingIcebergCatalog.java
fe/fe-core/src/test/java/com/starrocks/connector/hive/ReplayMetadataMgr.java
Users would expect their MV to work. This patch will make performance and response time inconsistent.
Because we have seen so many cases that
So every time there is an MV & Iceberg query, the CBO stage spends time building partitions for MV rewrite, which adds latency. Remember that even if it fails to rewrite the query to use the MV, it still takes time to build partitions. So there is a tradeoff between
And we have seen many cases fall into case (b).
@dirtysalt I understand that it might be useful in some cases, but can we add a session variable for this or, better yet, make it configurable per MV? That way I can make this tradeoff when I need to.
Edit: OK, I see enable_connector_async_list_partitions now.
@eshishki thanks for your proposal. On second thought, I think you are right, and I've added a session variable to control this behaviour.
By default it is false, and it behaves as before. If it is set to true, the query won't wait when partition names are not cached.
@eshishki what do you mean
@dirtysalt A regular analyst user or a Tableau dashboard might not know about these nuances (MVs and other optimisations) and can't change a session variable before each query. MVs are usually created by a DBA or ETL engineer, who is best suited to make this tradeoff.
1a4815c
Quality Gate failed. See analysis details on SonarQube Cloud.
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 90 / 98 (91.84%)
[BE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
@Mergifyio backport branch-3.4
✅ Backports have been created
…#53168) Signed-off-by: yanz <[email protected]> (cherry picked from commit ca1c066)
I've merged this PR in advance, but I agree with your point: this flag is better exposed as an MV property than as a session variable.
… (backport #53168) (#53288) Co-authored-by: RyanZ <[email protected]>
Why I'm doing:
listPartitionNames is a heavy operation on Iceberg tables, and the normal query path does not call it. If an MV exists, however, the function is called during the CBO stage. We don't want to wait on it when nothing is cached.
Instead, we can return null when there is no cached value (telling the CBO not to do MV rewrite) and return the cached value otherwise, so no synchronous operation happens during the query.
What I'm doing:
This PR:
- adds queryMVRerwrite to the connector metadata request context and connector traits
- sets queryMVRerwrite to true when doing query MV rewrite
- in listPartitionNames, returns null if query MV rewrite is true and there is no cached value
- adds ENABLE_CONNECTOR_ASYNC_LIST_PARTITIONS to enable this feature (off by default)
Fixes #issue
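The cache-only lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class PartitionNameCacheSketch, its loader function, and the two boolean parameters are hypothetical stand-ins for the caching catalog, the heavy Iceberg listing call, the request-context flag, and the session variable. Any background cache refresh is not modeled here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: none of these names are real StarRocks classes or methods.
class PartitionNameCacheSketch {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    // Stands in for the expensive, synchronous partition listing on the Iceberg table.
    private final Function<String, List<String>> loader;

    PartitionNameCacheSketch(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    /**
     * Returns cached partition names when present. On a cache miss, returns null
     * (meaning "skip MV rewrite") if the call comes from query MV rewrite and the
     * async flag is enabled; otherwise loads synchronously and caches the result.
     */
    List<String> listPartitionNames(String table, boolean queryMvRewrite, boolean asyncListPartitions) {
        List<String> cached = cache.get(table);
        if (cached != null) {
            return cached;
        }
        if (queryMvRewrite && asyncListPartitions) {
            return null; // assumption: the caller treats null as "do not attempt MV rewrite"
        }
        List<String> loaded = loader.apply(table);
        cache.put(table, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        PartitionNameCacheSketch catalog =
                new PartitionNameCacheSketch(t -> List.of("dt=2024-01-01", "dt=2024-01-02"));
        // Cache miss during MV rewrite with the flag on: no synchronous load happens.
        System.out.println(catalog.listPartitionNames("db.tbl", true, true));
        // A call outside MV rewrite loads synchronously and fills the cache.
        System.out.println(catalog.listPartitionNames("db.tbl", false, true));
        // MV rewrite now sees the cached value.
        System.out.println(catalog.listPartitionNames("db.tbl", true, true));
    }
}
```

With the flag off (the default, per the description), the second branch is never taken, so the lookup behaves as it did before the change.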