[Enhancement] add partition scan information in audit log #51853
base: main
Signed-off-by: hmx <[email protected]>
Quality Gate failed. See analysis details on SonarCloud.
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ❌ fail : 32 / 51 (62.75%)
[BE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
@kevincai Hi, could you please help review this PR when you have time?
I may not be the right person to review this PR, but I have an objection to adding scanPartitions to the audit log (the query dump might be fine): a single full table scan over thousands of partitions can easily blow up the audit log.
From a cluster administrator's point of view, partition scan information is needed when facing insufficient storage or metadata pressure (too many tablets); it serves as a reference for data governance. Also, given the metadata pressure, tables would not normally be created with too many partitions. As of 3.3.2, it looks like it provides a param
There are alternative solutions; it may not be necessary to get this through the audit log, e.g. through the '/statistics' interface, drilling down to the db-table-partition level. If that is not there yet, it would be good to enhance it.
For detailed data governance, it may be useful to have partition scan information at the level of a single SQL statement. To avoid making the audit log file too large, a new FE variable could be added to limit the number of printed scan partitions; exceeding the limit would display like
It would be good to have others' thoughts on this PR, not just mine.
Why I'm doing:
When a user sets a long TTL for table partitions but the historical partitions are never or rarely accessed, it causes unnecessary storage waste and metadata pressure.
More detailed partition access audit information is therefore needed to distinguish cold partitions from hot ones, for better data governance.
With external tools it is easy to parse the source table from the query SQL, but it is difficult to parse which partitions were accessed.
What I'm doing:
Add a new FE session variable
enable_scan_partitions_audit
to control whether the feature is turned on. When it is on, the fe.audit.log file gains a new item, ScanPartitions, after ScanRows, like: ScanPartitions="[{catalogName:default_catalog,databaseName:db1,tableName:tab1,partitionIds:[p20241001, p20241002]}, {catalogName:default_catalog,databaseName:db2,tableName:tab2,partitionIds:[tab2]}]"
explain as follows:
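As a rough illustration of how an external tool might consume the new item, the sketch below parses a ScanPartitions value and counts how often each partition is accessed. It is not part of this PR. It assumes the value always has exactly the shape of the example above (the keys catalogName, databaseName, tableName and partitionIds, in that order, with no commas inside names), and the function names `parse_scan_partitions` and `count_partition_visits` are hypothetical.

```python
import re
from collections import Counter

# Assumption: the field looks exactly like the example in this PR description.
FIELD_RE = re.compile(r'ScanPartitions="\[(.*?)\]"')
ENTRY_RE = re.compile(
    r'\{catalogName:(?P<catalog>[^,]*),'
    r'databaseName:(?P<database>[^,]*),'
    r'tableName:(?P<table>[^,]*),'
    r'partitionIds:\[(?P<partitions>[^\]]*)\]\}'
)


def parse_scan_partitions(line):
    """Return one dict per scanned table found in an audit log line."""
    field = FIELD_RE.search(line)
    if field is None:
        return []  # the line carries no ScanPartitions item
    entries = []
    for entry in ENTRY_RE.finditer(field.group(1)):
        partitions = [
            p.strip() for p in entry.group("partitions").split(",") if p.strip()
        ]
        entries.append({
            "catalog": entry.group("catalog"),
            "database": entry.group("database"),
            "table": entry.group("table"),
            "partitions": partitions,
        })
    return entries


def count_partition_visits(lines):
    """Count accesses per (catalog, database, table, partition) across lines."""
    visits = Counter()
    for line in lines:
        for entry in parse_scan_partitions(line):
            for partition in entry["partitions"]:
                key = (entry["catalog"], entry["database"],
                       entry["table"], partition)
                visits[key] += 1
    return visits


# The sample value is copied from the example in this PR description.
SAMPLE = (
    'ScanPartitions="[{catalogName:default_catalog,databaseName:db1,'
    'tableName:tab1,partitionIds:[p20241001, p20241002]}, '
    '{catalogName:default_catalog,databaseName:db2,tableName:tab2,'
    'partitionIds:[tab2]}]"'
)

if __name__ == "__main__":
    for key, count in count_partition_visits([SAMPLE]).items():
        print(key, count)
```

Partitions that never show up in such a count over a long enough window would be candidates for the "cold" set described in the motivation.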
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: