Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] add partition scan infomation in audit log #51853

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

MatthewH00
Copy link
Contributor

@MatthewH00 MatthewH00 commented Oct 13, 2024

Why I'm doing:

When user set a long TTL for table partitions, but the historic partitions are never visited or rarely visited, it would cause unnecessary storage waste and metadata pressure.
So more detailed partition visit audit information is needed to distinguish cold and hot partitions, so as to better data governance.
With the help of external tools, it is easy to parse the source table from query SQL, but it is difficult to parse the visit partition information.

What I'm doing:

Add a new FE session variable enable_scan_partitions_audit to control turn on or not, the fe.audit.log file would add a new item ScanPartitions after ScanRows like:

ScanPartitions="[{catalogName:default_catalog,databaseName:db1,tableName:tab1,partitionIds:[p20241001, p20241002]}, {catalogName:default_catalog,databaseName:db2,tableName:tab2,partitionIds:[tab2]}]"
explain as follows:

- catalogName: only support internal catalog now

- databaseName: database name

- tableName: table name

- partitionIds: when is partitioned table display visit partitionId, when is unpartitioned table display table name

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Copy link

sonarcloud bot commented Oct 17, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

Copy link

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

[FE Incremental Coverage Report]

fail : 32 / 51 (62.75%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/planner/PlanNode.java 0 1 00.00% [503]
🔵 com/starrocks/qe/StmtExecutor.java 6 20 30.00% [1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1215, 1216, 1221, 1294]
🔵 com/starrocks/qe/SessionVariable.java 2 4 50.00% [2631, 2632]
🔵 com/starrocks/qe/QueryDetail.java 4 5 80.00% [285]
🔵 com/starrocks/sql/common/MetaUtils.java 7 8 87.50% [102]
🔵 com/starrocks/plugin/AuditEvent.java 3 3 100.00% []
🔵 com/starrocks/planner/ScanNode.java 1 1 100.00% []
🔵 com/starrocks/common/util/AuditStatisticsUtil.java 3 3 100.00% []
🔵 com/starrocks/http/HttpResultSender.java 1 1 100.00% []
🔵 com/starrocks/qe/ConnectProcessor.java 1 1 100.00% []
🔵 com/starrocks/http/JsonSerializer.java 4 4 100.00% []

Copy link

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@MatthewH00
Copy link
Contributor Author

@kevincai Hi Could you please help to review the pr when you have free time ?

@kevincai
Copy link
Contributor

kevincai commented Oct 18, 2024

@kevincai Hi Could you please help to review the pr when you have free time ?

I may not be adequate to review this PR. But I am kind of objection to add scanPartitions into audit log, (the query dump might be fine), it can be easily blow the audit log by a single full table scanning of thousands of partitions.

@MatthewH00
Copy link
Contributor Author

@kevincai Hi Could you please help to review the pr when you have free time ?

I may not be adequate to review this PR. But I am kind of objection to add scanPartitions into audit log, (the query dump might be fine), it can be easily blow the audit log by a single full table scanning of thousands of partitions.

From the view of a cluster administrator, the partitions scan info is needed when faced the Insufficient storage or metadata pressure(tablet too many),it would become a reference to data governance. And consider the metadata pressure , it would not create the table with too many partition . From the 3.3.2 , it looks provide a param max_partition_num_per_table
to limit the partition number of table

@kevincai
Copy link
Contributor

@kevincai Hi Could you please help to review the pr when you have free time ?

I may not be adequate to review this PR. But I am kind of objection to add scanPartitions into audit log, (the query dump might be fine), it can be easily blow the audit log by a single full table scanning of thousands of partitions.

From the view of a cluster administrator, the partitions scan info is needed when faced the Insufficient storage or metadata pressure(tablet too many),it would become a reference to data governance. And consider the metadata pressure , it would not create the table with too many partition . From the 3.3.2 , it looks provide a param max_partition_num_per_table to limit the partition number of table

There are alternative solutions, may not necessary to get it through audit log. e.g. through '/statistics' interface and thrilling down to db-table-partition level. If it is not there, it is good to enhance it.

@MatthewH00
Copy link
Contributor Author

@kevincai Hi Could you please help to review the pr when you have free time ?

I may not be adequate to review this PR. But I am kind of objection to add scanPartitions into audit log, (the query dump might be fine), it can be easily blow the audit log by a single full table scanning of thousands of partitions.

From the view of a cluster administrator, the partitions scan info is needed when faced the Insufficient storage or metadata pressure(tablet too many),it would become a reference to data governance. And consider the metadata pressure , it would not create the table with too many partition . From the 3.3.2 , it looks provide a param max_partition_num_per_table to limit the partition number of table

There are alternative solutions, may not necessary to get it through audit log. e.g. through '/statistics' interface and thrilling down to db-table-partition level. If it is not there, it is good to enhance it.

For detailed data governance, may be it is good for partitions scan info in the level of single sql. To avoid to make the audit log file too large , add a new FE variable to limit the number of the printed scan partitions , exceed the limit would display like partitionIds:[p20241001,...,p20241030]} it would be ok?

@kevincai
Copy link
Contributor

kevincai commented Oct 18, 2024

For detailed data governance, may be it is good for partitions scan info in the level of single sql. To avoid to make the audit log file too large , add a new FE variable to limit the number of the printed scan partitions , exceed the limit would display like partitionIds:[p20241001,...,p20241030]} it would be ok?

Would be good to have others thought on this PR, not just mine.

@github-actions github-actions bot added the 3.3 label Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants