[SPARK-50285] Metrics for commits to StagedTable instances #48830

Closed

@olaky olaky commented Nov 12, 2024

What changes were proposed in this pull request?

Commands that commit through the StagedTable interface currently report no metrics, because the interface offers no way to retrieve metrics after a commit. This PR:

  • Adds a new interface, StagedTableWithCommitMetrics, that allows metrics to be retrieved after a commit
  • Adds a method to StagingTableCatalog that indicates that the catalog supports metrics
  • Supports metric retrieval in the commands that use the StagedTable interface
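
The shape of these three changes can be sketched without Spark on the classpath. Every type below is a simplified, hypothetical stand-in: the trait names follow the description above, but the method names (`reportCommitMetrics`, `supportedCommitMetrics`, `stageCreate` with a single argument) and the plain `Map[String, Long]` metric representation are assumptions made for illustration, not the API that was merged.

```scala
// Simplified stand-ins for the connector interfaces; NOT the real Spark API.
object StagedCommitMetricsSketch {
  trait StagedTable {
    def commitStagedChanges(): Unit
    def abortStagedChanges(): Unit
  }

  // Hypothetical: a staged table that can report metrics once the commit is done.
  trait StagedTableWithCommitMetrics extends StagedTable {
    def reportCommitMetrics(): Map[String, Long]
  }

  // Hypothetical: the catalog declares up front which metric names it can report.
  trait StagingTableCatalog {
    def supportedCommitMetrics: Seq[String] = Seq.empty
    def stageCreate(name: String): StagedTable
  }

  // In-memory test double, in the spirit of the test catalog mentioned in the PR.
  final class InMemoryStagedTable(rows: Long) extends StagedTableWithCommitMetrics {
    private var committed = false
    override def commitStagedChanges(): Unit = { committed = true }
    override def abortStagedChanges(): Unit = { committed = false }
    // Metrics only exist after a successful commit.
    override def reportCommitMetrics(): Map[String, Long] =
      if (committed) Map("numOutputRows" -> rows) else Map.empty
  }

  object InMemoryCatalog extends StagingTableCatalog {
    override def supportedCommitMetrics: Seq[String] = Seq("numOutputRows")
    override def stageCreate(name: String): StagedTable = new InMemoryStagedTable(rows = 3L)
  }
}
```

A command would stage the table, write to it, call `commitStagedChanges()`, and only then ask the table for its metrics.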

Why are the changes needed?

Many create table commands currently return no metrics at all

Does this PR introduce any user-facing change?

No

How was this patch tested?

New tests with a test catalog for the affected commands

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 12, 2024
@olaky
Contributor Author

olaky commented Nov 14, 2024

cc @cloud-fan Can you take a look? I saw that you originally implemented the StagedTable interface.

@olaky
Contributor Author

olaky commented Nov 16, 2024

FYI @manuzhang this PR introduces interfaces for staged tables analogous to the ones you added in Write.

@olaky olaky requested a review from cloud-fan November 25, 2024 21:31
@@ -148,6 +150,8 @@ case class ReplaceTableAsSelectExec(

val properties = CatalogV2Util.convertTableProperties(tableSpec)

override val metrics: Map[String, SQLMetric] = commitMetrics(catalog)
Contributor

This is not the atomic version of RTAS. Does it have commit metrics?

Contributor

OK, this probably doesn't matter, as we return Map.empty if the catalog is not a staging catalog.

Then maybe we can move this to V2CreateTableAsSelectBaseExec.

Contributor Author

V2CreateTableAsSelectBaseExec does not have access to the catalog. But I changed the signature of commitMetrics() to accept only StagingTableCatalog and removed it here, which looks cleaner.
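
The signature change discussed in this thread can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code from the PR: `SQLMetric` is a minimal stand-in class, `supportedCommitMetrics` is a hypothetical catalog method, and `commitMetricsAny` is a made-up name for the earlier variant that accepted any catalog and fell back to `Map.empty`.

```scala
// Simplified stand-ins; NOT the real Spark classes.
object CommitMetricsSignatureSketch {
  trait TableCatalog
  trait StagingTableCatalog extends TableCatalog {
    def supportedCommitMetrics: Seq[String] // hypothetical method name
  }

  // Minimal stand-in for a driver-side metric.
  final class SQLMetric(val name: String) { var value: Long = 0L }

  // Earlier shape (hypothetical name): accepts any catalog,
  // returns Map.empty when the catalog is not a staging catalog.
  def commitMetricsAny(catalog: TableCatalog): Map[String, SQLMetric] = catalog match {
    case staging: StagingTableCatalog => commitMetrics(staging)
    case _ => Map.empty
  }

  // Narrowed shape: the type system rules out non-staging catalogs,
  // so only the atomic (staging) commands can call it.
  def commitMetrics(catalog: StagingTableCatalog): Map[String, SQLMetric] =
    catalog.supportedCommitMetrics.map(n => n -> new SQLMetric(n)).toMap
}
```

With the narrowed signature, a non-atomic command has no catalog of the right type to pass in, so it declares no commit metrics at all.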

@olaky olaky requested a review from cloud-fan November 26, 2024 18:39
@@ -631,7 +641,18 @@ private[v2] trait V2CreateTableAsSelectBaseExec extends LeafV2CommandExec {
qe.assertCommandExecuted()

     table match {
-      case st: StagedTable => st.commitStagedChanges()
+      case st: StagedTable =>
+        st.commitStagedChanges()
Contributor

Let's add a def commitStagedTable in DataSourceV2Utils that does the commit work and also reports driver metrics. Then we can reuse this method both in AtomicReplaceTableExec and here.
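
One possible shape for such a shared helper is sketched below. The types are simplified stand-ins and the details are assumptions: `reportCommitMetrics` is a hypothetical method, `SQLMetric` is a minimal placeholder, and the sketch leaves out how Spark would post the updated values to the UI.

```scala
// Simplified stand-ins; NOT the real Spark classes or the merged helper.
object CommitStagedTableSketch {
  trait StagedTable {
    def commitStagedChanges(): Unit
    // Hypothetical: values the table reports after a commit.
    def reportCommitMetrics(): Map[String, Long] = Map.empty
  }

  // Minimal stand-in for a driver-side metric.
  final class SQLMetric {
    private var current = 0L
    def set(v: Long): Unit = { current = v }
    def value: Long = current
  }

  // Commit first, then copy each reported value into the metric the
  // command declared under the same name. Undeclared names are ignored.
  def commitStagedTable(table: StagedTable, metrics: Map[String, SQLMetric]): Unit = {
    table.commitStagedChanges()
    table.reportCommitMetrics().foreach { case (name, value) =>
      metrics.get(name).foreach(_.set(value))
    }
  }
}
```

Keeping the commit and the metric update in one method means both call sites get the same ordering: metrics are read only after the commit has returned.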

@@ -612,6 +616,12 @@ case class DeltaWithMetadataWritingSparkTask(
private[v2] trait V2CreateTableAsSelectBaseExec extends LeafV2CommandExec {
override def output: Seq[Attribute] = Nil

protected def commitMetrics(tableCatalog: StagingTableCatalog): Map[String, SQLMetric] = {
Contributor

Maybe move this to DataSourceV2Utils as well, so that we can reuse it in ReplaceTableExec?

@gengliangwang
Member

Thanks, merging to master
