[SPARK-45054][SQL] HiveExternalCatalog.listPartitions should restore partition statistics

### What changes were proposed in this pull request?

Call `restorePartitionMetadata` in `listPartitions` to restore Spark SQL statistics.

### Why are the changes needed?

Currently, when `listPartitions` is called, it doesn't restore the Spark SQL statistics stored in the metastore, such as `spark.sql.statistics.totalSize`. Callers that rely on statistics returned by this method may therefore get wrong results.

In particular, when `spark.sql.statistics.size.autoUpdate.enabled` is turned on, during `INSERT OVERWRITE` Spark first lists partitions to get their old statistics, then compares them with the new statistics to decide which partitions need to be updated. Because the old statistics are not restored, this comparison sometimes causes Spark to update all partitions instead of only the partitions that were touched.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a new test.

### Was this patch authored or co-authored using generative AI tooling?

Closes #42777 from sunchao/list-partition-stat.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: Chao Sun <[email protected]>
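As a rough illustration of the failure mode described under "Why are the changes needed?", the following Scala sketch models why unrestored statistics can make every partition look stale. It is a simplified, hypothetical model: `Partition`, `StatsRestoreSketch`, `restoreStats`, and `partitionsToUpdate` are illustrative names, not Spark's actual catalog API. Only the property key `spark.sql.statistics.totalSize` comes from the description above. The update rule (rewrite a partition when its old size is missing or differs from the newly computed size) is an assumption chosen to show the effect.

```scala
// Hypothetical, simplified model of the bug. These types and helpers are
// illustrative only and are NOT Spark's CatalogTablePartition API.
final case class Partition(
    spec: Map[String, String],       // partition column values, e.g. p=1
    parameters: Map[String, String], // raw properties as stored in the metastore
    stats: Option[BigInt]            // Spark-level size statistic, if restored
)

object StatsRestoreSketch {
  // Property key named in the PR description.
  val TotalSizeKey = "spark.sql.statistics.totalSize"

  // Hypothetical stand-in for the restore step: read the stored property
  // back into the Spark-level stats field.
  def restoreStats(p: Partition): Partition =
    p.copy(stats = p.parameters.get(TotalSizeKey).map(s => BigInt(s)))

  // Assumed update rule: a listed partition is rewritten when its old size
  // is missing or differs from the newly computed size. Missing stats
  // therefore always force an update.
  def partitionsToUpdate(
      listed: Seq[Partition],
      newSizes: Map[Map[String, String], BigInt]): Seq[Map[String, String]] =
    listed.collect {
      case p if newSizes.get(p.spec).exists(size => !p.stats.contains(size)) => p.spec
    }

  def main(args: Array[String]): Unit = {
    // Both partitions have sizes stored in the metastore, but the listing
    // has not lifted them into `stats` yet.
    val listed = Seq(
      Partition(Map("p" -> "1"), Map(TotalSizeKey -> "100"), None),
      Partition(Map("p" -> "2"), Map(TotalSizeKey -> "200"), None))
    // Only partition p=2 changed size after the overwrite.
    val newSizes = Map(Map("p" -> "1") -> BigInt(100), Map("p" -> "2") -> BigInt(250))

    println(partitionsToUpdate(listed, newSizes))                   // stats not restored
    println(partitionsToUpdate(listed.map(restoreStats), newSizes)) // stats restored
  }
}
```

Without the restore step, both `p=1` and `p=2` are flagged for update; with it, only `p=2` is flagged, which matches the behaviour the fix aims for.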