[SPARK-49414][CONNECT][SQL] Add Shared DataFrameReader interface
### What changes were proposed in this pull request?
This PR creates a shared interface for DataFrameReader.

### Why are the changes needed?
We are creating a shared Scala Spark SQL interface for Classic and Connect.
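
As a rough illustration of the idea (not the code in this PR), a shared interface could define the builder methods once and leave only the backend-specific `load` abstract. Every name below (`SharedReaderSketch`, `SharedReader`, `ClassicReader`, `ConnectReader`, `describe`) is hypothetical, and the string results stand in for real `DataFrame`s:

```scala
// A minimal sketch, assuming hypothetical names; these are not Spark classes.
object SharedReaderSketch {

  // Shared interface: builder methods and their docs are defined once here.
  abstract class SharedReader[DF] {
    protected var formatName: String = "parquet"
    protected var options: Map[String, String] = Map.empty

    def format(name: String): this.type = { formatName = name; this }

    def option(key: String, value: String): this.type = {
      options = options + (key -> value)
      this
    }

    // Convenience method expressed in terms of the abstract `load`.
    def json(path: String): DF = format("json").load(path)

    // The only backend-specific piece.
    def load(path: String): DF

    // Helper used by the toy implementations to show the collected state.
    protected def describe(path: String): String = {
      val opts = options.toSeq.sorted.map { case (k, v) => s"$k=$v" }.mkString(",")
      s"$formatName $path {$opts}"
    }
  }

  // Stand-in for an implementation that would build a local plan.
  final class ClassicReader extends SharedReader[String] {
    override def load(path: String): String = s"classic: ${describe(path)}"
  }

  // Stand-in for an implementation that would send a request to a server.
  final class ConnectReader extends SharedReader[String] {
    override def load(path: String): String = s"connect: ${describe(path)}"
  }
}
```

With this shape, each implementation can replace a duplicated doc comment with `/** @inheritdoc */`, as the `SparkSession.read` change in this commit does.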

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47975 from hvanhovell/SPARK-49414.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
hvanhovell committed Sep 5, 2024
1 parent 75e53b7 commit e76c6c9
Showing 8 changed files with 758 additions and 974 deletions.

@@ -214,16 +214,7 @@ class SparkSession private[sql] (
  sql(query, Array.empty)
  }

- /**
- * Returns a [[DataFrameReader]] that can be used to read non-streaming data in as a
- * `DataFrame`.
- * {{{
- * sparkSession.read.parquet("/path/to/file.parquet")
- * sparkSession.read.schema(schema).json("/path/to/file.json")
- * }}}
- *
- * @since 3.4.0
- */
+ /** @inheritdoc */
  def read: DataFrameReader = new DataFrameReader(this)

  /**
@@ -320,6 +320,14 @@ object CheckConnectJvmClientCompatibility {
  ProblemFilters.exclude[DirectMissingMethodProblem](
  "org.apache.spark.sql.UDFRegistration.initializeLogIfNecessary$default$2"),

+ // Protected DataFrameReader methods...
+ ProblemFilters.exclude[DirectMissingMethodProblem](
+ "org.apache.spark.sql.DataFrameReader.validateSingleVariantColumn"),
+ ProblemFilters.exclude[DirectMissingMethodProblem](
+ "org.apache.spark.sql.DataFrameReader.validateJsonSchema"),
+ ProblemFilters.exclude[DirectMissingMethodProblem](
+ "org.apache.spark.sql.DataFrameReader.validateXmlSchema"),
+
  // Datasource V2 partition transforms
  ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.PartitionTransform"),
  ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.PartitionTransform$"),
20 changes: 20 additions & 0 deletions project/MimaExcludes.scala
@@ -125,6 +125,26 @@ object MimaExcludes {
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.Observation"),
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.Observation$"),

// SPARK-49414: Remove Logging from DataFrameReader.
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.sql.DataFrameReader"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logName"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.log"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logInfo"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logDebug"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logTrace"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logWarning"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logError"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logInfo"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logDebug"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logTrace"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logWarning"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.logError"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.isTraceEnabled"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.initializeLogIfNecessary"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.initializeLogIfNecessary"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.initializeLogIfNecessary$default$2"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.DataFrameReader.initializeForcefully"),

// SPARK-49425: Create a shared DataFrameWriter interface.
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.DataFrameWriter"),
