
[SPARK-49413][CONNECT][SQL] Create a shared RuntimeConfig interface #47980

Closed
wants to merge 9 commits

Conversation

hvanhovell
Contributor

What changes were proposed in this pull request?

This PR introduces a shared RuntimeConfig interface.

Why are the changes needed?

We are creating a shared Scala Spark SQL interface for Classic and Connect.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@hvanhovell
Contributor Author

Merging to master.

@asfgit asfgit closed this in 6fc176f Sep 17, 2024
import org.apache.spark.sql.connect.client.SparkConnectClient

/**
* Runtime configuration interface for Spark. To access this, use `SparkSession.conf`.
*
* @since 3.4.0
*/
class RuntimeConfig private[sql] (client: SparkConnectClient) extends Logging {
class ConnectRuntimeConfig private[sql] (client: SparkConnectClient)
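The shared-interface pattern the PR introduces can be sketched as follows. This is a simplified, illustrative sketch without any Spark dependencies: the abstract class would live in sql/api, and each backend (Classic, Connect) provides an implementation. The in-memory map stands in for a real SparkConnectClient round trip.

```scala
// Minimal sketch of the pattern, assuming a simplified API surface.
abstract class RuntimeConfig {
  def set(key: String, value: String): Unit
  def get(key: String): String
  def getOption(key: String): Option[String] =
    try Some(get(key)) catch { case _: NoSuchElementException => None }
}

// Connect-side implementation; a real one would delegate to
// SparkConnectClient instead of a local mutable map.
class ConnectRuntimeConfig extends RuntimeConfig {
  private val entries = scala.collection.mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = entries(key) = value
  def get(key: String): String =
    entries.getOrElse(key, throw new NoSuchElementException(key))
}

val conf: RuntimeConfig = new ConnectRuntimeConfig
conf.set("spark.sql.ansi.enabled", "true")
println(conf.get("spark.sql.ansi.enabled")) // prints "true"
println(conf.getOption("missing.key"))      // prints "None"
```

Callers program against the abstract `RuntimeConfig`, so the same user code works against either backend.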
Member

This looks like a breaking change at API level.

  • I'm wondering if this is okay, @hvanhovell ?
  • In addition, please fix the @since 3.4.0 tag, because ConnectRuntimeConfig only exists since 4.0.0.

Contributor Author

This class is not part of the public API; it lives in an internal package. Regarding the tag I am not sure: that class has existed since 3.4.0, it just got renamed.

Member

@@ -160,6 +160,10 @@ object MimaExcludes {
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.DataFrameWriterV2"),
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.WriteConfigMethods"),

// SPARK-49413: Create a shared RuntimeConfig interface.
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.RuntimeConfig"),
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.RuntimeConfig$"),
Member

@dongjoon-hyun dongjoon-hyun Sep 17, 2024

Ya, this looks like a real breaking change indeed from Spark SQL part.

Contributor Author

No, it is not. The org.apache.spark.sql.RuntimeConfig class is still there; it was just moved. MiMa, unfortunately, cannot deal with that.

Member

Ya, got it. So there is effectively no breaking change because it's in the api module?

Member

Could you add a comment here as a follow-up, please, @hvanhovell ?

Contributor Author

Sure I will create a follow-up.

@@ -1110,13 +1103,13 @@ object SparkSession extends Logging {
private[sql] def getOrCloneSessionWithConfigsOff(
session: SparkSession,
configurations: Seq[ConfigEntry[Boolean]]): SparkSession = {
val configsEnabled = configurations.filter(session.conf.get[Boolean])
val configsEnabled = configurations.filter(session.sessionState.conf.getConf[Boolean])
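The internal pattern in this hunk can be sketched in isolation: keep only the config entries that are currently enabled, reading through the session's SQL conf rather than the public RuntimeConfig. The `ConfigEntry` and `SQLConf` below are simplified stand-ins for Spark's internal classes, with a hard-coded settings map for illustration.

```scala
// Simplified stand-ins for Spark internals; illustrative only.
case class ConfigEntry[T](key: String, default: T)

class SQLConf {
  private val settings = Map("spark.sql.a" -> true, "spark.sql.b" -> false)
  def getConf(entry: ConfigEntry[Boolean]): Boolean =
    settings.getOrElse(entry.key, entry.default)
}

val conf = new SQLConf
val configurations =
  Seq(ConfigEntry("spark.sql.a", false), ConfigEntry("spark.sql.b", false))

// Same filtering shape as in getOrCloneSessionWithConfigsOff:
// eta-expansion turns the getConf method into a ConfigEntry => Boolean.
val configsEnabled = configurations.filter(conf.getConf)
println(configsEnabled.map(_.key)) // prints List(spark.sql.a)
```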
Member

Although this is not a breaking change, is this change inevitable for some reason?

Member

If so, developers may be annoyed by this change.

Contributor Author

In this PR I use the abstract class in sql/api as the main interface. I did not really want to add support for this, because it is not user facing, and because it requires me to move even more stuff into sql/api.

Contributor Author

Which developers? Spark developers?

@@ -52,7 +52,7 @@ class FileBasedDataSourceSuite extends QueryTest

override def beforeAll(): Unit = {
super.beforeAll()
spark.conf.set(SQLConf.ORC_IMPLEMENTATION, "native")
spark.conf.set(SQLConf.ORC_IMPLEMENTATION.key, "native")
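The test change above illustrates the API shift under discussion: the typed `ConfigEntry` overload was dropped from RuntimeConfig, so callers pass the entry's string key instead. A minimal sketch, using a simplified stand-in for Spark's internal `ConfigEntry` and a plain map in place of `spark.conf`:

```scala
// Illustrative stand-in for Spark's internal ConfigEntry.
case class ConfigEntry[T](key: String, defaultValue: T)

// Hypothetical entry mirroring SQLConf.ORC_IMPLEMENTATION.
val OrcImplementation = ConfigEntry("spark.sql.orc.impl", "native")

val conf = scala.collection.mutable.Map.empty[String, String]

// Before: conf.set(OrcImplementation, "native")   // entry-typed overload
// After:  pass the entry's key explicitly.
conf(OrcImplementation.key) = "native"
println(conf("spark.sql.orc.impl")) // prints "native"
```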
Member

Is this a required user-facing change?

Contributor Author

Well, the method that takes a SQLConf entry is gone now... This was not a user-facing feature to begin with.

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 17, 2024

Thank you for taking this direction, @hvanhovell. In general it looks like it achieves most of the goals, but the following changes look like a big user-facing (Spark app developers) glitch to me. If possible, can we make the Apache Spark 4.0.0 migration more seamless by supporting these in the same way as before?

spark.conf.get
spark.conf.set

@dongjoon-hyun
Member

If this is difficult, could you file a JIRA issue instead? Then, if someone needs it, they might pick it up.

it requires me to move even more stuff into sql/api.

@hvanhovell
Contributor Author

@dongjoon-hyun which developers are you talking about? I have fixed all the Spark cases. Generally, library developers should not really be using any of this if they don't want their stuff to break.

@hvanhovell
Contributor Author

If this is difficult, could you file a JIRA issue instead? Then, if it's required by someone, he might pick it up.

It is not difficult, it is just yet another thing that needs to be moved to common/utils.

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
This PR introduces a shared RuntimeConfig interface.

### Why are the changes needed?
We are creating a shared Scala Spark SQL interface for Classic and Connect.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47980 from hvanhovell/SPARK-49413.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
asfgit pushed a commit that referenced this pull request Dec 4, 2024
…interface

### What changes were proposed in this pull request?
This PR adds support for ConfigEntry to the RuntimeConfig interface. This was removed in #47980.

### Why are the changes needed?
This functionality is used a lot by Spark libraries. Removing them caused friction, and adding them does not pollute the RuntimeConfig interface.

### Does this PR introduce _any_ user-facing change?
No. This is developer API.

### How was this patch tested?
I have added test cases for Connect and Classic.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49062 from hvanhovell/SPARK-49709.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
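The follow-up commit above re-adds `ConfigEntry`-based overloads on top of the string-based interface. A minimal sketch of how such overloads can delegate to the string API, again using a simplified stand-in for Spark's internal `ConfigEntry` (the real one is considerably richer):

```scala
// Simplified stand-in: a typed entry that knows its key, default,
// and how to parse its string representation.
case class ConfigEntry[T](key: String, default: T, fromString: String => T)

class RuntimeConfig {
  private val entries = scala.collection.mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = entries(key) = value
  def get(key: String): Option[String] = entries.get(key)

  // Re-added developer-API-style overloads, delegating to the string API.
  def set[T](entry: ConfigEntry[T], value: T): Unit =
    set(entry.key, value.toString)
  def get[T](entry: ConfigEntry[T]): T =
    get(entry.key).map(entry.fromString).getOrElse(entry.default)
}

// Hypothetical entry for illustration.
val AnsiEnabled = ConfigEntry("spark.sql.ansi.enabled", false, _.toBoolean)
val conf = new RuntimeConfig
conf.set(AnsiEnabled, true)
println(conf.get(AnsiEnabled)) // prints "true"
```

Because the typed overloads only delegate, they do not widen the string-based contract, which is why adding them back "does not pollute the RuntimeConfig interface".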