Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix Passing an explicit array value to a Scala varargs method is deprecated #43642

Closed
wants to merge 8 commits into from

Conversation

ivoson
Copy link
Contributor

@ivoson ivoson commented Nov 3, 2023

What changes were proposed in this pull request?

Fix the deprecated behavior below:
Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call

For all the use cases, we don't need to make a copy of the array. Explicitly use ArraySeq.unsafeWrapArray to do the conversion.

Why are the changes needed?

Eliminate compile warnings and no longer use deprecated scala APIs.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GA.
Fixed all the warning with build: mvn clean package -DskipTests -Pspark-ganglia-lgpl -Pkinesis-asl -Pdocker-integration-tests -Pyarn -Pkubernetes -Pkubernetes-integration-tests -Phive-thriftserver -Phadoop-cloud

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 3, 2023
@ivoson
Copy link
Contributor Author

ivoson commented Nov 3, 2023

cc @LuciferYang please take a look at this PR. Thanks.

@LuciferYang
Copy link
Contributor

Could you check again? IIRC, there should be more than 40+ files involved in this issue...

@LuciferYang
Copy link
Contributor

for example:

val flattenedPartition: UnrolledPartition = Array.concat(allPartitions: _*)

Array.concat(results: _*)

and

val options = mutable.Map(optionsList: _*)

@ivoson
Copy link
Contributor Author

ivoson commented Nov 3, 2023

Could you check again? IIRC, there should be more than 40+ files involved in this issue...

Thanks @LuciferYang ... checking

@LuciferYang
Copy link
Contributor

I will set this PR to draft first.

@LuciferYang LuciferYang marked this pull request as draft November 4, 2023 05:26
@ivoson ivoson marked this pull request as ready for review November 5, 2023 11:45
@ivoson
Copy link
Contributor Author

ivoson commented Nov 5, 2023

I will set this PR to draft first.

cc @LuciferYang , this PR is ready for review. Fixed all the warnnings with build command mvn clean package -DskipTests -Pspark-ganglia-lgpl -Pkinesis-asl -Pdocker-integration-tests -Pyarn -Pkubernetes -Pkubernetes-integration-tests -Phive-thriftserver -Phadoop-cloud.
This should cover all the modules.

@ivoson ivoson changed the title [SPARK-45687][SQL] Fix Passing an explicit array value to a Scala varargs method is deprecated [SPARK-45687][SQL][MLLIB][KUBERNETES][EXAMPLES][CONNECT] Fix Passing an explicit array value to a Scala varargs method is deprecated Nov 5, 2023
@LuciferYang
Copy link
Contributor

@ivoson You can temporarily add "-Wconf:cat=deprecation&msg=Passing an explicit array value to a Scala varargs method is deprecated:e" to compilerWarningSettings in SparkBuild.scala, and run the command build/sbt -Phadoop-3 -Pdocker-integration-tests -Pspark-ganglia-lgpl -Pkinesis-asl -Pkubernetes -Phive-thriftserver -Pconnect -Pyarn -Phive -Phadoop-cloud -Pvolcano -Pkubernetes-integration-tests Test/package streaming-kinesis-asl-assembly/assembly connect/assembly to ensure that nothing is overlooked.

@@ -254,7 +254,8 @@ object SparkBuild extends PomBuild {
// SPARK-45627 `enum`, `export` and `given` will become keywords in Scala 3,
// so they are prohibited from being used as variable names in Scala 2.13 to
// reduce the cost of migration in subsequent versions.
"-Wconf:cat=deprecation&msg=it will become a keyword in Scala 3:e"
"-Wconf:cat=deprecation&msg=it will become a keyword in Scala 3:e",
"-Wconf:cat=deprecation&msg=Passing an explicit array value to a Scala varargs method is deprecated:e"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end, we may not need to add this new compile option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we keep it to avoid other folks adding the case again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not necessary, I'll remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we keep it to avoid other folks adding the case again?

This will cause some memory consumption and performance difference in collection copy, so it is indeed a problem. However, I prefer to wait for a while and see if the related cases increase rapidly again. If so, we can clean them up again and make it a stricter compile check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... I want to confirm again that in Scala 3, this is still just a compilation warning, right?

@LuciferYang
Copy link
Contributor

cc @srowen FYI

tmpModel.transform(df)
.withColumn(accColName, updateUDF(col(accColName), col(tmpRawPredName)))
.select(columns: _*)
.select(columns.toImmutableArraySeq: _*)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the cases in the example module, it is recommended to directly use toIndexedSeq or ArraySeq. unsafeWrapArra because ArrayImplicits is private[spark]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Do you mean the case in examples/src/main/scala/org/apache/spark/examples/graphx/Analytics.scala? Changed to use ArraySeq. unsafeWrapArray explicitly.

@srowen
Copy link
Member

srowen commented Nov 6, 2023

Looks OK but the PR description says you'll avoid a copy with unsafeWrapArray or other methods, but the change uses toImmutableArraySeq. That's fine but does it do the same thing?

@LuciferYang
Copy link
Contributor

LuciferYang commented Nov 6, 2023

The implementation of toImmutableArraySeq is as follows:

implicit class SparkArrayOps[T](xs: Array[T]) {
/**
* Wraps an Array[T] as an immutable.ArraySeq[T] without copying.
*/
def toImmutableArraySeq: immutable.ArraySeq[T] =
immutable.ArraySeq.unsafeWrapArray(xs)
}

This is the same thing

@github-actions github-actions bot removed the BUILD label Nov 7, 2023
Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM if test pass

@@ -51,7 +51,7 @@ object Analytics {
case _ => throw new IllegalArgumentException(s"Invalid argument: $arg")
}
}
val options = mutable.Map(optionsList: _*)
val options = mutable.Map(immutable.ArraySeq.unsafeWrapArray(optionsList): _*)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, why don't we use org.apache.spark.util.ArrayImplicits here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, because examples do not depend on common/utils

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayImplicits is currently in the private[spark] scope, should we expose them in the examples code? Sorry, I'm not very clear about this rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, because examples do not depend on common/utils

common/utils is a transitive dependency of the core module, so ArrayImplicits is visible to the examples module, I previously suggested this only because I believe that private[spark] code should not be part of the examples. In this context, what is your suggestion? @cloud-fan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, because examples do not depend on common/utils

Just to clarify, the mllib-local module indeed does not depend on the common/utils module, so ArrayImplicits is not used in the mllib-local module.

Copy link
Contributor

@cloud-fan cloud-fan Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to make examples an exception and not use ArrayImplicits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your confirmation.

@ivoson ivoson changed the title [SPARK-45687][SQL][MLLIB][KUBERNETES][EXAMPLES][CONNECT] Fix Passing an explicit array value to a Scala varargs method is deprecated [SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix Passing an explicit array value to a Scala varargs method is deprecated Nov 8, 2023
@LuciferYang
Copy link
Contributor

friendly ping @srowen Could you take another look? Thanks ~

@srowen
Copy link
Member

srowen commented Nov 10, 2023

Merged to master

@srowen srowen closed this in 605aa0c Nov 10, 2023
@ivoson ivoson deleted the SPARK-45687 branch November 12, 2023 04:07
@ivoson
Copy link
Contributor Author

ivoson commented Nov 12, 2023

Thanks @LuciferYang @srowen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants