[SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments` #48126
Conversation
Could you review this PR, @viirya ?

Thank you, @viirya !

Merged to master.
```diff
@@ -43,7 +43,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   extends SparkSubmitArgumentsParser with Logging {
   var maybeMaster: Option[String] = None
   // Global defaults. These should be keep to minimum to avoid confusing behavior.
-  def master: String = maybeMaster.getOrElse("local[*]")
+  def master: String =
+    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))
```
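For readers skimming the hunk, here is a minimal, self-contained Scala sketch of the fallback order this change introduces. The `MasterResolutionSketch` object is hypothetical (it is not Spark's actual `SparkSubmitArguments` class), but the `getOrElse`/`System.getProperty` chain mirrors the patched accessor: an explicitly supplied master wins, then the `spark.test.master` JVM system property, then the hard-coded `local[*]` default.

```scala
// Hypothetical standalone sketch of the resolution order in the diff above.
object MasterResolutionSketch {
  // Mirrors the patched accessor: explicit master > spark.test.master > local[*].
  def resolveMaster(maybeMaster: Option[String]): String =
    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))

  def main(args: Array[String]): Unit = {
    println(resolveMaster(Some("yarn")))  // "yarn": an explicit master always wins
    println(resolveMaster(None))          // "local[*]": property not set yet
    System.setProperty("spark.test.master", "local[1]")
    println(resolveMaster(None))          // "local[1]": the property now takes effect
  }
}
```

Because the property is read when `master` is evaluated, setting it at JVM startup (as the `JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]"` examples later in this PR do) takes effect without any `--master` flag or `spark.master` configuration.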
The change is fine, but just curious: why can't we just set `spark.master` instead? The Python documentation build actually works fine on my Mac, FWIW.
Yes, we can change it to accept `spark.master` here.

> Why can't we just set `spark.master` instead?

As for the question: I guess you already have #48129 to limit the number of `SparkSubmit` processes, which mitigates the situation. This PR and #48129 address two layers (Spark submission and Spark local executor cores) during Python documentation generation. This PR is still valid for many-core machines like `c6i.24xlarge`.

> The Python documentation build actually works fine on my Mac, FWIW.
BTW, the reason why I didn't use `spark.master` directly is that it could have some side effects on the K8s side. Let me check that part to make sure, @HyukjinKwon .
I made a follow-up to address the comments, @HyukjinKwon . After looking at the code, it seems to be okay. Currently, the follow-up PR is running the CIs; if they pass, it will be okay. Thank you for the suggestion.
To @HyukjinKwon : I closed the above follow-up and kept this original PR. Regarding the comment (#48134 (comment)), this PR has more contributions on top of Python doc generation; for example, it also lets us control all `SparkSubmit` invocations (such as the Spark shells), as described below.
Sure that's fine! Thanks for taking a look.
Thank you. I'll keep this as a test setting for now~ We can revisit this later.
[SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`

### What changes were proposed in this pull request?

This PR aims to support `spark.test.master` in `SparkSubmitArguments`.

### Why are the changes needed?

To allow users to control the default master setting during testing and documentation generation.

#### First, currently, we cannot build `Python Documentation` on M3 Max (and high-core machines) without this. It only succeeds on GitHub Action runners (4 cores) or an equivalent low-core docker run. Please try the following on your Macs.

**BEFORE**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```

**AFTER**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" make html
...
build succeeded.

The HTML pages are in build/html.
```

#### Second, in general, we can control all `SparkSubmit` invocations (e.g., the Spark shells) like the following.

**BEFORE (`local[*]`)**
```
$ bin/pyspark
Python 3.9.19 (main, Jun 17 2024, 15:39:29) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:53:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1726519982935).
SparkSession available as 'spark'.
>>>
```

**AFTER (`local[1]`)**
```
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/pyspark
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
Python 3.9.19 (main, Jun 17 2024, 15:39:29) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[1], app id = local-1726519863363).
SparkSession available as 'spark'.
>>>
```

### Does this PR introduce _any_ user-facing change?

No. `spark.test.master` is a new parameter.

### How was this patch tested?

Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48126 from dongjoon-hyun/SPARK-49678.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
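As a hedged usage sketch (not part of this PR or of Spark's test utilities), a harness that wants the single-core master only for one block of code could scope the system property like this, restoring the previous state afterwards; `SystemPropertyFixture` and `withSystemProperty` are illustrative names, not existing APIs.

```scala
// Hypothetical helper: scope spark.test.master to one block of test code.
object SystemPropertyFixture {
  // Runs `body` with the given JVM system property set, then restores the
  // previous value (or clears the key if it was unset before).
  def withSystemProperty[T](key: String, value: String)(body: => T): T = {
    val previous = Option(System.getProperty(key))
    System.setProperty(key, value)
    try body
    finally {
      previous match {
        case Some(v) => System.setProperty(key, v)
        case None    => System.clearProperty(key)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    withSystemProperty("spark.test.master", "local[1]") {
      println(System.getProperty("spark.test.master")) // local[1] inside the block
    }
    println(System.getProperty("spark.test.master"))   // null: default restored
  }
}
```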