[SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`

### What changes were proposed in this pull request?

This PR aims to support `spark.test.master` in `SparkSubmitArguments`.

### Why are the changes needed?

To allow users to control the default master setting during testing and documentation generation.

#### First, we currently cannot build `Python Documentation` on M3 Max (and other high-core machines) without this.

The build succeeds only on GitHub Action runners (4 cores) or an equivalent low-core Docker run. Please try the following on your Mac.

**BEFORE**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```

**AFTER**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" make html
...
build succeeded.

The HTML pages are in build/html.
```

#### Second, in general, we can control the default master of every `SparkSubmit` entry point (e.g., the Spark shells) as follows.

**BEFORE (`local[*]`)**
```
$ bin/pyspark
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:53:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1726519982935).
SparkSession available as 'spark'.
>>>
```

**AFTER (`local[1]`)**
```
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/pyspark
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[1], app id = local-1726519863363).
SparkSession available as 'spark'.
>>>
```

### Does this PR introduce _any_ user-facing change?

No. `spark.test.master` is a new parameter.

### How was this patch tested?

Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48126 from dongjoon-hyun/SPARK-49678.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
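
### Illustrative sketch (not part of the patch)

The fallback described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the code in `SparkSubmitArguments`: the class `MasterResolution` and the method `resolveMaster` are hypothetical names, and the sketch assumes only that an explicitly supplied master wins, and that otherwise the JVM system property `spark.test.master` replaces the built-in `local[*]` default.

```java
// Hypothetical sketch; the names below are illustrative and are not
// taken from the Spark code base.
final class MasterResolution {
    // System property named in this PR.
    static final String TEST_MASTER_KEY = "spark.test.master";
    // Default master shown in the BEFORE output above.
    static final String BUILT_IN_DEFAULT = "local[*]";

    // Returns the explicit master when one is given; otherwise falls back
    // to -Dspark.test.master, and finally to the built-in default.
    static String resolveMaster(String explicitMaster) {
        if (explicitMaster != null) {
            return explicitMaster;
        }
        return System.getProperty(TEST_MASTER_KEY, BUILT_IN_DEFAULT);
    }

    public static void main(String[] args) {
        // An optional first argument plays the role of an explicit master.
        String explicit = args.length > 0 ? args[0] : null;
        System.out.println(resolveMaster(explicit));
    }
}
```

Because the `java` launcher (JDK 9 and later) prepends the contents of `JDK_JAVA_OPTIONS` to its command line, setting `JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]"` reaches every JVM started underneath `make html` or `bin/pyspark` without editing any script, which is consistent with the repeated `NOTE: Picked up JDK_JAVA_OPTIONS` lines in the AFTER output.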