[SPARK-49678][CORE] Support spark.test.master in `SparkSubmitArguments`

### What changes were proposed in this pull request?

This PR aims to support `spark.test.master` in `SparkSubmitArguments`.

### Why are the changes needed?

To allow users to control the default master setting during testing and documentation generation.
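
As a rough illustration (a standalone sketch, not the actual `SparkSubmitArguments` code; the object and method names here are made up), the resolution order introduced by this change is: an explicitly supplied `--master` value wins, then the `spark.test.master` JVM system property, then the existing `local[*]` default.

```scala
// Standalone sketch of the fallback order added by this PR; only the
// getOrElse/System.getProperty chain mirrors the actual diff below.
object DefaultMasterSketch {
  // maybeMaster holds an explicitly supplied --master value, if any.
  def resolveMaster(maybeMaster: Option[String]): String =
    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))

  def main(args: Array[String]): Unit = {
    // With -Dspark.test.master=local[1] and no --master, this prints local[1];
    // without the property it falls back to local[*].
    println(resolveMaster(None))
    // An explicit master always takes precedence over the property.
    println(resolveMaster(Some("yarn")))
  }
}
```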

#### First, currently we cannot build the Python documentation on an M3 Max (or other high-core machines) without this; it succeeds only on GitHub Actions runners (4 cores) or an equivalent low-core Docker run. Please try the following on your Mac.

**BEFORE**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```

**AFTER**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" make html
...
build succeeded.

The HTML pages are in build/html.
```

#### Second, in general, we can control the default master for all `SparkSubmit`-launched applications (e.g., the Spark shells) like the following.

**BEFORE (`local[*]`)**
```
$ bin/pyspark
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:53:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1726519982935).
SparkSession available as 'spark'.
>>>
```

**AFTER (`local[1]`)**
```
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/pyspark
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[1], app id = local-1726519863363).
SparkSession available as 'spark'.
>>>
```
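
For completeness, a hedged verification sketch (the object name and jar below are made up for illustration): an application submitted through `bin/spark-submit` with `-Dspark.test.master=local[1]` picked up via `JDK_JAVA_OPTIONS` and no explicit `--master` is expected to report that master.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative check only: when launched via something like
//   JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/spark-submit --class MasterCheck app.jar
// (no --master flag), the resolved master is expected to be local[1].
object MasterCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    println(s"master = ${spark.sparkContext.master}")
    spark.stop()
  }
}
```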

### Does this PR introduce _any_ user-facing change?

No. `spark.test.master` is a new parameter.

### How was this patch tested?

Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48126 from dongjoon-hyun/SPARK-49678.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun authored and attilapiros committed Oct 4, 2024
1 parent c698f6e commit 1564672
Showing 1 changed file with 2 additions and 1 deletion.
```
@@ -43,7 +43,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   extends SparkSubmitArgumentsParser with Logging {
   var maybeMaster: Option[String] = None
   // Global defaults. These should be keep to minimum to avoid confusing behavior.
-  def master: String = maybeMaster.getOrElse("local[*]")
+  def master: String =
+    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))
   var maybeRemote: Option[String] = None
   var deployMode: String = null
   var executorMemory: String = null
```
