[SPARK-50386][SQL] Improve SparkFatalException Propagation when OutOfMemoryError occurs on BroadcastExchangeExec building small table to broadcast #48925


erenavsarogullari

What changes were proposed in this pull request?

When BroadcastHashJoin builds the small table to broadcast via BroadcastExchangeExec and an OutOfMemoryError occurs on the driver, BroadcastExchangeExec throws a SparkFatalException wrapping a SparkException. However, the SparkException's cause property can be null because the actual cause, java.lang.OutOfMemoryError: Java heap space in the following example, is dropped. Propagating the actual cause is also useful so that clients can inspect Throwable.getCause and the related cause properties. A repro test case has been added.

Before Fix:

org.apache.spark.util.SparkFatalException: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
java.util.concurrent.ExecutionException: org.apache.spark.util.SparkFatalException: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:255)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeBroadcastBcast$1(SparkPlan.scala:204)
	at scala.util.Try$.apply(Try.scala:217)
	at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1375)
	at org.apache.spark.util.Utils$.getTryWithCallerStacktrace(Utils.scala:1429)
	at org.apache.spark.util.LazyTry.get(LazyTry.scala:58)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:200)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:259)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:256)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:196)
	at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:377)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:458)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:427)
	... 400 more
Caused by: org.apache.spark.util.SparkFatalException: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:230)
	... 8 more
Caused by: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.notEnoughMemoryToBuildAndBroadcastTableError(QueryExecutionErrors.scala:2070)
	... 10 more

After Fix:

org.apache.spark.util.SparkFatalException: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
java.util.concurrent.ExecutionException: org.apache.spark.util.SparkFatalException: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:255)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeBroadcastBcast$1(SparkPlan.scala:204)
	at scala.util.Try$.apply(Try.scala:217)
	at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1375)
	at org.apache.spark.util.Utils$.getTryWithCallerStacktrace(Utils.scala:1429)
	at org.apache.spark.util.LazyTry.get(LazyTry.scala:58)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:200)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:259)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:256)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:196)
	at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:377)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:458)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:427)
	... 400 more
Caused by: org.apache.spark.util.SparkFatalException: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:230)
	... 8 more
Caused by: org.apache.spark.SparkException: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.notEnoughMemoryToBuildAndBroadcastTableError(QueryExecutionErrors.scala:2070)
	... 10 more
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:221)
	... 9 more

Why are the changes needed?

The SparkException's cause property can be null because the actual cause, java.lang.OutOfMemoryError: Java heap space in the example above, is dropped. Propagating the actual cause is also useful so that clients can inspect Throwable.getCause and the related cause properties.
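The cause-chain wiring can be illustrated with a self-contained sketch. Note that SparkExceptionLike and SparkFatalExceptionLike below are hypothetical stand-ins for org.apache.spark.SparkException and org.apache.spark.util.SparkFatalException, not Spark's actual classes; the point is only that passing the caught OutOfMemoryError through as the cause keeps the full chain visible via getCause.

```scala
// Stand-ins (assumptions, not Spark's real classes) for
// org.apache.spark.SparkException and org.apache.spark.util.SparkFatalException.
class SparkExceptionLike(message: String, cause: Throwable)
  extends Exception(message, cause)

class SparkFatalExceptionLike(cause: Throwable) extends Exception(cause)

object CausePropagationDemo {
  // Simulates the broadcast relation build failing on the driver.
  def buildBroadcastRelation(): Unit =
    throw new OutOfMemoryError("Java heap space")

  def relationFuture(): Nothing =
    try {
      buildBroadcastRelation()
      sys.error("unreachable")
    } catch {
      case oom: OutOfMemoryError =>
        // Before the fix: the wrapping exception was built with a null cause,
        // so the OutOfMemoryError was lost. After the fix: pass `oom` through,
        // so getCause on the SparkException returns the OutOfMemoryError.
        throw new SparkFatalExceptionLike(
          new SparkExceptionLike(
            "Not enough memory to build and broadcast the table to all worker nodes.",
            oom))
    }

  def main(args: Array[String]): Unit =
    try relationFuture()
    catch {
      case fatal: SparkFatalExceptionLike =>
        val sparkEx = fatal.getCause
        // Prints: java.lang.OutOfMemoryError: Java heap space
        println(sparkEx.getCause)
    }
}
```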

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added 2 new UTs.

Was this patch authored or co-authored using generative AI tooling?

No
