
[BUG] REPL query failure results in unclear error message #445

Closed

dai-chen opened this issue Jul 23, 2024 · 1 comment

Labels
0.5 bug Something isn't working

Comments

dai-chen (Collaborator) commented Jul 23, 2024

What is the bug?

When running queries in the REPL, the error message returned is generic and lacks specifics about the root cause. For instance, a failure in executing a query results in the following error message:

Fail to run query. Cause: Exception thrown in awaitResult: 

This message does not provide sufficient information to diagnose the actual issue, making troubleshooting difficult.
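
For background, here is a minimal sketch (not the Flint code itself) of why the message ends abruptly: Spark's ThreadUtils.awaitResult wraps whatever the underlying future failed with in a SparkException whose message is only the prefix shown above, so reporting getMessage on the wrapper drops the actual cause.

import org.apache.spark.SparkException

// What awaitResult effectively hands back when the query future fails:
// a wrapper whose message is only the prefix; the real error is the cause.
val root = new RuntimeException("No FileSystem for scheme \"s3\"")
val wrapped = new SparkException("Exception thrown in awaitResult: ", root)

println(wrapped.getMessage)          // "Exception thrown in awaitResult: " -- root cause lost
println(wrapped.getCause.getMessage) // "No FileSystem for scheme \"s3\"" -- what users need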

How can one reproduce the bug?

The bug can be reproduced with the following test in the REPL integration test (IT) environment:

class FlintREPLITSuite extends SparkFunSuite with OpenSearchSuite with JobTest {
  ...

  test("error") {
    try {
      createSession(jobRunId, "")
      threadLocalFuture.set(startREPL())

      val createStatement =
        s"""
           | CREATE TABLE $testTable
           | (
           |   name STRING,
           |   age INT
           | )
           | USING CSV
           | OPTIONS (
           |  header 'false',
           |  delimiter '\\t'
           | )
           | LOCATION 's3://test-123';
           |""".stripMargin
      submitQuery(s"${makeJsonCompliant(createStatement)}", "99")

      val selectQueryValidation: REPLResult => Boolean = result => {
        // noinspection ScalaStyle
        println(result)
        true
      }
      pollForResultAndAssert(selectQueryValidation, "99")
    } catch {
      case e: Exception =>
        logError("Unexpected exception", e)
        assert(false, "Unexpected exception")
    } finally {
      waitREPLStop(threadLocalFuture.get())
      threadLocalFuture.remove()
    }
  }
}

The statement recorded for the failed query carries only the generic message:

24/07/23 15:01:16 INFO FlintREPL: command complete: FlintStatement(
state=failed, 
query=  CREATE TABLE my_glue1.default.flint_sql_test  (    name STRING,    age INT  )  USING CSV  
OPTIONS (   header 'false',   delimiter '	'  )  LOCATION 's3://test-123'; , 
statementId=e0cafbd7-6e28-4f6e-8076-78026e28d6c6, 
queryId=99, 
submitTime=1721772068015, 
error=Some({"Message":"Fail to run query. Cause: Exception thrown in awaitResult: "}))

What is the expected behavior?

The system should provide a detailed error message that includes the root cause of the failure. Instead of a generic message, users should receive information that helps in diagnosing the underlying problem, such as specific exceptions or stack traces.
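
As one sketch of what such a handler could emit (the formatter below is a hypothetical helper, not the Flint code; it assumes Apache Commons Lang 3, which Spark already pulls in transitively):

import org.apache.commons.lang3.exception.ExceptionUtils

// Hypothetical formatter: report the root-cause message, and optionally the
// full stack trace, instead of the wrapper's empty message.
def describeFailure(ex: Throwable, includeStackTrace: Boolean = false): String = {
  val root = Option(ExceptionUtils.getRootCause(ex)).getOrElse(ex) // defensive fallback
  val base = s"Fail to run query. Cause: ${root.getMessage}"
  if (includeStackTrace) base + "\n" + ExceptionUtils.getStackTrace(root) else base
}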

Do you have any additional context?

Spark log:

24/07/23 15:01:16 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3://test-123.
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
	at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:79)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
	at org.apache.spark.sql.FlintJobExecutor.executeQuery(FlintJobExecutor.scala:423)
	at org.apache.spark.sql.FlintJobExecutor.executeQuery$(FlintJobExecutor.scala:412)
	at org.apache.spark.sql.FlintREPL$.executeQuery(FlintREPL.scala:49)
	at org.apache.spark.sql.FlintREPL$.$anonfun$executeQueryAsync$1(FlintREPL.scala:821)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
24/07/23 15:01:16 INFO FlintREPL: command complete: FlintStatement(
state=failed, query=  CREATE TABLE my_glue1.default.flint_sql_test  (    name STRING,    age INT  )  USING CSV  
OPTIONS (   header 'false',   delimiter '	'  )  LOCATION 's3://test-123'; , 
statementId=e0cafbd7-6e28-4f6e-8076-78026e28d6c6, queryId=99, submitTime=1721772068015, 
error=Some({"Message":"Fail to run query. Cause: Exception thrown in awaitResult: "}))
dai-chen (Collaborator, Author) commented Jul 23, 2024

After investigation, the root cause lies in the line changed in the diff below: handleQueryException is called with the original wrapper exception ex instead of the unwrapped root cause r. After making this change, the error message stored in the result behaves as expected:

24/07/23 15:25:30 INFO FlintREPL: command complete: FlintStatement(
state=failed, query=  CREATE TABLE my_glue1.default.flint_sql_test  (    name STRING,    age INT  )  
USING CSV  OPTIONS (   header 'false',   delimiter '	'  )  LOCATION 's3://test-123'; , 
statementId=8ba68f33-d83b-428a-82b9-60def7a29cd2, queryId=99, submitTime=1721773522390, 
error=Some({"Message":"Fail to run query. Cause: No FileSystem for scheme \"s3\""}))

Code changes:

diff --git a/spark-sql-application/src/main/scala/org/apache/spark/sql/FlintJobExecutor.scala b/spark-sql-application/src/main/scala/org/apache/spark/sql/FlintJobExecutor.scala
index f38a27ef..c7dccf64 100644
--- a/spark-sql-application/src/main/scala/org/apache/spark/sql/FlintJobExecutor.scala
+++ b/spark-sql-application/src/main/scala/org/apache/spark/sql/FlintJobExecutor.scala
@@ -500,7 +500,7 @@ trait FlintJobExecutor {
           val e = new SecurityException(ExceptionMessages.GlueAccessDeniedMessage)
           handleQueryException(e, ExceptionMessages.QueryRunErrorPrefix)
         } else {
-          handleQueryException(ex, ExceptionMessages.QueryRunErrorPrefix)
+          handleQueryException(r, ExceptionMessages.QueryRunErrorPrefix)
         }
     }
   }
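
For completeness, a hedged sketch of the cause-chain unwrapping that presumably produces r in the branch above (the helper below is illustrative; the actual Flint code may derive r differently):

// Illustrative helper: walk getCause until the innermost throwable.
@scala.annotation.tailrec
def rootCauseOf(t: Throwable): Throwable =
  if (t.getCause == null || (t.getCause eq t)) t else rootCauseOf(t.getCause)

// case ex: Exception =>
//   val r = rootCauseOf(ex) // carries "No FileSystem for scheme \"s3\""
//   handleQueryException(r, ExceptionMessages.QueryRunErrorPrefix)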
