[SPARK-23464][MESOS] Fix mesos cluster scheduler options double-escaping #20641
Conversation
Don't enclose the --conf option value in "", as it is already escaped by shellEscape; the additional wrapping cancels out the escaping, causing the driver to fail to start. This reverts commit 9b377aa [SPARK-18114]. Manual test with driver command logging added. Author: Marcin Kurczych <[email protected]>
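For illustration, a minimal self-contained sketch of why the extra wrapping breaks the command. The shellEscape stand-in below is a simplified assumption for the demo, not Spark's actual implementation:

```scala
object DoubleEscapeDemo {
  // Simplified stand-in for shellEscape: wrap the value in double quotes and
  // backslash-escape the characters that stay special inside them.
  def shellEscape(value: String): String =
    "\"" + value.replaceAll("""(["`$\\])""", """\\$1""") + "\""

  def main(args: Array[String]): Unit = {
    val value = """-Dparam1="value 1""""
    val escaped = shellEscape(value)
    // Correct: the escaped value is already a single well-formed shell word.
    println(s"--conf spark.executor.extraJavaOptions=$escaped")
    // Broken: wrapping the already-escaped value in another pair of quotes
    // changes how the shell interprets the inner escapes, so the argument
    // no longer survives parsing intact.
    println("--conf \"spark.executor.extraJavaOptions=" + escaped + "\"")
  }
}
```

The broken second form matches the failing driver command shown later in this thread.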
Thanks for the PR! It seems that the previous attempt to fix this (SPARK-18114) was wrong -- I'm not sure why we didn't catch the problem before; maybe a lack of testing? @krcz My suggestion for this patch is to add a test, in order to prevent another regression in the future. I've written a unit test for this -- you could do something similar: d2iq-archive@4812ba3 I will also do more testing with my own integration tests. cc @skonto
@susanxhuynh Thanks for the example! I have added a similar test covering more cases. After you or someone else reviews it, I'll rebase the pull request.
Thanks, @krcz . Overall looks good ... I left one question about one of the test parameters. Also, have you verified in the driver / executors that they receive the correct Java system properties (with the "-D" options)? -- I was going to test that myself.
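One way to check that (a sketch, not part of the PR; assumes a running spark-shell with a live SparkContext named sc):

```scala
// Run a one-partition job and read the JVM system properties inside the task,
// to confirm the -D options survived the driver command escaping intact.
val received = sc.parallelize(Seq(0), numSlices = 1)
  .map(_ => Seq("param1", "param2").map(k => k -> sys.props.get(k)))
  .collect()
  .head
println(received.mkString(", "))
// expected, e.g.: (param1,Some(value 1)), (param2,Some(value 2))
```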
"-XX+PrintGC -Dparam1=val1 -Dparam2=val2", | ||
// special characters, to be escaped | ||
"spark.executor.extraJavaOptions" -> | ||
"""-Dparam1="value 1" -Dparam2=value\ 2 -Dpath=$PATH"""), |
I'm not sure the second param (param2) would be parsed correctly if you actually ran the command. Don't there need to be quotes around the space? Have you tested it and checked whether the executor gets the correct value for param2?
@susanxhuynh
I have checked it and it worked. There is no need for quotes, as the space has been escaped: the backslash stops it from being interpreted as a boundary between arguments, so it is understood as a literal space inside the value. This is standard behaviour of bash (and sh) syntax.
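A quick way to see that behaviour (a sketch; assumes /bin/sh is available, as on any Linux agent):

```scala
import scala.sys.process._

// printf prints each of its arguments on its own line, so the output shows
// that the backslash-escaped space leaves -Dparam2=value 2 as ONE argument.
object BackslashSpaceDemo extends App {
  val out = Seq("/bin/sh", "-c", """printf '%s\n' -Dparam2=value\ 2""").!!
  print(out)  // -Dparam2=value 2
}
```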
@@ -199,6 +199,38 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
  })
}

test("properly wraps and escapes parameters passed to driver command") {
Does this test fail with the old code?
It does.
[info] - properly wraps and escapes parameters passed to driver command *** FAILED *** (154 milliseconds)
[info] "test/./bin/spark-submit --name test --master mesos://mesos://localhost:5050 --driver-cores 1.0 --driver-memory 1000M --class mainClass --conf "spark.app.name=test" --conf "spark.mesos.executor.home=test" --conf "spark.executor.extraJavaOptions="-Dparam1=\"value 1\" -Dparam2=value\\ 2 -Dpath=\$PATH"" --conf "spark.driver.extraJavaOptions="-XX+PrintGC -Dparam1=val1 -Dparam2=val2"" ./jar arg" did not contain "--conf spark.driver.extraJavaOptions="-XX+PrintGC -Dparam1=val1 -Dparam2=val2"" (MesosClusterSchedulerSuite.scala:227)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
[info] at org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSuite$$anonfun$16.apply(MesosClusterSchedulerSuite.scala:227)
[info] at org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSuite$$anonfun$16.apply(MesosClusterSchedulerSuite.scala:202)
This looks ready to merge; anything else needed @skonto or @susanxhuynh ?
It would be great to merge if you think it's ready.
Sorry for the delay. I was going to test this in DC/OS and haven't gotten a chance to do so.
@susanxhuynh I can do a test for this.
@susanxhuynh I tested this. Let me know if I need to test anything else. Here is what you get:
At the driver side in DC/OS:
Also tried:
Got:
@skonto
@krcz that is the output of ps.
@skonto The one directly below "At the driver side in DC/OS:" in your comment. But if it's output of ps that's ok, as ps doesn't show the original quoting anyway.
@krcz Yes, it is ok; and yes, ps has limitations. So I verified it just to be on the safe side:
Spark-class will run this at the executor side:
@susanxhuynh If there is no issue we can move on with this fix. @krcz It would be good to add more tests, e.g. a scenario where pure GC properties are passed, to make it more realistic. This is pretty common in the field: d2iq-archive/spark-build#263. For example: "'-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps'". Btw, I tested that case and it passes, but it would be good to have such a test to avoid any regressions.
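A sketch of such a case, following the structure suggested by the failure log earlier in the thread; buildDriverCommand is a hypothetical helper standing in for whatever the suite actually uses to render the driver command:

```scala
test("properly escapes common GC options passed to driver command") {
  // Realistic GC flags, as in d2iq-archive/spark-build#263.
  val gcOptions = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
  // Hypothetical helper: renders the spark-submit command the scheduler
  // would launch for a driver submitted with this configuration.
  val cmd = buildDriverCommand(Map("spark.driver.extraJavaOptions" -> gcOptions))
  // The whole option list must appear as a single quoted shell word.
  assert(cmd.contains(s"""--conf spark.driver.extraJavaOptions="$gcOptions""""))
}
```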
@skonto Thanks for testing it. Test results look good.
ok, could you rebase this PR?
ok to test
@mgummelt too
Test build #91820 has finished for PR 20641 at commit
@felixb @skonto
@krcz
@krcz how is your PR different from 21014?
@felixcheung It fixes the same problem, so in terms of implementation it is not very different. But when I created this one, 21014 didn't exist. The only difference is that this PR adds a unit test, which 21014 lacks. That's why I'm asking whether I should modify this PR to just add the test.
Yep, that sounds like a good idea.
Can one of the admins verify this patch?
I'm closing this since there's been no activity. If you plan on adding the test, update your branch and tag this PR with the other bug number (I duped this one to the bug with the fix attached to it).
What changes were proposed in this pull request?
Don't enclose the --conf option value in "", as it is already escaped by shellEscape;
the additional wrapping cancels out the escaping, making the driver fail to start.
This reverts commit 9b377aa [SPARK-18114].
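In code terms, the revert has roughly this shape; a reconstructed sketch based on the description above, not the exact lines of MesosClusterScheduler.scala:

```scala
// Before (9b377aa / SPARK-18114): the already-escaped value was wrapped in an
// extra pair of quotes, which cancelled out the escaping.
options ++= Seq("--conf", s""""$key=${shellEscape(value)}"""")

// After this PR: shellEscape alone already yields a well-formed shell word.
options ++= Seq("--conf", s"$key=${shellEscape(value)}")
```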
How was this patch tested?
Manual test with driver command logging added.
Author: Marcin Kurczych [email protected]