Commit
[SPARK-49552][CONNECT][FOLLOW-UP] Make 'randstr' and 'uniform' deterministic in Scala Client

### What changes were proposed in this pull request?
Make 'randstr' and 'uniform' deterministic in the Scala Client.

### Why are the changes needed?
We need to explicitly set the seed in the Connect clients to avoid making the output dataframe non-deterministic (see 14ba4fc).

When reviewing #48143, I requested the author to set the seed in the Python client. At that time, I was not aware that the Spark Connect Scala Client was reusing the same `functions.scala` under `org.apache.spark.sql`. (There were two different files before.)

So the two functions may cause non-deterministic issues like:
```
scala> val df = spark.range(10).select(randstr(lit(10)).as("r"))
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
df: org.apache.spark.sql.package.DataFrame = [r: string]

scala> df.show()
+----------+
|         r|
+----------+
|5bhIk72PJa|
|tuhC50Di38|
|PxwfWzdT3X|
|sWkmSyWboh|
|uZMS4htmM0|
|YMxMwY5wdQ|
|JDaWSiBwDD|
|C7KQ20WE7t|
|IwSSqWOObg|
|jDF2Ndfy8q|
+----------+

scala> df.show()
+----------+
|         r|
+----------+
|fpnnoLJbOA|
|qerIKpYPif|
|PvliXYIALD|
|xK3fosAvOp|
|WK12kfkPXq|
|2UcdyAEbNm|
|HEkl4rMtV1|
|PCaH4YJuYo|
|JuuXEHSp5i|
|jSLjl8ug8S|
+----------+
```

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
after this fix:
```
scala> val df = spark.range(10).select(randstr(lit(10)).as("r"))
df: org.apache.spark.sql.package.DataFrame = [r: string]

scala> df.show()
+----------+
|         r|
+----------+
|Gri9B9X8zI|
|gfhpGD8PcV|
|FDaXofTzlN|
|p7ciOScWpu|
|QZiEbF5q7c|
|9IhRoXmTUM|
|TeSEG1EKSN|
|B7nLw5iedL|
|uFZo1WPLPT|
|46E2LVCxxl|
+----------+

scala> df.show()
+----------+
|         r|
+----------+
|Gri9B9X8zI|
|gfhpGD8PcV|
|FDaXofTzlN|
|p7ciOScWpu|
|QZiEbF5q7c|
|9IhRoXmTUM|
|TeSEG1EKSN|
|B7nLw5iedL|
|uFZo1WPLPT|
|46E2LVCxxl|
+----------+
```

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #48558 from zhengruifeng/sql_rand_str_seed.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
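
### Illustrative sketch (not part of the patch)
The reasoning in "Why are the changes needed?" can be modelled in plain Scala without a Spark dependency. This is only a sketch under stated assumptions: `RandStrPlan`, `SeedDemo`, `execute`, `randstrUnseeded` and `randstrSeeded` are hypothetical names invented for illustration and are not Spark APIs. The sketch assumes that a plan sent without a seed gets a fresh seed on every execution, while a plan that carries a seed chosen at construction time replays the same values.

```scala
import scala.util.Random

// Hypothetical stand-in for a plan node sent by a client.
// The seed is optional: a client may or may not set it.
final case class RandStrPlan(length: Int, rows: Int, seed: Option[Long])

object SeedDemo {
  private val alphabet: String =
    (('a' to 'z') ++ ('A' to 'Z') ++ ('0' to '9')).mkString

  // Hypothetical "server": each call models one execution of the plan.
  // Assumption: when no seed is present, a new one is drawn per execution.
  def execute(plan: RandStrPlan): Seq[String] = {
    val rng = new Random(plan.seed.getOrElse(Random.nextLong()))
    Seq.fill(plan.rows)(
      Seq.fill(plan.length)(alphabet(rng.nextInt(alphabet.length))).mkString
    )
  }

  // Client-side builder that leaves the seed unset.
  def randstrUnseeded(length: Int, rows: Int): RandStrPlan =
    RandStrPlan(length, rows, None)

  // Client-side builder that fixes the seed once, when the plan is built.
  def randstrSeeded(length: Int, rows: Int): RandStrPlan =
    RandStrPlan(length, rows, Some(Random.nextLong()))

  def main(args: Array[String]): Unit = {
    val seeded = randstrSeeded(10, 3)
    // Same plan, same seed: repeated executions agree.
    assert(execute(seeded) == execute(seeded))

    val unseeded = randstrUnseeded(10, 3)
    // Each execution draws its own seed, so the two lines will almost
    // certainly differ.
    println(execute(unseeded))
    println(execute(unseeded))
  }
}
```

In this model the seed is still random per plan, so two separately built plans differ, but re-running one plan (as the repeated `df.show()` calls above do) returns the same rows.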