-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-44860][SQL] Add SESSION_USER function #42549
Conversation
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing this! Generally LGTM, just one place to add more comments.
@@ -302,6 +302,24 @@ case class CurrentUser() extends LeafExpression with Unevaluable { | |||
final override val nodePatterns: Seq[TreePattern] = Seq(CURRENT_LIKE) | |||
} | |||
|
|||
// scalastyle:off line.size.limit | |||
@ExpressionDescription( | |||
usage = """_FUNC_() - connected user name.""", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both this comment and the existing comment for current_user
on L289 above could use some more information, especially now that we have both current_user
and session_user
as separate functions. Even if this PR implements both functions as the same implementation, could we at least (1) dig into how the current_user
function is currently implemented and describe all the cases there, and (2) mention for this session_user
function that this should always refer to the identity of the invoker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
converting to alias since session_user change is not relevent for non-external functions.
since = "3.5.0", | ||
group = "misc_funcs") | ||
// scalastyle:on line.size.limit | ||
case class SessionUser() extends LeafExpression with Unevaluable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not register as an alias with CurrentUser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thank you!
@yaooqinn please take a look. Thank you! |
@@ -95,12 +95,13 @@ trait ColumnResolutionHelper extends Logging { | |||
} | |||
} | |||
|
|||
// support CURRENT_DATE, CURRENT_TIMESTAMP, and grouping__id | |||
// support CURRENT_DATE, CURRENT_TIMESTAMP, CURRENT_USER, SESSION_USER and grouping__id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// support CURRENT_DATE, CURRENT_TIMESTAMP, CURRENT_USER, SESSION_USER and grouping__id | |
// support CURRENT_DATE, CURRENT_TIMESTAMP, CURRENT_USER, USER, SESSION_USER and grouping__id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -48,15 +48,16 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession { | |||
checkAnswer(df.selectExpr("version()"), df.select(version())) | |||
} | |||
|
|||
test("SPARK-21957: get current_user in normal spark apps") { | |||
test("SPARK-21957, SPARK-44860: get current_user, session_user in normal spark apps") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also update SPARK-21957: get current_user through thrift server
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only two minor revisons
you may need to add it the this list, since it is not supported in Python for now |
Thank you! |
thanks @vitaliili-db, merged to master |
### What changes were proposed in this pull request? `session_user` function was added in Scala in #42549, this PR adds it to Python ### Why are the changes needed? for parity ### Does this PR introduce _any_ user-facing change? yes ``` >>> import pyspark.sql.functions as sf >>> spark.range(1).select(sf.session_user()).show() # doctest: +SKIP +--------------+ |current_user()| +--------------+ | ruifeng.zheng| +--------------+ ``` ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #44205 from zhengruifeng/connect_session_user. Authored-by: Ruifeng Zheng <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request? `session_user` function was added in Scala in apache#42549, this PR adds it to Python ### Why are the changes needed? for parity ### Does this PR introduce _any_ user-facing change? yes ``` >>> import pyspark.sql.functions as sf >>> spark.range(1).select(sf.session_user()).show() # doctest: +SKIP +--------------+ |current_user()| +--------------+ | ruifeng.zheng| +--------------+ ``` ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#44205 from zhengruifeng/connect_session_user. Authored-by: Ruifeng Zheng <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
What changes were proposed in this pull request?
This change implements
SESSION_USER
expression. It behaves exactly the same asCURRENT_USER
but according to standard when respective function is used inside a routine (UDF):CURRENT_USER
should return security definer, i.e. owner of an UDFSESSION_USER
should return connected user.The code is duplicated for this reason - to be able to identify which expression is used inside a routing.
Why are the changes needed?
This is a missing expression defined by SQL standard.
Does this PR introduce any user-facing change?
Yes, this change introduces a new expression.
How was this patch tested?
Updating existing unit tests.