Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-44860][SQL] Add SESSION_USER function #42549

Closed
wants to merge 11 commits into from

Conversation

vitaliili-db
Copy link
Contributor

What changes were proposed in this pull request?

This change implements SESSION_USER expression. It behaves exactly the same as CURRENT_USER but according to standard when respective function is used inside a routine (UDF):

  • CURRENT_USER should return security definer, i.e. owner of an UDF
  • SESSION_USER should return connected user.

The code is duplicated for this reason - to be able to identify which expression is used inside a routing.

Why are the changes needed?

This is a missing expression defined by SQL standard.

Does this PR introduce any user-facing change?

Yes, this change introduces a new expression.

How was this patch tested?

Updating existing unit tests.

@github-actions github-actions bot added the SQL label Aug 17, 2023
Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this! Generally LGTM, just one place to add more comments.

@@ -302,6 +302,24 @@ case class CurrentUser() extends LeafExpression with Unevaluable {
final override val nodePatterns: Seq[TreePattern] = Seq(CURRENT_LIKE)
}

// scalastyle:off line.size.limit
@ExpressionDescription(
usage = """_FUNC_() - connected user name.""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both this comment and the existing comment for current_user on L289 above could use some more information, especially now that we have both current_user and session_user as separate functions. Even if this PR implements both functions as the same implementation, could we at least (1) dig into how the current_user function is currently implemented and describe all the cases there, and (2) mention for this session_user function that this should always refer to the identity of the invoker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converting to alias since session_user change is not relevent for non-external functions.

since = "3.5.0",
group = "misc_funcs")
// scalastyle:on line.size.limit
case class SessionUser() extends LeafExpression with Unevaluable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not register as an alias with CurrentUser

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you!

@HyukjinKwon HyukjinKwon changed the title [SPARK-44860] Add SESSION_USER function [SPARK-44860][SQL] Add SESSION_USER function Aug 21, 2023
@vitaliili-db vitaliili-db reopened this Aug 21, 2023
@vitaliili-db
Copy link
Contributor Author

@yaooqinn please take a look. Thank you!

@@ -95,12 +95,13 @@ trait ColumnResolutionHelper extends Logging {
}
}

// support CURRENT_DATE, CURRENT_TIMESTAMP, and grouping__id
// support CURRENT_DATE, CURRENT_TIMESTAMP, CURRENT_USER, SESSION_USER and grouping__id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// support CURRENT_DATE, CURRENT_TIMESTAMP, CURRENT_USER, SESSION_USER and grouping__id
// support CURRENT_DATE, CURRENT_TIMESTAMP, CURRENT_USER, USER, SESSION_USER and grouping__id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -48,15 +48,16 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession {
checkAnswer(df.selectExpr("version()"), df.select(version()))
}

test("SPARK-21957: get current_user in normal spark apps") {
test("SPARK-21957, SPARK-44860: get current_user, session_user in normal spark apps") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also update SPARK-21957: get current_user through thrift server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@yaooqinn yaooqinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only two minor revisons

@zhengruifeng
Copy link
Contributor

you may need to add it the this list, since it is not supported in Python for now

@vitaliili-db
Copy link
Contributor Author

you may need to add it the this list, since it is not supported in Python for now

Thank you!

@yaooqinn
Copy link
Member

thanks @vitaliili-db, merged to master

@yaooqinn yaooqinn closed this in 600f62e Aug 29, 2023
zhengruifeng added a commit that referenced this pull request Dec 6, 2023
### What changes were proposed in this pull request?
`session_user` function was added in Scala in #42549, this PR adds it to Python

### Why are the changes needed?
for parity

### Does this PR introduce _any_ user-facing change?
yes
```
    >>> import pyspark.sql.functions as sf
    >>> spark.range(1).select(sf.session_user()).show() # doctest: +SKIP
    +--------------+
    |current_user()|
    +--------------+
    | ruifeng.zheng|
    +--------------+
```

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44205 from zhengruifeng/connect_session_user.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
### What changes were proposed in this pull request?
`session_user` function was added in Scala in apache#42549, this PR adds it to Python

### Why are the changes needed?
for parity

### Does this PR introduce _any_ user-facing change?
yes
```
    >>> import pyspark.sql.functions as sf
    >>> spark.range(1).select(sf.session_user()).show() # doctest: +SKIP
    +--------------+
    |current_user()|
    +--------------+
    | ruifeng.zheng|
    +--------------+
```

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#44205 from zhengruifeng/connect_session_user.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants