[SPARK-44839][SS][CONNECT] Better Error Logging when user tries to serialize spark session #42594
Conversation
try:
    expr.command = CloudPickleSerializer().dumps(listener)
except pickle.PicklingError:
    raise PySparkRuntimeError(
@itholic do we need a dedicated error class for PicklingError? e.g., PySparkPicklingError?
Yes, I believe we need a new error class for a new type of user-facing error. Could you add a new PySpark error class representing pickle.PicklingError? See https://github.com/apache/spark/pull/40938/files as an example. I think we can also do it as a follow-up.
Yes, sure, I can do that. Just to confirm: the ask is to define a new PySparkPicklingError and replace this PySparkRuntimeError with it, right?
define a new PySparkPicklingError and replace this PySparkRuntimeError with that right?
Correct :-)
Otherwise looks good
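For reference, here is a minimal sketch of what the suggested change could look like. PySparkPicklingError and the CloudPickleSerializer call come from the thread above; the function name, message text, and everything else here are illustrative assumptions rather than the PR's actual code.

import pickle

from pyspark.errors import PySparkException
from pyspark.serializers import CloudPickleSerializer


class PySparkPicklingError(PySparkException):
    """Dedicated wrapper for pickle.PicklingError raised by user-facing APIs (sketch)."""


def serialize_listener(expr, listener):
    """Serialize a StreamingQueryListener into the Connect command, surfacing
    pickling failures (e.g. a captured local SparkSession) with a clear error."""
    try:
        expr.command = CloudPickleSerializer().dumps(listener)
    except pickle.PicklingError:
        raise PySparkPicklingError(
            # In the real code base the message would come from a registered
            # error class; a literal message keeps this sketch self-contained.
            message="Cannot serialize the StreamingQueryListener; do not reference "
            "a SparkSession or DataFrame created from a local Spark session inside it."
        )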
Co-authored-by: Haejoon Lee <[email protected]>
CC @ueshin, I also changed the serialization error thrown by UDTF, PTAL!
The UDTF change looks good to me!
LGTM, pending tests.
@ueshin Can we have this merged? Thank you!
@WweiL seems like your PR conflicted with this PR :-). Should probably update.
Merged to master.
What changes were proposed in this pull request?
Add a new error with a detailed message for when a user tries to access a Spark session, or a DataFrame created using the local Spark session, inside foreach, foreachBatch, and StreamingQueryListener in streaming Spark Connect.
Update: per the reviewer's request, added a new error class PySparkPicklingError. Also moved UDTF_SERIALIZATION_ERROR to the new class.
Why are the changes needed?
Better error logging for the breaking change introduced in streaming Spark Connect.
Does this PR introduce any user-facing change?
Yes. Previously, users only saw a non-informative serialization error when they accessed a local Spark session inside their streaming Connect functions; it is now replaced with a detailed error message explaining that the Spark session cannot be serialized there.
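To make the user-facing change concrete, here is a hedged sketch of the kind of user code that hits this path in streaming Spark Connect; the connection string and the listener body are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql.streaming import StreamingQueryListener

# Assumes a Spark Connect server is reachable at this illustrative address.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()


class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        # Referencing the outer client-side `spark` pulls the session into the
        # pickled closure, and the session cannot be serialized and shipped to
        # the server.
        spark.range(1).collect()

    def onQueryProgress(self, event):
        pass

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        pass


# With this change the failure surfaces as a PySparkPicklingError carrying a
# detailed message instead of an opaque serialization error.
spark.streams.addListener(MyListener())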
How was this patch tested?
Added unit tests.
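As a rough sketch of what such a unit test might look like (pytest style; the connection string, the streaming source, and the exact point at which the error is raised are assumptions for illustration):

import pytest

from pyspark.errors import PySparkPicklingError
from pyspark.sql import SparkSession


def test_foreach_batch_capturing_session_raises_pickling_error():
    # Assumes a Spark Connect server is reachable at this illustrative address.
    spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
    df = spark.readStream.format("rate").load()

    def write_batch(batch_df, batch_id):
        # Referencing the outer client session makes this function unpicklable.
        spark.range(1).collect()

    with pytest.raises(PySparkPicklingError):
        df.writeStream.foreachBatch(write_batch).start()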
Was this patch authored or co-authored using generative AI tooling?
No