Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibly use Spark Connect to build the Scala 3 implementation #59

Open
MrPowers opened this issue Apr 26, 2024 · 3 comments
Open

Possibly use Spark Connect to build the Scala 3 implementation #59

MrPowers opened this issue Apr 26, 2024 · 3 comments

Comments

@MrPowers
Copy link

Spark Connect decouples the client and the Spark driver, so the client doesn't need the same software versions as the driver.

A Spark Connect Scala 3 should provide a similar user experience.

spark-connect-go, spark-connect-rs, and Spark Connect C# already exist.

Would it make sense to possibly use Spark Connect instead?

@michael72
Copy link
Contributor

Hey @MrPowers - that sounds like a cool project!

However I am not sure how much effort we would have to put into that. On the other hand it is frustrating not to be able to use scala 3 on spark client side. I am not sure if this can be part of spark-scala3 or if spark-scala3 could somehow be rewritten to create DataFrame objects and use them via spark connect. Only it seems like spark connect does not support all APIs. 😔

Unfortunately I think we've hit some limits with spark-scala3. For instance I have seen that there are still problems serializing scala 3 closures - the code for that is called inside the spark libraries. When that happens the exception doesn't even show in the logs - so it just hangs without a message. So a somewhat simple call to map doesn't work with spark-scala3 and probably there are other functions that cannot simply be called. There is a lack of unit testing done in this project for available spark functions so we actually don't know.

@MrPowers
Copy link
Author

@michael72 - Yea, it might make sense to just create another repo. The Spark core devs are definitely excited about a Spark Connect + Scala 3 project. You're right that Spark Connect doesn't support all the Spark APIs, but I think it's a better architecture and am hopeful we could build something that's production grade.

@grundprinzip
Copy link

As you've already pointed out closure serialization is an issue because the closures are injected into the target JVM. I'm no Scala expert ans thus don't know if there is any chance at all to be able to load the compiled code without rebuilding all of Spark in Scala 3. Maybe some clever class loader isolation is enough?

On the client side however, you can have the full freedom to use Scala 3. An interesting approach might be to try to simply copy / use the existing JVM client and see how far you come without UDFs and closures?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants