
[FEATURE] Spark-Expectations for Data Pipelines written using Scala/Java Spark SDK #30

Open
phanikumarvemuri opened this issue Sep 11, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@phanikumarvemuri
Contributor

Is your feature request related to a problem? Please describe.
Support spark-expectations for the pipelines written in Scala/Java Spark SDK

Describe the solution you'd like
TBD

Describe alternatives you've considered

Additional context
Make spark-expectations work for other language spark SDK's

phanikumarvemuri added the enhancement (New feature or request) label on Sep 11, 2023
@newfront

This means we'd also have access to all APIs including expression encoders and ASTs.

I'm a go for this. We'd need to make a decision on supported Java versions (Databricks is still using 8 internally; Java 11 and 17 are fairly well supported in the OSS project), and keep a close watch on the changes coming in Spark 4.0 and which Scala versions (2.12.1x, 2.13, 3.x) will be maintained moving forward.

@newfront

I support this 100%. It would allow us to tap into spark.sparkContext._gateway.jvm.com.nike.* from the PySpark side and support finer-grained controls (with added complexity around supported Scala versions), which could be useful.
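
A minimal sketch of what that gateway access could look like from PySpark, assuming the spark-expectations jar is already on the driver classpath; the com.nike.spark.expectations.SparkExpectations class path below is purely hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes the spark-expectations jar is available to the driver
# (e.g. added via spark.jars); the class path below is hypothetical.
spark = SparkSession.builder.appName("se-gateway-demo").getOrCreate()

# The py4j gateway exposes the driver JVM to Python.
jvm = spark.sparkContext._gateway.jvm

# py4j resolves JVM packages lazily; this reaches a hypothetical
# Scala-side entry point under the com.nike namespace.
SparkExpectations = jvm.com.nike.spark.expectations.SparkExpectations
```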

@phanikumarvemuri
Contributor Author

Thanks @newfront. I will work on the requirements, and then we can all discuss ideas on implementation and architecture.
@asingamaneni let me know your thoughts?
We may not be able to use a Python-decorator-like pattern, but we can implement a solution that adheres closely to the same principle.

@newfront

The decorator pattern could be replaced with builders for the Scala API. Then, when migrating the decorators for PySpark, we'd be able to call into the Scala API, and the decorators could be wired into the builder options, closing the loop.

If we use the py4j _gateway from PySpark, this change could be made transparently (aside from requiring the underlying jar).
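
A rough sketch of how a PySpark decorator could wire its options into a Scala-side builder through the py4j gateway; all class, builder, and parameter names here are illustrative assumptions, not an existing API:

```python
import functools

from pyspark.sql import DataFrame, SparkSession


def with_expectations(product_id, rules_table):
    """Hypothetical decorator that forwards its options to a Scala-side builder."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            spark = SparkSession.getActiveSession()
            jvm = spark.sparkContext._gateway.jvm

            # Hypothetical Scala builder; the class and method names are
            # illustrative only. The decorator options become builder options.
            runner = (jvm.com.nike.spark.expectations.SparkExpectations.builder()
                      .productId(product_id)
                      .rulesTable(rules_table)
                      .build())

            # The user's function returns a PySpark DataFrame.
            df = func(*args, **kwargs)

            # Hand the underlying Java DataFrame to the Scala side,
            # then wrap the validated result back into a Python DataFrame.
            validated_jdf = runner.run(df._jdf)
            return DataFrame(validated_jdf, spark)
        return wrapper
    return decorator
```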

@phanikumarvemuri
Contributor Author

So, we write spark-expectations in Scala, and if users want to use spark-expectations from Python we provide a decorator pattern, where the wrappers inside the decorators map the arguments and call the respective Scala classes using py4j?
