sedona-spark-shaded package should exclude some repetitive dependencies. #1584
@freamdx would you mind creating a PR to fix this yourself?
A few notes:
I agree with @Kontinuation. Guava is pretty tricky, as I learned when upgrading Spark. I'm having trouble understanding why this Google library has such poor compatibility; it seems unexpected from a leading tech company. I'm willing to step in if @jiayuasu approves, given @freamdx's lack of response. However, I'm somewhat worried about proceeding without rigorous testing to mitigate potential risks.
@zwu-net please feel free to create a PR to fix this.
@jiayuasu @Kontinuation @freamdx (I would also like to bring in @james-willis here, since I got to know him through the DBSCAN discussion. If you are busy with other tasks, you don't need to participate.)

**Research Findings and Proposal**

After researching the feasibility of modifying Sedona's pom.xml (https://github.com/apache/sedona/blob/master/spark-shaded/pom.xml), I concluded that manually managing library inclusions/exclusions across various Spark and Sedona versions would be overly complex and labor-intensive.

**Proposed Solution**

To address this challenge, I suggest creating a custom tool to manage pom.xml generation. This approach is inspired by my previous work on a custom transformer for SQL from Oracle to Snowflake (https://www.linkedin.com/pulse/revolutionizing-data-analysis-custom-transformer-seamless-paul-wu-4euxc/?trackingId=1AyXqiJh1sei4yybC%2FZW9g%3D%3D).

**Tool Requirements**

The tool should:
**Implementation Considerations**

Given the scope of this task, it may take several months for part-time contributors like me to implement and test. Before proceeding, I'd appreciate feedback on:
Please share your thoughts.
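As a rough illustration of what such a pom-generation tool might do, here is a minimal Python sketch that renders Maven `<exclusion>` entries from a list of artifacts already bundled with a given Spark distribution. The artifact list and function names are assumptions for illustration, not Sedona's actual dependency data.

```python
# Hypothetical sketch of the proposed pom-generation tool: given the set of
# artifacts a Spark distribution already ships, emit Maven <exclusion> blocks
# so the shaded jar does not re-bundle them. The SPARK_BUNDLED set below is
# an illustrative assumption, not a real inventory of Spark's jars.

SPARK_BUNDLED = {
    ("com.google.guava", "guava"),
    ("org.apache.httpcomponents", "httpclient"),
    ("com.google.protobuf", "protobuf-java"),
}

def exclusions_xml(bundled, indent="  "):
    """Render an <exclusions> element covering every bundled artifact."""
    lines = [f"{indent}<exclusions>"]
    for group_id, artifact_id in sorted(bundled):
        lines += [
            f"{indent * 2}<exclusion>",
            f"{indent * 3}<groupId>{group_id}</groupId>",
            f"{indent * 3}<artifactId>{artifact_id}</artifactId>",
            f"{indent * 2}</exclusion>",
        ]
    lines.append(f"{indent}</exclusions>")
    return "\n".join(lines)

print(exclusions_xml(SPARK_BUNDLED))
```

A real tool would of course read the bundled-artifact list per Spark/Sedona version (e.g. from the distribution's `jars/` directory) instead of hard-coding it, which is exactly the per-version bookkeeping the proposal aims to automate.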
The pom.xml should exclude any dependencies that already exist in Spark's jars; e.g. `edu.ucar:cdm-core` should exclude guava/httpclient/protobuf-java:
```xml
... ...
<dependency>
  <groupId>edu.ucar</groupId>
  <artifactId>cdm-core</artifactId>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
    </exclusion>
  </exclusions>
</dependency>
... ...
<artifactSet>
  <excludes>
    <exclude>org.scala-lang:scala-library</exclude>
    <exclude>org.apache.commons:commons-*</exclude>
    <exclude>commons-pool:commons-pool</exclude>
    <exclude>commons-lang:commons-lang</exclude>
    <exclude>commons-io:commons-io</exclude>
    <exclude>commons-logging:commons-logging</exclude>
  </excludes>
</artifactSet>
... ...
```
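One way to verify that exclusions like these actually remove the duplication is to compare the `.class` entries in the shaded jar against those in Spark's `jars/` directory. The following is a minimal sketch (not part of Sedona's build); the example paths are hypothetical.

```python
# Minimal sketch: find classes that a shaded jar duplicates from the jars
# already on Spark's classpath. Not part of Sedona; paths are assumptions.
import zipfile
from pathlib import Path

def classes_in(jar_path):
    """Return the set of .class entry names inside a jar (jars are zip files)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}

def duplicated_classes(shaded_jar, spark_jars_dir):
    """Classes present both in the shaded jar and in any jar under spark_jars_dir."""
    shaded = classes_in(shaded_jar)
    dupes = set()
    for jar in Path(spark_jars_dir).glob("*.jar"):
        dupes |= shaded & classes_in(jar)
    return dupes

# Example usage (hypothetical paths):
# print(len(duplicated_classes("sedona-spark-shaded.jar", "/opt/spark/jars")))
```

An empty result after a build would indicate that the `<exclusion>`/`<exclude>` entries covered everything Spark already ships.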