
Upgrades to Spark 3.4/JRE 17 and fixes all high/critical CVEs #226

Merged
merged 4 commits into master from spark-3.4
Apr 16, 2024

Conversation

codefromthecrypt (Member)

I manually tested this on all three storage types in Docker as well (using zipkin's docker/examples instructions).

@codefromthecrypt (Member, Author)

trivy is clean now, so even if we don't release until 8.14 final... at least we can apply the same security settings as other repos after merge:

$ trivy repo .
2024-04-15T15:56:48.696-1000	INFO	Need to update DB
2024-04-15T15:56:48.696-1000	INFO	DB Repository: ghcr.io/aquasecurity/trivy-db:2
2024-04-15T15:56:48.696-1000	INFO	Downloading DB...
45.03 MiB / 45.03 MiB [------------------------------------------------------------------------------------------------------------------] 100.00% 2.58 MiB p/s 18s
2024-04-15T15:57:07.461-1000	INFO	Vulnerability scanning is enabled
2024-04-15T15:57:07.461-1000	INFO	Secret scanning is enabled
2024-04-15T15:57:07.461-1000	INFO	If your scanning is slow, please try '--scanners vuln' to disable secret scanning
2024-04-15T15:57:07.462-1000	INFO	Please see also https://aquasecurity.github.io/trivy/v0.50/docs/scanner/secret/#recommendation for faster secret detection
2024-04-15T15:57:07.731-1000	INFO	Number of language-specific files: 5
2024-04-15T15:57:07.731-1000	INFO	Detecting pom vulnerabilities...

Signed-off-by: Adrian Cole <[email protected]>
@codefromthecrypt (Member, Author) left a comment

elastic/elasticsearch-hadoop#2187 unlocked this (releasing in elasticsearch-hadoop 8.14, but not sure when)

other notes below

@@ -150,6 +150,9 @@ public CassandraDependenciesJob build() {
df.setTimeZone(TimeZone.getTimeZone("UTC"));
this.dateStamp = df.format(new Date(builder.day));
this.conf = new SparkConf(true).setMaster(builder.sparkMaster).setAppName(getClass().getName());
if (builder.sparkMaster.startsWith("local[")) {
conf.set("spark.driver.bindAddress", "127.0.0.1");
codefromthecrypt (Member, Author):

This is a Spark 3.4 thing.

codefromthecrypt (Member, Author):

Basically, it tries to detect the bind address from the hostname, which isn't needed for local mode anyway.
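The conditional in the diff above can be sketched as follows; this is a minimal, self-contained illustration that uses a plain `Map` as a stand-in for Spark's `SparkConf` (the real job calls `conf.set(...)`), so it runs without Spark on the classpath:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalBindAddress {
    // Stand-in for SparkConf properties; the real code calls conf.set(key, value).
    static Map<String, String> confForMaster(String sparkMaster) {
        Map<String, String> conf = new HashMap<>();
        // Spark 3.4 tries to derive the driver bind address from the hostname,
        // which can fail (e.g. in Docker). For local masters such as "local[*]",
        // pinning the driver to loopback avoids the hostname lookup entirely.
        if (sparkMaster.startsWith("local[")) {
            conf.put("spark.driver.bindAddress", "127.0.0.1");
        }
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(confForMaster("local[*]"));
    }
}
```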

Contributor:

Not sure it is worth looking into, but InetAddress.getLocalHost().getHostAddress() may be a more reliable option (e.g. if the host uses IPv6 only).
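The contributor's suggestion amounts to resolving the local host's address at runtime instead of hard-coding `127.0.0.1`; a small sketch (the commented `conf.set` line is hypothetical usage, not code from this PR):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolvedBindAddress {
    public static void main(String[] args) throws UnknownHostException {
        // Resolve whatever address the local hostname maps to. On an
        // IPv6-only host this yields an IPv6 address rather than failing
        // the way a hard-coded IPv4 loopback might.
        String address = InetAddress.getLocalHost().getHostAddress();
        System.out.println(address);
        // conf.set("spark.driver.bindAddress", address); // hypothetical
    }
}
```

One trade-off: `getLocalHost()` depends on hostname resolution, which is exactly what the hard-coded loopback was avoiding in minimal Docker images.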

.values()
.map(DEPENDENCY_LINK_JSON);
JavaRDD<Map<String, Object>> links;
try (JavaSparkContext sc = new JavaSparkContext(conf)) {
codefromthecrypt (Member, Author):

This is just polish: we can use try-with-resources with some of the drivers.
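`JavaSparkContext` implements `java.io.Closeable`, which is what makes the try-with-resources form in the diff work. A runnable sketch of the pattern, using a stand-in `AutoCloseable` so it compiles without Spark on the classpath:

```java
public class TryWithResourcesSketch {
    // Stand-in for JavaSparkContext, which implements java.io.Closeable.
    static class ContextStandIn implements AutoCloseable {
        final StringBuilder events;
        ContextStandIn(StringBuilder events) { this.events = events; }
        void runJob() { events.append("job;"); }
        @Override public void close() { events.append("closed;"); }
    }

    static String run() {
        StringBuilder events = new StringBuilder();
        // Mirrors: try (JavaSparkContext sc = new JavaSparkContext(conf)) { ... }
        // close() runs even if the job body throws, so the context is always stopped.
        try (ContextStandIn sc = new ContextStandIn(events)) {
            sc.runJob();
        }
        return events.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // job;closed;
    }
}
```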

@@ -0,0 +1,14 @@
# Set everything to be logged to the console
codefromthecrypt (Member, Author):

This is also a Spark 3.4 thing (Log4j 2 config, not 1.2).
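For context, a console-only Log4j 2 properties config resembles the template Spark ships; a minimal sketch (not necessarily the exact file this PR added):

```properties
# Set everything to be logged to the console
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Log4j 1.2 used a different key syntax (`log4j.rootLogger=INFO, console`), which Spark 3.4 no longer reads.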

@@ -187,6 +224,53 @@
<type>pom</type>
<scope>import</scope>
</dependency>

<!-- CVE fix versions -->
codefromthecrypt (Member, Author):

Keeping the build free of CVEs will be difficult, but at least it is clean for now ;)
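Pins like the ones in this diff section typically live in `<dependencyManagement>`, forcing a transitive dependency to a fixed version. An illustrative fragment only — the coordinates and version below are an example, not necessarily what this PR pinned:

```xml
<!-- CVE fix versions: override vulnerable transitive versions pulled in by Spark -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
      <version>1.26.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

The maintenance cost the comment alludes to is that each pin has to be revisited (or removed) whenever the upstream Spark BOM catches up.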

@codefromthecrypt (Member, Author)

Some big wins from merging this include:

@codefromthecrypt (Member, Author)

Oh yeah, I spent so much time doing this that I forgot why: I was trying to polish this up prior to adding dependencies to helm. There was a point where I thought we might need to rewrite the entire thing (like in beam) to solve the revlock. I'm glad it didn't get that far.

openzipkin/zipkin-helm#11

@codefromthecrypt (Member, Author)

openzipkin/zipkin#3763 tracks the zipkin changes we can now do.

Adrian Cole added 2 commits April 15, 2024 21:31
Signed-off-by: Adrian Cole <[email protected]>
Signed-off-by: Adrian Cole <[email protected]>
exec java ${JAVA_OPTS} -Djava.io.tmpdir=/tmp -cp classes zipkin2.dependencies.ZipkinDependenciesJob $@
# Spark 3.4 module config from:
# https://github.com/apache/spark/blob/branch-3.4/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java#L29
exec java ${JAVA_OPTS} -Djava.io.tmpdir=/tmp \
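The entrypoint change above passes the JVM module-opening flags Spark 3.4 lists in JavaModuleOptions.java, since JRE 17's module system denies the reflective access Spark needs by default. An abridged sketch (the full flag list is in the linked file; only a few are shown here):

```shell
#!/bin/sh
# Abridged subset of Spark 3.4's JavaModuleOptions flags; see the linked
# JavaModuleOptions.java for the complete list the real entrypoint uses.
JAVA_MODULE_OPTS="-XX:+IgnoreUnrecognizedVMOptions \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.nio=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"

echo "$JAVA_MODULE_OPTS"
# then: exec java ${JAVA_OPTS} ${JAVA_MODULE_OPTS} -Djava.io.tmpdir=/tmp ...
```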
Contributor:

👍

@codefromthecrypt codefromthecrypt merged commit 6be9f80 into master Apr 16, 2024
6 checks passed
@codefromthecrypt codefromthecrypt deleted the spark-3.4 branch April 16, 2024 16:32
@codefromthecrypt (Member, Author)

Thanks for the look, folks!
