Calculating hashes vs using from maven cache #493

Open
prabhu opened this issue May 1, 2024 · 7 comments

Comments

@prabhu

prabhu commented May 1, 2024

I noticed that the hashes are calculated from the file on disk instead of reusing the hash information already present in the Maven cache.

component.setHashes(BomUtils.calculateHashes(artifact.getFile(), schemaVersion));

Further, I believe there might be a few problems with the hashing logic used.

try (InputStream fis = new BufferedInputStream(Files.newInputStream(file.toPath()), bufSize)) {
    final byte[] buf = new byte[bufSize];
    while (fis.available() > 0) {
        final int read = fis.read(buf);
        digests.stream().parallel().forEach(d -> d.update(buf, 0, read));
    }
}
  1. There is no encoding specified, so I am not sure whether this affects the data that gets read.
  2. Files.newInputStream appears to be non-locking according to the docs.
  3. The parallel() method is used, which is an intermediate operation according to the documentation.

Is it possible that incorrect data might get read if multiple build and file-copy operations happen in parallel?

I am not very knowledgeable in Java, so please correct me if I am wrong.
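For comparison, here is a minimal sketch (editorial, not the plugin's actual code) of a read loop that terminates on `read()` returning -1 rather than on `available()`, since `available()` only reports bytes readable without blocking and may legitimately return 0 before end-of-file. Class and method names here are hypothetical; only the JDK APIs are real.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class HashSketch {

    /** Feeds the whole file through every digest in one sequential pass. */
    static List<String> hashFile(Path file, List<String> algorithms)
            throws IOException, NoSuchAlgorithmException {
        List<MessageDigest> digests = new ArrayList<>();
        for (String algorithm : algorithms) {
            digests.add(MessageDigest.getInstance(algorithm));
        }
        byte[] buf = new byte[8192];
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int read;
            while ((read = in.read(buf)) != -1) { // -1 reliably signals EOF
                for (MessageDigest d : digests) {
                    d.update(buf, 0, read);
                }
            }
        }
        List<String> hex = new ArrayList<>();
        for (MessageDigest d : digests) {
            StringBuilder sb = new StringBuilder();
            for (byte b : d.digest()) {
                sb.append(String.format("%02x", b));
            }
            hex.add(sb.toString());
        }
        return hex;
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("hash-demo", ".bin");
        Files.write(tmp, "hello".getBytes());
        System.out.println(hashFile(tmp, List.of("MD5", "SHA-1")));
        Files.delete(tmp);
    }
}
```

Updating the digests sequentially also sidesteps the `parallel()` question entirely: for a buffer that is reused between iterations, a plain loop over two or three digests is both simpler and safe.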

@ppkarwasz
Contributor

@prabhu,

Maven Central mostly relies on MD5 and SHA-1 hashes. These are the only two algorithms required by OSSRH (cf. OSSRH Requirements), and in practice they are the only ones published to Maven Central. This is why the CycloneDX Maven plugin computes the hashes itself.

If you observe wrong results in a multi-module Maven build, these are most probably due to #410, not concurrency problems.

A long-standing problem in multi-module Maven builds is that it is really hard to execute some task at the end of the build, so usually the aggregate SBOM is generated based on the previous snapshots.

@VinodAnandan

I recommend against using the Maven Central/external hashes. Several Java frameworks modify the .class files, so to accurately identify these modified jars it is advisable to use locally computed hashes instead.

@prabhu
Author

prabhu commented May 2, 2024

@VinodAnandan, in that case the purl must be updated to reflect that such jars might differ from the ones published in Maven Central.

I have tried to handle this case using evidence by comparing the generated hash with the one in the maven cache. Could you kindly test with my PR and let me know how it looks?

#494

@VinodAnandan

VinodAnandan commented May 15, 2024

@prabhu Do you mean to capture that information similar to repository_url=localhost/.m2/? I believe the default repository for Maven is expected to be https://repo.maven.apache.org/maven2. However, in many enterprise setups the default repository will be an internal repository (Nexus or JFrog) used by the build tool. @stevespringett, @pombredanne please correct me if I misunderstood. Perhaps the tool should compare the local JAR hashes with those at the actual download location, and if they differ, update the purl's repository_url qualifier to repository_url=localhost/.m2/?
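For reference, a purl carrying a repository_url qualifier looks like this (the coordinates and host below are illustrative, not taken from any real project; per the purl spec, qualifier values are percent-encoded where needed):

```
pkg:maven/com.example/demo-lib@1.0.0?repository_url=repo.example.com%2Fmaven2
```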

@prabhu @stevespringett What are your thoughts on capturing modified change information in the pedigree part of the CycloneDX (https://cyclonedx.org/use-cases/#pedigree)? In a recent discussion with the Quarkus team (@aloubyansky), we (@nscuro and I) also discussed Quarkus modifying their JARs. I'm not sure if they have found a solution to capture these changes yet.

@aloubyansky

Yes, we have all the info, of course. It's a matter of properly manifesting it. I'll share some examples soon.

@aloubyansky

> A long-standing problem in multi-module Maven builds is that it is really hard to execute some task at the end of the build, so usually the aggregate SBOM is generated based on the previous snapshots.

This can be done by implementing https://maven.apache.org/ref/3.5.0/apidocs/org/apache/maven/AbstractMavenLifecycleParticipant.html and configuring the plugin as containing extensions.
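A hypothetical sketch of that approach, assuming the Maven core APIs are on the classpath; the class and hook names come from the linked Maven API, but wiring this into the CycloneDX plugin is illustrative, not the plugin's actual code:

```java
// Illustrative only: a lifecycle participant whose afterSessionEnd hook
// runs once, after every module in the reactor has been built, so an
// aggregate SBOM generated here would see the final build state.
import org.apache.maven.AbstractMavenLifecycleParticipant;
import org.apache.maven.execution.MavenSession;

public class EndOfBuildParticipant extends AbstractMavenLifecycleParticipant {
    @Override
    public void afterSessionEnd(MavenSession session) {
        // Generate the aggregate SBOM from the completed reactor here.
    }
}
```

Maven only loads lifecycle participants from plugins that are marked as extensions, e.g.:

```xml
<plugin>
  <groupId>org.cyclonedx</groupId>
  <artifactId>cyclonedx-maven-plugin</artifactId>
  <extensions>true</extensions>
</plugin>
```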

@hboutemy
Contributor

> There is no encoding specified, so not sure if this affects the data that gets read

A hash is always a hash of binary content: there is never any encoding involved in hashing.
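A minimal illustration of this point (editorial, using only standard JDK classes): MessageDigest consumes raw bytes, so a charset matters only if you convert text to bytes first, and different encodings of the same text produce different byte sequences and therefore different digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class EncodingDemo {
    // Hex-encodes the MD5 digest of a raw byte array; no charset is involved here.
    static String md5Hex(byte[] bytes) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(bytes)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same text, different byte encodings -> different digests.
        System.out.println(md5Hex("hello".getBytes(StandardCharsets.UTF_8)));
        // → 5d41402abc4b2a76b9719d911017c592
        System.out.println(md5Hex("hello".getBytes(StandardCharsets.UTF_16)));
    }
}
```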
