[ANE-1967] Recursive jars in containers (#1478)
Co-authored-by: Christopher Sasarak <[email protected]>
spatten and csasarak authored Nov 1, 2024
1 parent 666372e commit e9e8ade
Showing 11 changed files with 261 additions and 29 deletions.
1 change: 1 addition & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Changelog.md
@@ -13,6 +13,7 @@
- Microsoft SQL Server 2019 Developer, 2019 Evaluation, and 2019 Express
- Microsoft SQL Server 2022 Enterprise, Standard, Web
- Viskoe.dk Terms of Use
- Container scanning: Recursively find jars within jars ([#1478](https://github.com/fossas/fossa-cli/pull/1478))

## 3.9.37

54 changes: 28 additions & 26 deletions docs/references/subcommands/container/scanner.md
@@ -1,14 +1,14 @@
# FOSSA's container scanner

- [FOSSA's container scanner](#fossas-new-container-scanner)
- [What's new in this scanner?](#whats-new-in-this-scanner)
- [FOSSA's container scanner](#fossas-container-scanner)
- [What's supported in FOSSA's container scanner?](#whats-supported-in-fossas-container-scanner)
- [Documentation](#documentation)
- [Container image source](#container-image-source)
- [1) Exported docker archive](#1-exported-docker-archive)
- [2) From Docker Engine](#2-from-docker-engine)
- [3) From registries](#3-from-registries)
- [Container image analysis](#container-image-analysis)
- [Container Jar analysis](#container-jar-analysis)
- [Container JAR analysis](#container-jar-analysis)
- [Distroless Containers](#distroless-containers)
- [Supported Container Package Managers](#supported-container-package-managers)
- [View detected projects](#view-detected-projects)
@@ -19,7 +19,7 @@
- [How do I scan multi-platform container images with `fossa-cli`?](#how-do-i-scan-multi-platform-container-images-with-fossa-cli)
- [How can I only scan for system dependencies (alpine, dpkg, rpm)?](#how-can-i-only-scan-for-system-dependencies-alpine-dpkg-rpm)
- [How do I exclude specific projects from container scanning?](#how-do-i-exclude-specific-projects-from-container-scanning)
- [Limitations & Workarounds](#limitations--workarounds)
- [Limitations \& Workarounds](#limitations--workarounds)

## What's supported in FOSSA's container scanner?

@@ -50,9 +50,9 @@ To scan a container image with `fossa-cli`, use the `container analyze` command:
# This command uses the repository name as project name, and image digest as the revision.
# Like standard FOSSA analysis, the project name is customizable via `--project` and revision via `--revision`:
#
# >> fossa container analyze <IMAGE> --project <PROJECT-NAME> --revision <REVISION-VALUE>
# >> fossa container analyze <IMAGE> --project <PROJECT-NAME> --revision <REVISION-VALUE>
#
fossa container analyze <IMAGE>
fossa container analyze <IMAGE>

# Similar to the above, but instead of uploading the results they are instead written to the terminal in JSON format.
#
@@ -89,13 +89,13 @@ By default `fossa-cli` attempts to identify `<IMAGE>` source in the following or

```bash
docker save redis:alpine > redis_alpine.tar
fossa container analyze redis_alpine.tar
fossa container analyze redis_alpine.tar
```

### 2) From Docker Engine

```bash
fossa container analyze redis:alpine
fossa container analyze redis:alpine
```

For this image source to work, `fossa-cli` requires docker to be running and accessible.
@@ -118,7 +118,7 @@ curl --unix-socket /var/run/docker.sock -X GET "http://localhost/v1.28/images/re
### 3) From registries

```bash
fossa container analyze ghcr.io/fossas/haskell-dev-tools:9.0.2
fossa container analyze ghcr.io/fossas/haskell-dev-tools:9.0.2
```

This step works even if you do not have docker installed or have docker engine accessible.
@@ -138,17 +138,17 @@ If `<IMAGE>` is not a docker image archive and is not accessible via the docker
| `quay.io/org/image:tag` | `quay.io` | `org/image` | `tag` |

Note:
- When the domain is not present, `fossa-cli` defaults to the registry `index.docker.io`.
- When digest or tag is not present, `fossa-cli` defaults to the tag `latest`.
- When the registry is `index.docker.io`, and repository does not contain the literal `/`, `fossa-cli` infers that this is official image stored under `library/<image>`.
- When a multi-platform image is provided (e.g. `ghcr.io/graalvm/graalvm-ce:ol7-java11-21.3.3`), `fossa-cli` defaults to selecting image artifacts for current runtime platform.
- When the domain is not present, `fossa-cli` defaults to the registry `index.docker.io`.
- When digest or tag is not present, `fossa-cli` defaults to the tag `latest`.
- When the registry is `index.docker.io`, and repository does not contain the literal `/`, `fossa-cli` infers that this is official image stored under `library/<image>`.
- When a multi-platform image is provided (e.g. `ghcr.io/graalvm/graalvm-ce:ol7-java11-21.3.3`), `fossa-cli` defaults to selecting image artifacts for current runtime platform.
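The defaulting rules in the note above can be sketched in a few lines of Rust. This is only an illustration, not `fossa-cli`'s actual parser: `ImageRef` and `parse_image` are hypothetical names, and the rule used to decide whether the first path component is a registry host (it contains `.` or `:`, or equals `localhost`) is an assumption borrowed from common Docker conventions. The sketch also assumes the reference carries no `user:password@` credentials and ignores multi-platform selection.

```rust
/// Hypothetical parsed image reference: registry, repository, and tag or digest.
#[derive(Debug, PartialEq)]
struct ImageRef {
    registry: String,
    repository: String,
    /// Either a tag (`latest`) or a digest (`sha256:...`).
    reference: String,
}

const DEFAULT_REGISTRY: &str = "index.docker.io";
const DEFAULT_TAG: &str = "latest";

/// Apply the documented defaults to an image string without credentials.
fn parse_image(image: &str) -> ImageRef {
    // A digest (`name@sha256:...`) takes precedence over a tag (`name:tag`).
    let (name, reference) = if let Some((name, digest)) = image.split_once('@') {
        (name, digest.to_string())
    } else {
        match image.rsplit_once(':') {
            // A ':' followed by a '/' is a port in the host, not a tag.
            Some((name, tag)) if !tag.contains('/') => (name, tag.to_string()),
            // No tag or digest present: default to `latest`.
            _ => (image, DEFAULT_TAG.to_string()),
        }
    };

    // Assumption: the first component is a registry host only if it looks like one.
    let (registry, repository) = match name.split_once('/') {
        Some((first, rest))
            if first.contains('.') || first.contains(':') || first == "localhost" =>
        {
            (first.to_string(), rest.to_string())
        }
        // No domain present: default to the `index.docker.io` registry.
        _ => (DEFAULT_REGISTRY.to_string(), name.to_string()),
    };

    // Official images on the default registry live under `library/<image>`.
    let repository = if registry == DEFAULT_REGISTRY && !repository.contains('/') {
        format!("library/{repository}")
    } else {
        repository
    };

    ImageRef { registry, repository, reference }
}
```

Under these assumptions, `parse_image("redis:alpine")` yields the registry `index.docker.io`, the repository `library/redis`, and the tag `alpine`, while `parse_image("quay.io/org/image:tag")` matches the `quay.io` row of the table.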

Analyzing the container image for a platform other than the one currently running is possible by specifying the digest for the image on a different platform.

For example, the following command analyzes the `arm64` platform image of `ghcr.io/graalvm/graalvm-ce@sha256` regardless of the platform running `fossa container analyze`:

```bash
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
```

**Private registries**
@@ -171,18 +171,18 @@ This is done in following steps:
}
```

If any of the steps above fail, `fossa-cli` defaults to connecting without user credentials.
If any of the steps above fail, `fossa-cli` defaults to connecting without user credentials.

To explicitly provide a username and password, use HTTP-style authentication in the image URL.
For this to work the host value must be present in the image URL:

```bash
fossa container analyze user:[email protected]/org/image:tag
fossa container analyze user:[email protected]/org/image:tag
```

**Retrieving image from registry**

`fossa-cli` uses `/v2/` registry api (per OCI distribution spec) for retrieving
`fossa-cli` uses `/v2/` registry api (per OCI distribution spec) for retrieving
image manifests, and image artifacts from registry. It does so in following manner:

1) `HEAD <repository>/manifests/<tag-or-digest>` (to see if the manifests exists)
@@ -194,20 +194,22 @@ image manifests, and image artifacts from registry. It does so in following mann
4) Download all blobs using `GET /v2/<repository>/blobs/<digest>` (if blobs are tar.gzip, they will be gzip extracted)
5) From artifacts downloaded representative image tarball will be created.

All `GET` request from step 2 to step 5, will make `HEAD` call prior to confirm existence of resource. If
All `GET` request from step 2 to step 5, will make a `HEAD` call prior to confirm existence of resource. If
401 status is received new access token will be generated using auth flow mentioned in step (1).

## Container image analysis

The container scanner scans in two steps:
1. The base layer.
2. The rest of the layers, squashed.
2. The rest of the layers, squashed.

### Container JAR analysis

The container analyzer will try to find Java Archive (Jar) files inside each layer.
It will then report them to FOSSA which will try to match the Jar files to the project they are a build artifact from.

The container analyzer will also expand each Jar file that it encounters and report any Jar files that it finds in the expanded Jar file. This is done recursively.

This process relies on there being a back-end that can perform that analysis.
SaaS customers should have this functionality available but on-prem customers may need to contact FOSSA support to have it enabled.
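The recursive expansion described above can be illustrated with a small, standard-library-only model. This is a sketch rather than the scanner's implementation: `Entry` and `nested_jars` are hypothetical names, archives are represented as an in-memory tree instead of real zip data, and fingerprinting is omitted. It shows only how nested paths are joined onto the containing Jar's path and how a depth limit bounds the recursion.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for an archive entry: a file name plus the entries
/// found inside it (empty for anything that is not itself an archive).
struct Entry {
    name: String,
    children: Vec<Entry>,
}

/// Upper bound on nesting, so that a pathological archive cannot recurse forever.
const MAX_DEPTH: u32 = 100;

/// Collect the paths of all Jar files nested under `containing`, depth-first.
fn nested_jars(entries: &[Entry], containing: &Path, depth: u32) -> Vec<PathBuf> {
    if depth > MAX_DEPTH {
        return Vec::new();
    }
    let mut found = Vec::new();
    for entry in entries {
        // Only entries named like Jar files are reported and expanded.
        if !entry.name.ends_with(".jar") {
            continue;
        }
        // The reported path is the containing Jar's path joined with the entry name.
        let joined = containing.join(&entry.name);
        found.push(joined.clone());
        // Recurse into the nested Jar, one level deeper.
        found.extend(nested_jars(&entry.children, &joined, depth + 1));
    }
    found
}
```

For a `top.jar` that contains `middle.jar`, which in turn contains `deepest.jar`, calling `nested_jars` with the contents of `top.jar` and the path `jars/top.jar` reports `middle.jar` and `deepest.jar` beneath it, each path prefixed by the Jar it was found in.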

@@ -264,7 +266,7 @@ and if desired can inform [analysis target configuration](../../files/fossa-yml.

Example output:
```bash
; fossa container list-targets ghcr.io/tcort/markdown-link-check:stable
; fossa container list-targets ghcr.io/tcort/markdown-link-check:stable

[ INFO] Discovered image for: ghcr.io/tcort/markdown-link-check:stable (of 137610196 bytes) via docker engine api.
[ INFO] Exporting docker image to temp file: /private/var/folders/hb/pg5d0r196kq1qdswr6_79hzh0000gn/T/fossa-docker-engine-tmp-f7af2b5d1ec5173d/image.tar! This may take a while!
@@ -296,7 +298,7 @@ exclude:
### Debugging
`fossa-cli` supports the `--debug` flag and debug bundle generation with the container scanner.
`fossa-cli` supports the `--debug` flag and debug bundle generation with the container scanner.

```bash
fossa container analyze redis:alpine --debug
@@ -315,7 +317,7 @@ Images can be exported to archives using Docker:
docker pull <IMAGE>:<TAG> # or docker pull <IMAGE>@<DIGEST>
docker save <IMAGE>:<TAG> > image.tar
fossa container analyze image.tar --container scanner
fossa container analyze image.tar --container scanner
rm image.tar
```
@@ -328,7 +330,7 @@ By default when `fossa-cli` is analyzing multi-platform image it prefers using t
If a specific platform is desired, use the digest for that platform:

```bash
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
```

### How can I only scan for system dependencies (alpine, dpkg, rpm)?
@@ -342,7 +344,7 @@ fossa container analyze <IMAGE> --only-system-deps
### How do I exclude specific projects from container scanning?

Use a FOSSA configuration file to perform exclusion of projects or paths.
Refer to the [configuration file](./../../files/fossa-yml.md) documentation for more details.
Refer to the [configuration file](./../../files/fossa-yml.md) documentation for more details.

As an example, the following configuration file only analyzes `setuptools`, and `alpine` packages:

@@ -371,7 +373,7 @@ The recommended workaround is to export the image to an archive, then analyze th
docker pull quay.io/org/image:tag
docker save quay.io/org/image:tag > img.tar
fossa container analyze img.tar
fossa container analyze img.tar
rm img.tar
```

1 change: 1 addition & 0 deletions extlib/millhone/Cargo.toml
@@ -38,6 +38,7 @@ tracing-subscriber = { version = "0.3.17", features = ["json"] }
lazy-regex = { version = "3.0.2", features = ["std", "regex"] }
fingerprint = { git = "https://github.com/fossas/lib-fingerprint.git", tag = "v3.0.0", default-features = false, features = ["fp-content-serialize-base64"] }
tar = "0.4.41"
zip = "2.1.3"

[dev-dependencies]
maplit = "1.0.2"
156 changes: 153 additions & 3 deletions extlib/millhone/src/cmd/analyze_container.rs
@@ -2,7 +2,7 @@ use std::{
collections::{HashMap, HashSet},
fs::File,
io::{BufWriter, Read},
path::PathBuf,
path::{Path, PathBuf},
};

use clap::Parser;
@@ -125,10 +125,15 @@ fn jars_in_layer(entry: Entry<'_, impl Read>) -> Result<Vec<DiscoveredJar>> {
debug!("fingerprinting");
let entry = buffer(entry).context("read jar file")?;

match Combined::from_buffer(entry) {
Ok(fingerprints) => discoveries.push(DiscoveredJar::new(path, fingerprints)),
match Combined::from_buffer(entry.clone()) {
Ok(fingerprints) => {
discoveries.push(DiscoveredJar::new(path.clone(), fingerprints))
}
Err(e) => warn!("failed to fingerprint: {e:?}"),
}
let mut discovered_in_jars =
recursive_jars_in_jars(&entry, path, 0).context("recursively discover jars")?;
discoveries.append(&mut discovered_in_jars);

Ok(())
})?;
@@ -137,6 +142,56 @@ fn jars_in_layer(entry: Entry<'_, impl Read>) -> Result<Vec<DiscoveredJar>> {
Ok(discoveries)
}

const MAX_JAR_DEPTH: u32 = 100;

#[tracing::instrument(skip(jar_contents))]
fn recursive_jars_in_jars(
jar_contents: &[u8],
containing_jar_path: PathBuf,
depth: u32,
) -> Result<Vec<DiscoveredJar>> {
if depth > MAX_JAR_DEPTH {
return Ok(vec![]);
}
let mut discoveries = Vec::new();
let mut archive =
zip::ZipArchive::new(std::io::Cursor::new(jar_contents)).context("unzipping jar")?;
for path in archive.clone().file_names() {
debug!("file_name: {path}");
if !path.ends_with(".jar") {
continue;
}

debug!(?path, "jar file found");
let mut zip_file = archive
.by_name(path)
.context("getting zip file info by path")?;
if !zip_file.is_file() {
debug!(?path, "skipped: not a file");
continue;
}
let mut buffer: Vec<u8> = Vec::new();
zip_file
.read_to_end(&mut buffer)
.context("reading jar from zip into buffer")?;
let joined_path = Path::new(&containing_jar_path).join(path);

// fingerprint the jar
match Combined::from_buffer(buffer.clone()) {
Ok(fingerprints) => {
discoveries.push(DiscoveredJar::new(joined_path.clone(), fingerprints))
}
Err(e) => warn!("failed to fingerprint: {e:?}"),
}

// recursively find more jars
let mut discovered_in_jars = recursive_jars_in_jars(&buffer, joined_path, depth + 1)
.context("recursively discover jars")?;
discoveries.append(&mut discovered_in_jars);
}
Ok(discoveries)
}

#[tracing::instrument]
fn list_container_layers(layer_path: &PathBuf) -> Result<HashSet<PathBuf>> {
let mut layers = HashSet::new();
@@ -250,4 +305,99 @@ mod tests {
let expected: Value = serde_json::from_str(MILLHONE_OUT).expect("Parse expected json");
pretty_assertions::assert_eq!(expected, res);
}

// This container contains top.jar which contains middle.jar, which contains deepest.jar
// It also includes middle.jar and deepest.jar
// So we should find 6 total jars: three from top.jar and its nested jars, two from middle.jar and its nested jar and then deepest.jar
// We are also testing that the fingerprints from the nested jars are equal to the fingerprints when they are at top-level
// See test/App/Fossa/Container/testdata/nested-jar/README.md for info on how nested_jars.tar was made
#[test]
fn it_finds_nested_jars() {
let nested_jars_millhone_out: String = format!(
r#"
{{
"discovered_jars": {{
"blobs/sha256/3af1c7e331a4b6791c25101e0c862125a597d8d75d786aead62de19f78a5a992": [
{{
"kind": "v1.discover.binary.jar",
"path": "jars/deepest.jar",
"fingerprints": {{
"sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
"v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM="
}}
}}
],
"blobs/sha256/5ee98bff2cf0e70d115677fc37f734d26848435eef5fe52e905229ff7a7d87fb": [
{{
"kind": "v1.discover.binary.jar",
"path": "jars/middle.jar",
"fingerprints": {{
"sha_256": "nKFXVngFtkHIv4FC/rr5o4k+v/KSKzWJ0B9uBuRb+4k=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"v1.mavencentral.jar": "2XA3GFJJkvvpEbAM9nLnAypojEo=",
"v1.raw.jar": "36i3JNvrLMWCMfjB2c9bjQt4Vhmvfq29cb+Hqrb6XeI="
}}
}},
{{
"kind": "v1.discover.binary.jar",
"path": "jars/middle.jar{separator}deepest.jar",
"fingerprints": {{
"v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
"sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
"v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
}}
}}
],
"blobs/sha256/6979b741102e5c5c787f94ad8bfdebeee561b1b89f21139d38489e1b3d6f9096": [],
"blobs/sha256/931c525b52485e01ab5e2926a4b3c884f1c7325782dca13bd11e345f46cc34c3": [],
"blobs/sha256/10bb0e91eb016af401369ecaadccfea9f4768776e54d46ad4e9a0309c82f1d7f": [
{{
"kind": "v1.discover.binary.jar",
"path": "jars/top.jar",
"fingerprints": {{
"v1.raw.jar": "TNW7ezd3fqw3MULVTrexg68Q1x2PTDGk2DkltAqUefk=",
"v1.mavencentral.jar": "TtwsgEXwLd/8UFTohsFhJqYMJ74=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"sha_256": "l9XTA5PwWJhnFlz9t0SWKvr2cHDmcytIVvPsr6vqFis="
}}
}},
{{
"kind": "v1.discover.binary.jar",
"path": "jars/top.jar{separator}middle.jar",
"fingerprints": {{
"v1.mavencentral.jar": "2XA3GFJJkvvpEbAM9nLnAypojEo=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"v1.raw.jar": "36i3JNvrLMWCMfjB2c9bjQt4Vhmvfq29cb+Hqrb6XeI=",
"sha_256": "nKFXVngFtkHIv4FC/rr5o4k+v/KSKzWJ0B9uBuRb+4k="
}}
}},
{{
"kind": "v1.discover.binary.jar",
"path": "jars/top.jar{separator}middle.jar{separator}deepest.jar",
"fingerprints": {{
"v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM=",
"sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
"v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
}}
}}
]
}}
}}
"#,
separator = std::path::MAIN_SEPARATOR_STR.replace("\\", "\\\\")
);
let image_tar_file =
PathBuf::from("../../test/App/Fossa/Container/testdata/nested_jars.tar");
let res = jars_in_container(&image_tar_file)
.expect("Read jars out of container image.")
.pipe(serde_json::to_value)
.expect("encode as json");
let expected: Value =
serde_json::from_str(&nested_jars_millhone_out).expect("Parse expected json");
pretty_assertions::assert_eq!(expected, res);
}
}