Clarify contentDigest meaning for docker/OCI images #287

glyn · 2019-10-14T08:33:15Z

Let's start with some definitions (based on the "OCI Image Format Specification").

Docker and OCI images have two types of digest: a repo digest and an image id. A repo digest is the SHA-256 digest of the image manifest, which refers to the compressed layers. Since compression depends on the implementation of the registry used to store the image, the repo digest doesn't logically exist until the image has been pushed. An image id, on the other hand, is the SHA-256 digest of the image configuration, which refers to the uncompressed layers and so is independent of the registry implementation.

Both these digests are content addresses of an image in the sense that each uniquely identifies the content (modulo SHA-256 collisions). Note that the docker registry spec refers to the repo digest as a "content digest".
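To make the two definitions concrete, here is a minimal sketch in Python. The manifest and configuration below are toy, hand-built documents (the field values and the `oci_digest` helper are illustrative assumptions, not output from any real registry or tool). It only shows that the two digests are hashes of two different JSON documents, and that the manifest embeds the configuration's digest.

```python
import hashlib
import json


def oci_digest(data: bytes) -> str:
    """Return the digest of data in OCI format: '<algorithm>:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical layer content, before and after compression.
uncompressed_layer = b"uncompressed layer"
compressed_layer = b"gzip-compressed layer"

# Toy image configuration: diff_ids are digests of the uncompressed layers.
config_bytes = json.dumps({
    "architecture": "amd64",
    "os": "linux",
    "rootfs": {"type": "layers", "diff_ids": [oci_digest(uncompressed_layer)]},
}).encode()

# Toy image manifest: it points at the configuration and the compressed layers.
manifest_bytes = json.dumps({
    "schemaVersion": 2,
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": oci_digest(config_bytes),
        "size": len(config_bytes),
    },
    "layers": [{
        "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
        "digest": oci_digest(compressed_layer),
        "size": len(compressed_layer),
    }],
}).encode()

image_id = oci_digest(config_bytes)       # digest of the configuration
repo_digest = oci_digest(manifest_bytes)  # digest of the manifest

print("image id:   ", image_id)
print("repo digest:", repo_digest)
```

Changing only the compression of a layer would change `repo_digest` (through the layer descriptor in the manifest) while leaving `image_id` untouched.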

The CNAB spec defines the contentDigest fields in bundle.json as follows, firstly for invocation images:

The contentDigest field MUST contain a digest, in OCI format, to be used to compute the integrity of the image. The calculation of how the image matches the contentDigest is dependent upon image type. (OCI, for example, uses a Merkle tree while VM images are checksums). During bundle development, it may be ideal to omit the contentDigest field and/or skip validation. Once a bundle is ready to be transmitted as a thick or thin bundle, it must have a contentDigest field. If a contentDigest field is present, a runtime MUST validate the image digest prior to executing an action. If the contentDigest is not present, the runtime SHOULD report an error so the user is aware that there is no contentDigest provided. Runtimes MAY allow users to override this behavior and perform actions on bundles that do not have contentDigest values populated.

and then for images other than invocation images:

contentDigest: MUST contain a digest of the contents of the image, in OCI format, to be used to compute the integrity of the image. The calculation of how the image matches the contentDigest is dependent upon image type. (OCI, for example, uses a Merkle tree while VM images use checksums.)
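The runtime behaviour described in the quoted text for invocation images (validate when present, report an error when absent, optionally allow an override) could be sketched as follows. This is an assumption-laden illustration: `validate_image`, `DigestError`, `allow_missing` and the `compute_digest` callback are hypothetical names, not part of any CNAB library, and the sketch deliberately leaves open how the digest is computed, which is exactly the question this issue raises.

```python
from typing import Callable


class DigestError(Exception):
    """Raised when a contentDigest is missing or does not match."""


def validate_image(
    image: dict,
    compute_digest: Callable[[dict], str],
    allow_missing: bool = False,
) -> bool:
    """Return True if the digest was validated, False if validation was skipped.

    image          -- an image entry from bundle.json, as a dict
    compute_digest -- hypothetical callback returning the image's actual digest
    allow_missing  -- the user override that runtimes MAY offer
    """
    expected = image.get("contentDigest")
    if not expected:
        if allow_missing:
            return False  # user chose to proceed without validation
        raise DigestError("no contentDigest provided for %r" % image.get("image"))

    actual = compute_digest(image)
    if actual != expected:
        raise DigestError("digest mismatch: expected %s, got %s" % (expected, actual))
    return True
```

A caller would run this before executing any action, passing whichever digest computation the spec ends up prescribing.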

Since both repo digests and image ids are roots of Merkle trees, the CNAB spec doesn't actually prescribe whether repo digest or image id (or indeed some other Merkle tree root digest!) should be used for contentDigest fields of docker/OCI images. This needs clarifying so that CNAB runtimes know how to validate these fields.


trishankatdatadog · 2020-01-08T18:44:49Z

Great question! I think image id is the better idea, since it's registry-independent... @jlegrone

jeremyrickard · 2020-01-17T16:23:18Z

I think we discussed this way back in 2018 in the early days and this was a common view. I think we ended up going with the assumption it was the repo digest. I’ll see if I can find an old issue in the Duffle repo!

I think the image id is attractive since it has no registry requirement.

jeremyrickard · 2020-01-28T03:40:09Z

Some of this history was on Docker’s Slack, I think, so not all of it is going to be recoverable (unless Docker can get it). Some related comments and discussion:

cnabio/duffle#691 (comment)

#61 (comment)

A note I had from someone at Docker (dcmg):

“containerd uses the digest, not the image id, to refer to images. If you pull with containerd, the image digest used is the manifest hash. containerd itself does not create content, so you will always know the digest before pushing. If using buildkit, buildkit can create the content, but it will create the full manifest rather than just the image id with uncompressed layers.

The manifest digest refers to compressed layers, so Docker doesn't know that identifier until after push, since it calculates it on push. After we replace the image backend in Docker, that will work a little differently: we will be able to keep the compressed image hashes that were pulled or built.

Related to multiple identifiers, it is always possible to create images that are the "same" but only differ by metadata, compression, encryption, or anything else.

However, we are trying to move to a world where that original content is always used, so changes to the identifier actually represent a change to the image, rather than a side effect of pulling and pushing an image from a different Docker version.

The image ID does not have the compressed hash, which tends to be what is needed to fetch the image from a repository or to get the byte size of the fetchable artifacts.”

technosophos · 2020-02-12T23:37:47Z

I don't think you can reasonably call a hashed file "a root of a merkle tree". That assumes an intent that is clearly not there in VM images (namely, that they are tree-structured).

I am not understanding, though, what particular change you are requesting in the spec. Is it a clarification of which SHA Docker considers to be the correct SHA? Or are you proposing an alternative?

glyn · 2020-02-13T07:47:12Z

> I don't think you can reasonably call a hashed file "a root of a merkle tree". That assumes an intent that is clearly not there in VM images (namely, that they are tree-structured).

This issue is scoped to docker/OCI images.

> I am not understanding, though, what particular change you are requesting in the spec. Is it a clarification of which SHA Docker considers to be the correct SHA? Or are you proposing an alternative?

If I consume a bundle containing a docker/OCI image with contentDigest specified, I need to know whether that's the repo digest or the image id in order to verify it. I'm not asking which one is correct from Docker's perspective: they both have valid uses. It's merely a choice that the CNAB spec has to make.

Let's take a simple example to make this crystal clear. CNAB runtime A could assume the contentDigest of a docker/OCI image is its repo digest, while CNAB runtime B could assume it's the image id. If a bundle created by runtime A was consumed by runtime B, then runtime B could say the contentDigest was invalid because it wasn't what runtime B was expecting.
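The same scenario as a small Python sketch. Everything here is hypothetical: the byte strings stand in for one image's manifest and configuration, and `runtime_a_digest` / `runtime_b_digest` are illustrative names for the two interpretations, not real runtime code.

```python
import hashlib


def oci_digest(data: bytes) -> str:
    """Return the digest of data in OCI format: '<algorithm>:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the manifest and configuration of one image.
manifest_bytes = b'{"schemaVersion": 2, "config": {}, "layers": []}'
config_bytes = b'{"architecture": "amd64", "os": "linux"}'


def runtime_a_digest() -> str:
    """Runtime A's assumption: contentDigest is the repo digest."""
    return oci_digest(manifest_bytes)


def runtime_b_digest() -> str:
    """Runtime B's assumption: contentDigest is the image id."""
    return oci_digest(config_bytes)


# Runtime A creates the bundle entry using its own interpretation.
bundle_image = {
    "image": "example.com/app:1.0",  # hypothetical image reference
    "contentDigest": runtime_a_digest(),
}

accepted_by_a = bundle_image["contentDigest"] == runtime_a_digest()
accepted_by_b = bundle_image["contentDigest"] == runtime_b_digest()

print(accepted_by_a, accepted_by_b)  # True False
```

The image is untouched in both cases; only the interpretation of the field differs, which is why the spec needs to pick one.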

vdice · 2020-09-18T16:29:26Z

With #384 merged, I believe we can consider this issue closed.

glyn · 2020-09-21T08:51:10Z

LGTM, thanks.

glyn mentioned this issue Oct 14, 2019

Bug in image reference handling in Docker Driver cnabio/cnab-go#145

Open

glyn mentioned this issue Dec 16, 2019

Reference images by digest in docker driver cnabio/cnab-go#166

Closed

vdice mentioned this issue Aug 20, 2020

docs(101-bundle-json.md): add clarification around contentDigest value #384

Merged

vdice mentioned this issue Sep 17, 2020

Updates in anticipation of cnab-core-1.1.0 #388

Merged

vdice closed this as completed Sep 18, 2020
