-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
a way to verify body sha256 of small files on SplitStreamReader #14
Comments
This kinda-sorta got slightly addressed in 9c5fde5 with |
I'm reusing this issue that's about splitstream to ask something tangentially related but much bigger picture. I don't quite understand the "centrality" of splitstream for this project. I think what we should have as the "root" is composefs metadata blobs that we both read and write. In particular, the In an ideal world, we have "canonical tar" and for "verified OCI images" then there is no splitstream - we can reliably reproduce a canonical tar from a composefs. |
(The reason I bring this up is I was just replying something similar in containers/storage#2155 (comment) ) |
We talked about this before, but got caught up on two issues:
|
Yes, I agree. However, I think where we disagree is on whether it should be the canonical on-disk form for an OCI layer (or equivalent) or whether it should be an optional lookaside only consulted if the use case needs to reassemble the original tar/cpio/whatever. In particular, for "verified OCI" I think we do want to put the canonical composefs digest of each layer in the manifest as annotations. Under our working assumption that the manifest or configuration can be verified externally (e.g. via well known signature mechanisms) and if we represent OCI layers (or RPMs, whatever) as individual composefs on disk, then we have the ability to check the fsverity digest of those layers in addition to the final merged trees. Of course we could try to put the splitstream digest into OCI and sign that, but that's making it load bearing for security. Don't get me wrong, I think splitstream is a nicely designed format - but I think the technological heart of what we do should be composefs. |
We have a choice, but this is what I think we should do - the composefs should represent the tar stream. |
Splitstream is not currently directly involved in the security aspects of anything that we might do with a composefs container, but: A lot of the repository APIs take an optional fs-verity digest parameter. If this is known then we take this to be the verity digest of the splitstream object in question. If we know this then we trust the references that are inside of that splitstream, which lets us recursively open the streams it refers to (ie: config refers to layers, layers refer to content files) without further checking. This works because the splitstream records the fs-verity digest of the referred objects in its header. If we don't do this then we are forced to check everything for on-disk consistency to ensure that nothing got corrupted. This is done as a manual scan (the above-mentioned This is all only relevant when performing transformation operations on the image, such as when "sealing", ie: creating the composefs for the first time for a new container image. In this case we absolutely do need to verify that the layers actually contain the sha256 data that they say they do in order to avoid a situation where they were corrupted or tampered on disk. If we have the fs-verity digest of the splitstream of the container config then we don't need to do that checking because we effectively already did (when doing the original pull). We also have the option to play fast and loose with the data involved during the "create image" stage, assuming that our builders are immune to these kinds of issues, and only defend against tampering/corruption once on end user machines, but since we have this powerful data integrity technique at our disposal, why not use it in both places? |
Yes that's a fair point. However I would also say that sealing is basically a form of "push" which is certainly a feature we need in the general case, but I would still consider optional - for an "appliance system" it's not going to re-push its containers, so we don't need splitstream there. The client system should just be reading verified composefs.
Yes but that's my point - how is it known? I don't see how it's "known" unless it's rooted in something we agree is a signing or verification target. I would like to investigate having something like a "splitstream-from-composefs" that is just the delta from a composefs. For example I bet in many cases that splitstream could literally be just a single byte per entry that says "create a ustar or gnu formatted entry from the metadata we find in composefs". Or really even in many cases that'd just be "the input tar was ustar" hopefully. And in the limit of course with canonical tar, the splitstream is empty. (Again though I think we're agreed that we can't rely on canonical tar, but it's something to aspire to) |
It is known in exactly one case: if you saved it from a previous operation (like a pull, or a previous container-mutation operation). This is really only useful to the local user. We might talk about adding it as an extra form of digest in the manifests or something but this feels very premature. I'd definitely be against doing that right now. |
Yes, we can have local state that at one point we proved that a given tar-sha256 ("diffid") was equivalent to a given splitstream with a specific fsverity digest. I don't have any problem with that inherently.
I am talking about adding the composefs digests (not splitstream) in manifests which you are carefully not replying to 😉 But it's OK for now, we can just think about this issue and don't need to take any immediate action. In the immediate short term I'd like actually to pick back up the containers/composefs#294 issue and get a "spec" that just targets what we agreed on in "composefs as label in config for final merged tree" and documenting our reference implementation. We can leave standardizations of the layer digests to an updated draft later. |
We currently read the container config JSON without any sort of validation of the content. We can know the fsverity hash of the stream from when we create it, but assuming we don't make note of that, then when we go to open the config via its sha256 ID, we perform no checking at all.
We could fix this by adding an optional sha256 verification layer to the reader similar to the one we have in the writer. We could also read the entire (small) json document into a vec and checksum it there. We could also try to think about a reasonable way to make use of fsverity, but this seems to cut a little bit against how containers work today...
Solving this is important to the "mount verified container rootfs" goal.
The text was updated successfully, but these errors were encountered: