With peer pods, we currently have no way to share image layers downloaded on the worker node. Consequently, we only support the in-guest image pull approach. This creates some challenges regarding container rootfs storage inside the guest. These challenges have grown considerably now that AI runtime and model images run into multiple GBs (10+).
Let's understand how we handle container rootfs storage inside the guest.
For packer-created guest (pod VM) images, the container rootfs storage is the guest's root disk. We mount the `/kata-containers` folder on the guest's root disk to `/run/kata-containers/image` during boot.
The container image layers are downloaded and extracted under `/run/kata-containers/image/layers`. The overlay (rw) layer is under `/run/kata-containers/image/overlay`.
If using an unencrypted root disk, this approach violates the CoCo threat model, as a privileged infra admin can easily tamper with the root disk contents or read the contents of the container image (which you don't want when using an encrypted container image).
For mkosi-created guest images, the guest root disk is read-only and dm-verity protected, so the container rootfs storage is the guest memory. Using memory for the container rootfs is the safest option, as the memory is encrypted. However, this approach introduces a new problem: guest memory is limited, and downloading and extracting huge container images in memory may not be the best use of it. Also, there is no easy way to automatically determine the guest memory requirement based on the container images used in the pod specs.
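To make the sizing problem concrete, here is a rough sketch of how a memory estimate could be derived from image layer sizes. It assumes the compressed layer sizes are taken from the `size` field of the layer descriptors in each image manifest. The `Layer` type, the function name, the expansion ratio, and the overlay headroom are all hypothetical values chosen for illustration, not something peer pods implements today.

```python
from dataclasses import dataclass
from typing import Iterable

GIB = 1024 ** 3


@dataclass(frozen=True)
class Layer:
    """One image layer (hypothetical helper type for this sketch)."""
    digest: str
    compressed_bytes: int  # "size" of the layer descriptor in the image manifest


def estimate_rootfs_bytes(
    layers: Iterable[Layer],
    expansion_ratio: float = 2.5,            # assumption: extracted size / compressed size
    overlay_headroom_bytes: int = 1 * GIB,   # assumption: space reserved for the rw overlay
) -> int:
    """Return a rough estimate of the storage needed for the container rootfs."""
    # Layers shared between containers in the same pod are only stored once.
    unique = {layer.digest: layer.compressed_bytes for layer in layers}
    extracted = int(sum(unique.values()) * expansion_ratio)
    return extracted + overlay_headroom_bytes


if __name__ == "__main__":
    pod_layers = [
        Layer("sha256:aaa", 4 * GIB),
        Layer("sha256:bbb", 6 * GIB),
        Layer("sha256:aaa", 4 * GIB),  # same base layer used by a second container
    ]
    print(f"{estimate_rootfs_bytes(pod_layers) / GIB:.1f} GiB")
```

The real expansion ratio depends heavily on the image contents (model weights barely compress, for example), which is part of why an automatic estimate is hard.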
So how can we improve the container rootfs storage handling in peer-pods? This issue is to share some ideas and kickstart the discussion.
Proposal-1
- Create a LUKS encrypted guest root disk partition with a dynamically generated key during boot
- Use the LUKS encrypted guest root disk partition as the container rootfs store
Recently, CDH introduced support for encrypted (ephemeral) block devices. We can explore whether it will be feasible to reuse this: https://github.com/ChengyuZhu6/guest-components/blob/main/confidential-data-hub/docs/use-cases/secure-mount-with-block-device.md.
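As a starting point for discussing Proposal-1, the LUKS steps could look roughly like the sketch below. It only builds the command lines and an ephemeral key; the partition path, mapper name, filesystem choice, and function names are assumptions for illustration, and this is not how CDH or the current guest images implement it.

```python
import secrets
import subprocess
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[bytes]]  # (argv, data to pass on stdin)


def luks_rootfs_store_steps(
    device: str,                                     # hypothetical partition, e.g. /dev/vda3
    mapper_name: str = "container-rootfs",           # hypothetical dm-crypt mapping name
    mount_point: str = "/run/kata-containers/image",
) -> List[Step]:
    """Build the boot-time steps for an ephemeral, LUKS encrypted rootfs store."""
    # The key is generated in guest memory and never persisted, so the
    # partition contents become unreadable once the pod VM is gone.
    key = secrets.token_bytes(64)
    mapped = f"/dev/mapper/{mapper_name}"
    return [
        # "--key-file -" makes cryptsetup read the key from stdin.
        (["cryptsetup", "luksFormat", "--batch-mode", "--type", "luks2",
          "--key-file", "-", device], key),
        (["cryptsetup", "open", "--type", "luks2",
          "--key-file", "-", device, mapper_name], key),
        (["mkfs.ext4", "-q", mapped], None),
        (["mount", mapped, mount_point], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Execute the steps in order; requires root inside the guest."""
    for argv, stdin_data in steps:
        subprocess.run(argv, input=stdin_data, check=True)


if __name__ == "__main__":
    # Print the plan only; nothing is executed here.
    for argv, _ in luks_rootfs_store_steps("/dev/vda3"):
        print(" ".join(argv))
```

Mounting at `/run/kata-containers/image` keeps the existing `layers` and `overlay` paths unchanged, so only the backing storage differs.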
Proposal-2
`overlay` layer as a tmpfs volume during boot so any writes are on the memory.

cc @mkulke @stevenhorsman @genjuro214 @huoqifeng @snir911