Kieron Browne edited this page Jun 22, 2021 · 16 revisions

GrootFS

GrootFS is a tool with a command line interface (CLI) that provides filesystem isolation for containers. Isolated filesystems are also called root filesystems (or rootfses). Each Garden container references one rootfs, which is mounted as its root mountpoint.

CLI

The command line interface (CLI) implements Garden's image plugin binary interface and is used as an image plugin by Garden's volumizer. The CLI consists of the following commands:

  • InitStore: used to create a new grootfs store. Required before creating any rootfses.
  • DeleteStore: deletes a store created by InitStore.
  • GenerateVolumeSizeMetadata: TODO
  • Capacity: returns store_size_bytes from the config file
  • Create: creates a new rootfs and writes information about it (a JSON document) to standard output. Garden parses that JSON object and uses its data when building the container config.json.
  • Delete: deletes a rootfs, invoked when Garden destroys a container
  • Clean: cleans all unused layers from the cache in the store, invoked when the store size reaches a threshold
  • List: lists all the images in the store
  • Stats: returns store stats
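
The way a caller such as Garden consumes the Create output can be sketched as follows. This is a minimal illustration, not Garden's actual code: the helper names run_plugin and create_argv are hypothetical, and the argument order for create (image URL, then ID) is an assumption based on the description of the create command. No fields of the returned JSON document are assumed.

```python
import json
import subprocess


def run_plugin(argv):
    """Run an image-plugin style command and parse its standard output as JSON."""
    # check=True raises CalledProcessError if the command exits non-zero
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def create_argv(config_file, image_url, image_id, binary="grootfs"):
    """Build the argument list for a create call.

    Assumed order (not confirmed by this page): global --config flag,
    the create subcommand, the image URL, then the rootfs ID.
    """
    return [binary, "--config", config_file, "create", image_url, image_id]
```

With a real store in place, a call might look like run_plugin(create_argv("/path/to/config.yml", "docker://cfgarden/strace", "my-container")), where the config path and ID are placeholders.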

What is a Root Filesystem (rootfs)?

The root filesystem is an overlay filesystem that is mounted under a directory in the GrootFS store. Every rootfs starts with a base image (a tar file or an OCI image) that consists of layers, which are downloaded during rootfs creation. The base image layers are mounted as the read-only lower dirs of the overlay, while the writable upper dir (together with the workdir that overlayfs requires) is created by GrootFS on rootfs creation. Thus containers can change only their own writable directories and cannot change the base layers.

Because base image layers are read-only for containers, they can be shared across rootfses simply by mounting the same lower dirs into different overlay mounts. This optimises disk usage and avoids downloading layers that are already present. For example, two containers whose rootfses are based on the same base image (such as ubuntu) each have their own rootfs with its own writable directories, but the lower dirs in the two overlay mounts are the same. Furthermore, when the second container is created, the layers of the ubuntu base image are not downloaded again (they were downloaded when the first container was created), which also improves performance.
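
The sharing described here can be illustrated with the mount options that overlayfs itself accepts. The sketch below uses overlayfs's own option names, in which read-only layers are lowerdir entries and the per-rootfs writable directory is upperdir. How GrootFS assembles its mount is not spelled out on this page, so the use of per-image diff and workdir directories, the function name overlay_options, and all paths are assumptions for illustration.

```python
import posixpath


def overlay_options(layer_dirs, image_dir):
    """Build an overlayfs mount option string for one rootfs."""
    # overlayfs takes lower directories colon-separated, topmost layer first
    lower = ":".join(layer_dirs)
    # Assumed per-image directories holding the writable state
    upper = posixpath.join(image_dir, "diff")
    work = posixpath.join(image_dir, "workdir")
    return f"lowerdir={lower},upperdir={upper},workdir={work}"


# Two rootfses built from the same (hypothetical) base image layers
shared_layers = ["/store/volumes/layer-b", "/store/volumes/layer-a"]
first = overlay_options(shared_layers, "/store/images/container-1")
second = overlay_options(shared_layers, "/store/images/container-2")
```

Both option strings carry the same lowerdir list, so the layer data exists once on disk, while each rootfs gets its own writable directories.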

Note: GrootFS has been designed to be filesystem agnostic, so you can implement it on top of any Linux filesystem. Historically the team has experimented with aufs, btrfs, overlayfs, xfs and ext4. For more information on the outcomes of those experiments and insight into why we settled on overlayfs on top of xfs, please read this blog post.

Command Details

init-store

GrootFS requires a place to create its rootfses. This is achieved by creating a sparse file, creating an xfs filesystem on it, and loop mounting it. Using a sparse file means that we can quickly generate a huge filesystem that initially takes very little space on the parent filesystem. Of course, this also leads to confusion about how much space is actually used; see understanding grootfs store disk usage for a discussion. Beware that talk of reclaiming space applies only to the xfs filesystem within the sparse file. Once a sparse file has expanded to accommodate more content, it cannot be shrunk again. To shrink it you would need to copy the sparse file to a new sparse file, which is not possible with GrootFS and Garden. So if the filesystem containing a GrootFS store is full because of the size of the sparse file, there is no easy fix.
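
The gap between apparent size and space actually used can be seen with a small experiment. The following is a sketch on a POSIX system and a filesystem that supports sparse files; the file name and helper names are made up for illustration, and it does not create an xfs filesystem or a loop mount.

```python
import os
import tempfile


def create_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def disk_usage(path):
    """Return (apparent size, bytes actually allocated) for a file."""
    st = os.stat(path)
    # st_blocks is reported in 512-byte units on Linux
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    backing = os.path.join(tmp, "example.backing-store")
    create_sparse_file(backing, 64 * 1024 ** 2)  # 64 MiB apparent size
    apparent, allocated = disk_usage(backing)
    print(f"apparent={apparent} allocated={allocated}")
```

Tools that report apparent size (such as ls -l) and tools that report allocated blocks (such as du) will disagree about such a file, which is the source of the confusion mentioned above.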

When the command grootfs --config <CONFIG_FILE> init-store is run (with store set to an appropriate path in the config file), GrootFS creates the following directory structure. For example, with store: /tmp/store/unprivileged:

/tmp/store/
├── unprivileged/
│   ├── images/
│   ├── l/
│   ├── locks/
│   ├── meta/
│   │   ├── dependencies/
│   │   └── namespace.json
│   ├── projectids/
│   ├── tmp/
│   ├── volumes/
│   └── whiteout_dev
└── unprivileged.backing-store

The unprivileged.backing-store file is the sparse file containing the xfs filesystem. GrootFS mounts that xfs filesystem at /tmp/store/unprivileged.

The subdirectories are used as follows:

  • images: contains a directory per image. Inside there are the workdir and diff directories used by the overlayfs mount, and rootfs where the overlayfs is mounted.
  • l: shortname symlinks to volumes
  • locks: contains lock files
  • meta: contains details about uid/gid mappings, volumes used by images, and volume sizes
  • projectids: used by xfs to manage quotas
  • tmp: temporary storage
  • volumes: where the volumes (image layers) are stored
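
As a quick sanity check of a store, the layout above can be verified programmatically. This is a minimal sketch: the function name missing_store_dirs is hypothetical, and the list of expected directories is taken only from the tree shown on this page.

```python
import os

# Directory names taken from the store layout shown on this page
EXPECTED_DIRS = [
    "images", "l", "locks", "meta", "meta/dependencies",
    "projectids", "tmp", "volumes",
]


def missing_store_dirs(store_path):
    """Return the expected store subdirectories that do not exist."""
    return [
        name for name in EXPECTED_DIRS
        if not os.path.isdir(os.path.join(store_path, *name.split("/")))
    ]
```

An empty result for missing_store_dirs("/tmp/store/unprivileged") would suggest that init-store has run and the backing store is mounted.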

UID/GID Mapping

Garden uses user namespaces as a security measure, unless using privileged containers. This means all users in the container will be mapped to unprivileged users on the host. Root in particular is mapped to a userid we call maximus, which is the top of the UID range on the system.

The UID/GID mappings are the same for all containers, and the rootfses created by grootfs need to set appropriate file ownership according to these mappings. So if root in the container is mapped to 50000 on the host, a volume in grootfs containing a file that should be owned by root in the container must have uid 50000 set on it. The UID/GID mappings are passed as options to the init-store command and are stored in the meta/namespace.json file where they can be used during the create command.
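
The translation from a container ID to a host ID can be sketched with the usual user namespace mapping triple (first container ID, first host ID, range size). The root mapping to 50000 comes from the example above; the second range, the IDMapping type, and the to_host_id helper are hypothetical, and the sketch does not reproduce the format of meta/namespace.json.

```python
from typing import NamedTuple


class IDMapping(NamedTuple):
    container_id: int  # first ID of the range inside the container
    host_id: int       # first ID of the range on the host
    size: int          # number of consecutive IDs in the range


def to_host_id(container_id, mappings):
    """Translate a container UID or GID to the host ID it maps to."""
    for m in mappings:
        offset = container_id - m.container_id
        if 0 <= offset < m.size:
            return m.host_id + offset
    raise ValueError(f"ID {container_id} is not mapped")


# Root maps to 50000 as in the text; the second range is illustrative only
mappings = [IDMapping(0, 50000, 1), IDMapping(1, 1, 49999)]
```

Under these mappings, a file that should be owned by root in the container must be owned by to_host_id(0, mappings), that is 50000, in the volume on the host.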

Existing Backing Stores

It is possible to run init-store in a location where the backing-store file already exists. If the file already contains an xfs filesystem, that filesystem is mounted. If not, the filesystem is created first and then mounted.
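
One way to tell whether a backing file already holds an xfs filesystem is to look for the XFS superblock magic, the four bytes "XFSB" at the start of the filesystem. The sketch below illustrates that check only; this page does not say how GrootFS performs the detection, and the function name looks_like_xfs is hypothetical.

```python
XFS_MAGIC = b"XFSB"  # first four bytes of an XFS primary superblock


def looks_like_xfs(path):
    """Return True if the file starts with the XFS superblock magic."""
    with open(path, "rb") as f:
        return f.read(len(XFS_MAGIC)) == XFS_MAGIC
```

A freshly created sparse file reads back as zeros, so this check returns False for it and the filesystem would need to be created before mounting.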

Direct IO

The GrootFS store is a loop-mounted file. Reads and writes to the xfs filesystem inside the store are cached by that xfs filesystem, and reads and writes to the backing file are cached again by the filesystem that holds it. Better performance can be obtained by setting the direct-io flag on the loop device used for the backing file mount. This avoids caching on the loop device, which saves memory, and stops implicit syncing to the disk.

You can choose to enable direct IO as an argument to the init-store call.
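
For orientation, this is how direct IO is requested on a loop device with the util-linux losetup tool, which accepts a --direct-io option. The sketch only builds the command line; whether GrootFS invokes losetup or sets the flag another way is not stated on this page, and the helper name losetup_argv is hypothetical.

```python
def losetup_argv(backing_file, direct_io):
    """Build a losetup command that attaches a file to a free loop device."""
    # --find picks the first unused loop device, --show prints its name
    argv = ["losetup", "--find", "--show"]
    if direct_io:
        # Bypass the page cache for I/O against the backing file
        argv.append("--direct-io=on")
    argv.append(backing_file)
    return argv
```

Running such a command requires root privileges, so the list would typically be passed to subprocess.run by a privileged process.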

delete-store

The delete-store command removes all the images and all the volumes created in the store, then unmounts the backing-store and deletes the mount path (i.e. the store path).

The backing-store file is not deleted, although it possibly should be if you need to reclaim space in it.

create

The create command generates a rootfs in the store, which must have been previously created using init-store.

The command takes a URL for the rootfs image, and an ID for the rootfs. The URL can be either:

  • a filesystem path to a tarball containing the container base image, e.g. /path/to/fs.tar
  • a Docker URL, e.g. docker://cfgarden/strace
  • an OCI URL pointing to an OCI image on the local filesystem, e.g. oci:///path/to/oci/file:version

The tarball URL will result in a single volume / layer, whereas the other two types may result in multiple volumes / layers. In all three cases, the layers are stored as tarballs, and grootfs will extract them.
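
The three kinds of image URL can be told apart by their scheme. The following is a minimal sketch using the example URLs from the list above; the function name image_source and its return labels are hypothetical and do not reflect GrootFS internals.

```python
from urllib.parse import urlparse


def image_source(url):
    """Classify an image URL as 'tar', 'docker' or 'oci' by its scheme."""
    scheme = urlparse(url).scheme
    if scheme == "":
        return "tar"  # plain filesystem path to a tarball
    if scheme in ("docker", "oci"):
        return scheme
    raise ValueError(f"unsupported image URL scheme: {scheme!r}")
```

For example, image_source("/path/to/fs.tar") yields "tar", while image_source("docker://cfgarden/strace") yields "docker".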
