GrootFS
GrootFS is a tool with a command line interface (CLI) that provides filesystem
isolation for containers. Isolated filesystems are also called root filesystems
(or rootfses).
Each Garden container references one rootfs that is mounted as its root mountpoint.
The CLI implements Garden's image plugin binary interface and is used as an image plugin by Garden's volumizer. The CLI consists of the following commands:
- InitStore: creates a new GrootFS store. Required before creating any rootfses.
- DeleteStore: deletes a store created by InitStore.
- GenerateVolumeSizeMetadata: TODO
- Capacity: returns store_size_bytes from the config file.
- Create: creates a new rootfs and returns information about it as a JSON document on standard output. Garden parses that JSON object and uses its data when building the container's config.json.
- Delete: deletes a rootfs; invoked when Garden destroys a container.
- Clean: cleans all unused layers from the cache in the store; invoked when the store size reaches a threshold.
- List: lists all the images in the store.
- Stats: returns store stats.
The root filesystem is an overlay filesystem that is mounted under a directory in the GrootFS store. Every rootfs starts with a base image (a tar file, or an OCI image) that consists of layers that are downloaded during rootfs creation. The base image layers are mounted as the overlay's lower dirs and are read-only, while a writable upper dir (plus the workdir that overlayfs requires) is created by GrootFS on rootfs creation. Thus containers can only change their own writable layer but cannot change the base layers.
As base image layers are read-only for containers, they can be shared across different rootfses simply by mounting the same lower dirs into different overlay mounts, which optimises disk usage and avoids downloading layers that are already present. For example, two containers whose rootfses are based on the same base image (such as ubuntu) would each have their own rootfs with its own writable layer, but the lower dirs in their overlay mounts would be the same. Furthermore, when the second container is created, the layers from the ubuntu base image will not be downloaded again (they were downloaded when the first container was created), which also improves performance.
Note: GrootFS has been designed to be filesystem agnostic, so it can be implemented on top of any Linux filesystem. Historically the team has experimented with aufs, btrfs, overlayfs, xfs and ext4. For more information on the outcomes of those experiments and insight into why we settled on overlayfs on top of xfs, please read this blog post.
GrootFS requires a place to create its rootfses. This is achieved by creating a sparse file, creating an xfs filesystem on it, and loop mounting it. Using a sparse file means that we can quickly generate a huge filesystem that initially takes very little space on the parent filesystem. Of course, this also leads to confusion about how much space is actually used; see understanding grootfs store disk usage for a discussion. Beware that talk of reclaiming space only applies to the xfs filesystem within the sparse file. Once a sparse file expands to accommodate more content, it cannot be shrunk again. To do that you would need to copy the sparse file to a new sparse file, which is not possible with GrootFS and Garden. So if the filesystem containing a GrootFS store is full because of the size of the sparse file, there is no easy fix.
When the command grootfs --config <CONFIG_FILE> init-store is run (with store set to an appropriate path in the config file), GrootFS creates the following directory structure. For example, with store: /tmp/store/unprivileged:
/tmp/store/
├── unprivileged/
│ ├── images/
│ ├── l/
│ ├── locks/
│ ├── meta/
│ │ ├── dependencies/
│ │ └── namespace.json
│ ├── projectids/
│ ├── tmp/
│ ├── volumes/
│ └── whiteout_dev
└── unprivileged.backing-store
The unprivileged.backing-store file is the sparse file containing the xfs filesystem, which GrootFS has mounted at /tmp/store/unprivileged.
The subdirectories are used as follows:
- images: contains a directory per image. Inside it are the workdir and diff directories used by the overlayfs mount, and rootfs, where the overlayfs is mounted.
- l: shortname symlinks to volumes
- locks: contains lock files
- meta: contains details about uid/gid mappings, volumes used by images, and volume sizes
- projectids: used by xfs to manage quotas
- tmp: temporary storage
- volumes: where the volumes (image layers) are stored
Garden uses user namespaces as a security measure unless privileged containers are used. This means all users in the container are mapped to unprivileged users on the host. Root in particular is mapped to a user ID we call maximus, which is the top of the UID range on the system.
The UID/GID mappings are the same for all containers, and the rootfses created by grootfs need to set appropriate file ownership according to these mappings.
So if root in the container is mapped to 50000 on the host, a volume in grootfs containing a file that should be owned by root in the container must have uid 50000 set on it.
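The translation from container IDs to host IDs can be sketched in Go. The IDMapping type mirrors the container-ID/host-ID/size triples used by Linux uid_map files; the concrete ranges below are hypothetical, apart from root mapping to 50000 as in the example above. A real implementation would then apply the resulting host IDs to extracted files, for example with os.Lchown.

```go
package main

import "fmt"

// IDMapping maps Size consecutive IDs starting at ContainerID
// to consecutive host IDs starting at HostID.
type IDMapping struct {
	ContainerID, HostID, Size int
}

// toHostID translates a container UID or GID to the host ID that must be
// set on files in the rootfs. ok is false if no mapping covers the ID.
func toHostID(mappings []IDMapping, id int) (hostID int, ok bool) {
	for _, m := range mappings {
		if id >= m.ContainerID && id < m.ContainerID+m.Size {
			return m.HostID + (id - m.ContainerID), true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical mappings: container root -> 50000, IDs 1..65536 -> 100000..165535.
	mappings := []IDMapping{
		{ContainerID: 0, HostID: 50000, Size: 1},
		{ContainerID: 1, HostID: 100000, Size: 65536},
	}
	for _, id := range []int{0, 1, 1000} {
		if hostID, ok := toHostID(mappings, id); ok {
			fmt.Printf("container %d -> host %d\n", id, hostID)
		}
	}
}
```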
The UID/GID mappings are passed as options to the init-store command and are stored in the meta/namespace.json file, where they can be used during the create command.
It is possible to init a store in a location where the backing-store file already exists. If the file already contains an xfs filesystem, that filesystem is mounted; if not, the filesystem is created first.
The GrootFS store is a loop mounted file, so data can be cached twice: reads and writes to the xfs filesystem inside the store are cached by the xfs filesystem, and the backing file's reads and writes are also cached by the filesystem that contains it. Better performance can be obtained by setting the direct-io flag on the loop device used for the backing file mount. This avoids the second layer of caching, saving memory, and stops implicit syncing to the disk.
You can choose to enable direct IO as an argument to the init-store call.
The delete store command removes all the images created in the store and all volumes, then unmounts the backing store and deletes the mount path (i.e. the store path).
The backing-store file is not deleted, although you may want to delete it if you need to reclaim the space it occupies.
The create command generates a rootfs in the store, which must have been previously created using init-store.
The command takes a URL for the rootfs image, and an ID for the rootfs. The URL can be one of:
- a filesystem path to a tarball containing the container base image, e.g. /path/to/fs.tar
- a Docker URL, e.g. docker://cfgarden/strace
- an OCI URL pointing to an OCI image on the local filesystem, e.g. oci:///path/to/oci/file:version
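The three URL forms can be told apart by their scheme. The Go sketch below uses net/url to classify a create URL; SourceKind and classifyImageURL are hypothetical names, and treating any scheme-less path as a tarball is an assumption based on the examples above, not necessarily how GrootFS parses its arguments.

```go
package main

import (
	"fmt"
	"net/url"
)

// SourceKind identifies which kind of base image a create URL refers to.
type SourceKind int

const (
	Tarball SourceKind = iota // plain filesystem path to a tar file
	Docker                    // docker:// registry image
	OCI                       // oci:// image on the local filesystem
)

func (k SourceKind) String() string {
	return [...]string{"tarball", "docker", "oci"}[k]
}

// classifyImageURL picks the source kind from the URL scheme.
// Paths without a scheme are treated as tarballs.
func classifyImageURL(raw string) (SourceKind, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return 0, err
	}
	switch u.Scheme {
	case "":
		return Tarball, nil
	case "docker":
		return Docker, nil
	case "oci":
		return OCI, nil
	default:
		return 0, fmt.Errorf("unsupported image URL scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"/path/to/fs.tar",
		"docker://cfgarden/strace",
		"oci:///path/to/oci/file:version",
	} {
		kind, err := classifyImageURL(raw)
		if err != nil {
			fmt.Println(raw, "->", err)
			continue
		}
		fmt.Println(raw, "->", kind)
	}
}
```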
The tarball URL will result in a single volume / layer, whereas the other two types may result in multiple volumes / layers. In all three cases, the layers are stored as tarballs, and grootfs will extract them