Skip to content
This repository has been archived by the owner on Oct 2, 2024. It is now read-only.

Commit

Permalink
PR #1721: ch-image.rst: update build cache comparison
Browse files Browse the repository at this point in the history
  • Loading branch information
reidpr authored Sep 12, 2023
1 parent f1fa4df commit d98912f
Showing 1 changed file with 67 additions and 20 deletions.
87 changes: 67 additions & 20 deletions doc/ch-image.rst
Original file line number Diff line number Diff line change
Expand Up @@ -67,12 +67,14 @@ Common options placed before or after the sub-command:
exception is :code:`push`, which implies :code:`--auth`.

:code:`--cache`
Enable build cache. Default if a sufficiently new Git is available.
Enable build cache. Default if a sufficiently new Git is available. See
section :ref:`Build cache <ch-image_build-cache>` for details.

:code:`--cache-large SIZE`
Set the cache’s large file threshold to :code:`SIZE` MiB, or :code:`0` for
no large files, which is the default. This can speed up some builds.
**Experimental.** See section "Build cache" for details.
**Experimental.** See section :ref:`Large file threshold
<ch-image_bu-large>` for details.

:code:`--no-cache`
Disable build cache. Default if a sufficiently new Git is not available.
Expand Down Expand Up @@ -277,6 +279,8 @@ storage directory with :code:`ch-image reset`.
will likely have strong opinions.


.. _ch-image_build-cache:

Build cache
===========

Expand Down Expand Up @@ -304,7 +308,7 @@ restored by the cache by default. Notably, extended attributes in privileged
namespaces (e.g. :code:`trusted`) cannot be read by :code:`ch-image` and will be
lost without warning.

The cache has three modes, *enabled*, *disabled*, and a hybrid mode called
The cache has three modes: *enabled*, *disabled*, and a hybrid mode called
*rebuild* where the cache is fully enabled for :code:`FROM` instructions, but
all other operations re-execute and re-cache their results. The purpose of
*rebuild* is to do a clean rebuild of a Dockerfile atop a known-good base
Expand All @@ -319,23 +323,64 @@ appropriate Git is installed, otherwise *disabled*.
Compared to other implementations
---------------------------------

Other container implementations typically use build caches based on overlayfs,
or fuse-overlayfs in unprivileged situations (configured via a "storage
driver"). This works by creating a new tmpfs for each instruction, layered
atop the previous instruction’s tmpfs using overlayfs. Each layer can then be
tarred up separately to form a tar-based diff.

The Git-based cache has two advantages over the overlayfs approach. First,
kernel-mode overlayfs is only available unprivileged in Linux 5.11 and higher,
forcing the use of fuse-overlayfs and its accompanying FUSE overhead for
unprivileged use cases. Second, Git de-duplicates and compresses files in a
fairly sophisticated way across the entire build cache, not just between image
states with an ancestry relationship (detailed in the next section).

A disadvantage is lowered performance in some cases. Preliminary experiments
suggest this performance penalty is relatively modest, and sometimes
Charliecloud is actually faster than alternatives. We have ongoing experiments
to answer this performance question in more detail.
.. note::

This section is a lightly edited excerpt from our paper “`Charliecloud’s
layer-free, Git-based container build cache
<https://arxiv.org/abs/2309.00166>`_”.

Existing tools such as Docker and Podman implement their build cache with a
layered (union) filesystem such as `OverlayFS
<https://github.com/torvalds/linux/blob/af5f239/Documentation/filesystems/overlayfs.rst>`_
or `FUSE-OverlayFS <https://github.com/containers/fuse-overlayfs/tree/v1.12>`_
and tar archives to represent the content of each layer; this approach is
`standardized by OCI
<https://github.com/opencontainers/image-spec/blob/63b8bd0/spec.md>`_. The
layered cache works, but it has drawbacks in three critical areas:

1. **Diff format.** The tar format is poorly standardized and `not designed
for diffs <https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar>`_.
Notably, tar cannot represent file deletion. The workaround used for OCI
layers is specially named *whiteout* files, which means the tar archives
cannot be unpacked by standard UNIX tools and require special
container-specific processing.

2. **Cache overhead.** Each time a Dockerfile instruction is started, a new
overlay filesystem is mounted atop the existing layer stack. File metadata
operations in the instruction then start at the top layer and descend the
stack until the layer containing the desired file is reached. The cost of
these operations is therefore proportional to the number of layers, i.e.,
the number of instructions between the empty root image and the instruction
being executed. This results in a `best practice
<https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
of large, complex instructions to minimize their number, which can conflict
with simpler, more numerous instructions the user might prefer.

3. **De-duplication.** Identical files on layers with an ancestry relationship
(i.e., instruction *A* precedes *B* in a build) are stored only once.
However, identical files on layers without this relationship are stored
multiple times. For example, if instructions *B* and *B'* both follow *A* —
perhaps because *B* was modified and the image rebuilt — then any files
created by both *B* and *B'* will be stored twice.

Also, similar files are never de-duplicated, regardless of ancestry. For
example, if instruction *A* creates a file and subsequently instruction *B*
modifies a single bit in that file, both versions are stored in their
entirety.

Our Git-based cache addresses the three drawbacks: (1) Git is purpose-built to
store changing directory trees, (2) cache overhead is imposed only at
instruction commit time, and (3) Git de-duplicates both identical and similar
files. Also, it is based on an extremely widely used tool that enjoys development
support from well-resourced actors, in particular on scaling (e.g.,
Microsoft’s large-repository accelerator `Scalar
<https://devblogs.microsoft.com/devops/introducing-scalar/>`_ was recently
`merged into Git
<https://github.blog/2022-10-03-highlights-from-git-2-38/>`_).

In addition to these structural advantages, performance experiments reported in our paper above show that the Git-based approach is as good as (and sometimes better than) overlay-based caches. On build time, the two approaches are broadly similar, with one or the other being faster depending on context. Both had performance problems on NFS. Notably, however, the Git-based cache was much faster for a 129-instruction Dockerfile. On disk usage, the winner depended on the condition. For example, we saw the layered cache storing large sibling layers redundantly; on the other hand, the Git-based cache has some obvious redundancies as well, and one must compact it for full de-duplication benefit. However, Git’s de-duplication was quite effective in some conditions and we suspect will prove even better in more realistic scenarios.

That is, we believe our results show that the Git-based build cache is highly competitive with the layered approach, with no obvious inferiority so far and hints that it may be superior on important dimensions. We have ongoing work to explore these questions in more detail.

De-duplication and garbage collection
-------------------------------------
Expand Down Expand Up @@ -389,6 +434,8 @@ files. In both cases, garbage uses all available cores.
:code:`git build-cache` prints the specific garbage collection parameters in
use, and :code:`-v` can be added for more detail.

.. _ch-image_bu-large:

Large file threshold
--------------------

Expand Down

0 comments on commit d98912f

Please sign in to comment.