PR #1721: ch-image.rst: update build cache comparison

hpc · Sep 12, 2023 · d98912f
1 parent f1fa4df
Showing 1 changed file with 67 additions and 20 deletions.
diff --git a/doc/ch-image.rst b/doc/ch-image.rst
@@ -67,12 +67,14 @@ Common options placed before or after the sub-command:
     exception is :code:`push`, which implies :code:`--auth`.
 
   :code:`--cache`
-    Enable build cache. Default if a sufficiently new Git is available.
+    Enable build cache. Default if a sufficiently new Git is available. See
+    section :ref:`Build cache <ch-image_build-cache>` for details.
 
   :code:`--cache-large SIZE`
     Set the cache’s large file threshold to :code:`SIZE` MiB, or :code:`0` for
     no large files, which is the default. This can speed up some builds.
-    **Experimental.** See section "Build cache" for details.
+    **Experimental.** See section :ref:`Large file threshold
+    <ch-image_bu-large>` for details.
 
   :code:`--no-cache`
     Disable build cache. Default if a sufficiently new Git is not available.
@@ -277,6 +279,8 @@ storage directory with :code:`ch-image reset`.
    will likely have strong opinions.
 
 
+.. _ch-image_build-cache:
+
 Build cache
 ===========
 
@@ -304,7 +308,7 @@ restored by the cache by default. Notably, extended attributes in privileged
 namespaces (e.g. :code:`trusted`) cannot be read by :code:`ch-image` and will be
 lost without warning.
 
-The cache has three modes, *enabled*, *disabled*, and a hybrid mode called
+The cache has three modes: *enabled*, *disabled*, and a hybrid mode called
 *rebuild* where the cache is fully enabled for :code:`FROM` instructions, but
 all other operations re-execute and re-cache their results. The purpose of
 *rebuild* is to do a clean rebuild of a Dockerfile atop a known-good base
@@ -319,23 +323,64 @@ appropriate Git is installed, otherwise *disabled*.
 Compared to other implementations
 ---------------------------------
 
-Other container implementations typically use build caches based on overlayfs,
-or fuse-overlayfs in unprivileged situations (configured via a "storage
-driver"). This works by creating a new tmpfs for each instruction, layered
-atop the previous instruction’s tmpfs using overlayfs. Each layer can then be
-tarred up separately to form a tar-based diff.
-
-The Git-based cache has two advantages over the overlayfs approach. First,
-kernel-mode overlayfs is only available unprivileged in Linux 5.11 and higher,
-forcing the use of fuse-overlayfs and its accompanying FUSE overhead for
-unprivileged use cases. Second, Git de-duplicates and compresses files in a
-fairly sophisticated way across the entire build cache, not just between image
-states with an ancestry relationship (detailed in the next section).
-
-A disadvantage is lowered performance in some cases. Preliminary experiments
-suggest this performance penalty is relatively modest, and sometimes
-Charliecloud is actually faster than alternatives. We have ongoing experiments
-to answer this performance question in more detail.
+.. note::
+
+   This section is a lightly edited excerpt from our paper “`Charliecloud’s
+   layer-free, Git-based container build cache
+   <https://arxiv.org/abs/2309.00166>`_”.
+
+Existing tools such as Docker and Podman implement their build cache with a
+layered (union) filesystem such as `OverlayFS
+<https://github.com/torvalds/linux/blob/af5f239/Documentation/filesystems/overlayfs.rst>`_
+or `FUSE-OverlayFS <https://github.com/containers/fuse-overlayfs/tree/v1.12>`_
+and tar archives to represent the content of each layer; this approach is
+`standardized by OCI
+<https://github.com/opencontainers/image-spec/blob/63b8bd0/spec.md>`_. The
+layered cache works, but it has drawbacks in three critical areas:
+
+1. **Diff format.** The tar format is poorly standardized and `not designed
+   for diffs <https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar>`_.
+   Notably, tar cannot represent file deletion. The workaround used for OCI
+   layers is specially named *whiteout* files, which means the tar archives
+   cannot be unpacked by standard UNIX tools and require special
+   container-specific processing.
+
+2. **Cache overhead.** Each time a Dockerfile instruction is started, a new
+   overlay filesystem is mounted atop the existing layer stack. File metadata
+   operations in the instruction then start at the top layer and descend the
+   stack until the layer containing the desired file is reached. The cost of
+   these operations is therefore proportional to the number of layers, i.e.,
+   the number of instructions between the empty root image and the instruction
+   being executed. This results in a `best practice
+   <https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
+   of large, complex instructions to minimize their number, which can conflict
+   with simpler, more numerous instructions the user might prefer.
+
+3. **De-duplication.** Identical files on layers with an ancestry relationship
+   (i.e., instruction *A* precedes *B* in a build) are stored only once.
+   However, identical files on layers without this relationship are stored
+   multiple times. For example, if instructions *B* and *B'* both follow *A* —
+   perhaps because *B* was modified and the image rebuilt — then any files
+   created by both *B* and *B'* will be stored twice.
+
+   Also, similar files are never de-duplicated, regardless of ancestry. For
+   example, if instruction *A* creates a file and subsequently instruction *B*
+   modifies a single bit in that file, both versions are stored in their
+   entirety.
+
+Our Git-based cache addresses the three drawbacks: (1) Git is purpose-built to
+store changing directory trees, (2) cache overhead is imposed only at
+instruction commit time, and (3) Git de-duplicates both identical and similar
+files. Also, it is based on an extremely widely used tool that enjoys development
+support from well-resourced actors, in particular on scaling (e.g.,
+Microsoft’s large-repository accelerator `Scalar
+<https://devblogs.microsoft.com/devops/introducing-scalar/>`_ was recently
+`merged into Git
+<https://github.blog/2022-10-03-highlights-from-git-2-38/>`_).
+
+In addition to these structural advantages, performance experiments reported in our paper above show that the Git-based approach is as good as (and sometimes better than) overlay-based caches. On build time, the two approaches are broadly similar, with one or the other being faster depending on context. Both had performance problems on NFS. Notably, however, the Git-based cache was much faster for a 129-instruction Dockerfile. On disk usage, the winner depended on the condition. For example, we saw the layered cache storing large sibling layers redundantly; on the other hand, the Git-based cache has some obvious redundancies as well, and one must compact it for full de-duplication benefit. However, Git’s de-duplication was quite effective in some conditions and we suspect will prove even better in more realistic scenarios.
+
+That is, we believe our results show that the Git-based build cache is highly competitive with the layered approach, with no obvious inferiority so far and hints that it may be superior on important dimensions. We have ongoing work to explore these questions in more detail.
 
 De-duplication and garbage collection
 -------------------------------------
@@ -389,6 +434,8 @@ files. In both cases, garbage collection uses all available cores.
 :code:`ch-image build-cache` prints the specific garbage collection parameters in
 use, and :code:`-v` can be added for more detail.
 
+.. _ch-image_bu-large:
+
 Large file threshold
 --------------------