
9. Exclude all files from dockerignore by default

Date: 2022-02-13

Status

Accepted

Context

Building a Docker image always starts with sending the build context to the builder, and the time this takes depends on the number and size of the files in the context. The context can reach many hundreds of MB, which, depending on where your Docker builder runs and how fast your system is, can add tens of seconds of delay before the actual build starts.

The context has to be compressed, sent, and decompressed, so it costs CPU, network, and I/O.
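To get a feel for this overhead, the size of an unfiltered context can be estimated by walking the directory tree. The sketch below is only an approximation: the function name context_size is hypothetical, it applies no .dockerignore rules, and it reports an upper bound on what would be sent.

```python
import os
from pathlib import Path


def context_size(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for every regular file under root.

    No .dockerignore rules are applied, so this is an upper bound on the
    amount of data an unfiltered build context would contain.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so broken links do not raise and targets
            # are not counted twice.
            if path.is_symlink():
                continue
            file_count += 1
            total_bytes += path.stat().st_size
    return file_count, total_bytes


if __name__ == "__main__":
    count, size = context_size(".")
    print(f"{count} files, {size / 1_000_000:.1f} MB")
```

Running it from the top level of a checkout before and after a build gives a rough idea of how much generated content ends up next to the Dockerfile.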

Unfortunately, Airflow keeps its Dockerfiles and its sources directly at the top level of the project. There is no "src" folder; by default the docker commands use the folder containing the "Dockerfile" as the context, and files cannot be taken from outside the context. Thus the Dockerfiles have to sit at the top level of the Airflow project.

By default, all files in the context directory are sent as context unless you ignore them via .dockerignore, a concept similar to .gitignore. Airflow has many large files that are generated while running and building (for example node_modules), as well as .egg-info directories and plenty of other files in many folders.

Ignoring specific folders sounds like a reasonable approach, but it has one drawback: you might simply not realise that newly generated files have been added and have grown the context, increasing the overhead of building the Docker images. Nothing prevents or detects such accidental additions, for example when refactoring files or adding new functionality.

Decision

A number of strategies can address the problem, ranging from conventions to automated checks. In our case, however, multiple independent contributors and committers review the code, so such a change might easily slip through. The solution should therefore be "self-managing". Luckily, there is a way that has been discussed in a number of places but notably here

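The strategy involves ignoring all files by default and then selectively re-including (that is, excluding from the ignore rule) only the folders and patterns that should be part of the context.

The .dockerignore file has the appropriate functionality: a line starting with "!" defines an exception to the ignore rules that precede it.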

This is what we decided to use for our Dockerfile and Dockerfile.ci.
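As an illustration of the ignore-everything-then-allow pattern, here is a minimal sketch that renders such a file from an allow-list. The function name and the example patterns are hypothetical and are not taken from Airflow's actual .dockerignore; the sketch relies only on the documented .dockerignore rules that "#" starts a comment, "!" marks an exception, and the last matching line decides whether a file is sent.

```python
from collections.abc import Sequence


def render_dockerignore(allowed: Sequence[str], re_ignored: Sequence[str] = ()) -> str:
    """Build .dockerignore text that ignores everything, re-includes the
    allowed patterns, and finally ignores generated files again."""
    lines = ["# Ignore everything by default", "**", ""]
    lines.append("# Re-include only the approved patterns")
    # Each "!" line is an exception to the "**" rule above.
    lines.extend(f"!{pattern}" for pattern in allowed)
    if re_ignored:
        lines.append("")
        lines.append("# Ignore generated files again, even inside approved folders")
        # These come last because the last matching line wins.
        lines.extend(re_ignored)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical patterns, for illustration only.
    print(
        render_dockerignore(
            allowed=["airflow", "setup.py"],
            re_ignored=["**/__pycache__", "**/node_modules"],
        ),
        end="",
    )
```

The ordering is the important design choice: the catch-all rule comes first, the exceptions follow, and any patterns for generated files inside approved folders come last so that they override the exceptions.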

Consequences

There are three consequences of this decision:

  • whenever new files that do not follow the "approved" patterns are added to the airflow repository, they will not increase the size of the context
  • we still have to monitor the context regularly to check that the approved patterns have not accidentally let in unnecessary files, but that should be a rather rare event
  • whenever someone wants to add something to our container images that is not covered by the already "approved" patterns, the file will be missing during the build. This might come as a small surprise, but .dockerignore explains what to do in this case, and it is the place where you would look for the problem anyway. Users should be guided to add a new pattern to .dockerignore in this case.
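The regular monitoring mentioned in the second point could be supported by a small script that lists the largest files under the approved paths. This is a sketch under stated assumptions: largest_files is a hypothetical helper, the list of approved paths has to be supplied by hand, and patterns that are ignored again inside approved folders are not taken into account.

```python
import heapq
import os


def largest_files(approved_paths: list[str], top_n: int = 10) -> list[tuple[int, str]]:
    """Return up to top_n (size_in_bytes, path) pairs, largest first, for
    files found under the approved files and folders."""
    sizes = []
    for approved in approved_paths:
        if os.path.isfile(approved):
            # An approved pattern may name a single file, such as setup.py.
            sizes.append((os.path.getsize(approved), approved))
            continue
        # os.walk yields nothing for a missing path, so stale entries are harmless.
        for dirpath, _dirnames, filenames in os.walk(approved):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue
                sizes.append((os.path.getsize(path), path))
    return heapq.nlargest(top_n, sizes)


if __name__ == "__main__":
    # Hypothetical approved paths, for illustration only.
    for size, path in largest_files(["airflow", "setup.py"], top_n=5):
        print(f"{size:>12} {path}")
```

An unexpectedly large entry at the top of the list is a hint that an approved pattern is letting in generated files.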