From 48d5a34219b316ac0c4c0360a1119f47526331ba Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Fri, 25 Aug 2023 17:27:06 -0600 Subject: [PATCH 01/20] RSYNC extension --- doc/ch-image.rst | 146 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 134 insertions(+), 12 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 60477b37c..e7c18437d 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -506,10 +506,12 @@ Synopsis Description ----------- -Uses :code:`ch-run -w -u0 -g0 --no-passwd --unsafe` to execute :code:`RUN` -instructions. Note that :code:`FROM` implicitly pulls the base image if -needed, so you may want to read about the :code:`pull` subcommand below as -well. +See below for differences with other Dockerfile interpreters. Charliecloud +supports an extended instruction (:code:`RSYNC`), a few other instructions +behave slightly differently, and a few are ignored. + +Note that :code:`FROM` implicitly pulls the base image if needed, so you may +want to read about the :code:`pull` subcommand below as well. Required argument: @@ -590,6 +592,9 @@ Options: If no colon present in the name, append :code:`:latest`. +Uses :code:`ch-run -w -u0 -g0 --no-passwd --unsafe` to execute :code:`RUN` +instructions. + Privilege model --------------- @@ -733,8 +738,8 @@ filter and has no knowledge of which instructions actually used the intercepted system calls. Therefore, the printed “instructions modified” number is only a count of instructions with a hook applied as described above. -Compatibility with other Dockerfile interpreters ------------------------------------------------- +Compatibility and behavior differences +-------------------------------------- :code:`ch-image` is an independent implementation and shares no code with other Dockerfile interpreters. It uses a formal Dockerfile parsing grammar @@ -769,8 +774,9 @@ Context directory The context directory is bind-mounted into the build, rather than copied like Docker. Thus, the size of the context is immaterial, and the build reads -directly from storage like any other local process would. However, you still -can’t access anything outside the context directory. +directly from storage like any other local process would (i.e., it is +reasonable use :code:`/` for the context). However, you still can’t +access anything outside the context directory. Variable substitution ~~~~~~~~~~~~~~~~~~~~~ @@ -917,8 +923,8 @@ Features we do not plan to support unprivileged processes. * :code:`HEALTHCHECK`: This instruction’s main use case is monitoring server - processes rather than applications. Also, implementing it requires a - container supervisor daemon, which we have no plans to add. + processes rather than applications. Also, it requires a container supervisor + daemon, which we have no plans to add. * :code:`MAINTAINER` is deprecated. @@ -927,11 +933,126 @@ Features we do not plan to support * :code:`USER` does not make sense for unprivileged builds. -* :code:`VOLUME`: This instruction is not currently supported. Charliecloud +* :code:`VOLUME`: Charliecloud has good support for bind mounts; we anticipate that it will continue to focus on that and will not introduce the volume management features that Docker has. +New instructions +---------------- + +:code:`RSYNC` +~~~~~~~~~~~~~ + +Copying files is often simple but has numerous difficult corner cases, e.g. +when dealing with symbolic or hard links. The standard instruction +:code:`COPY` deals with many of these corner cases differently from other UNIX +utilities, lacks documentation, and behaves inconsistently between different +Dockerfile interpreters (e.g., Docker’s legacy builder vs. BuildKit), as +detailed above. On the other hand, :code:`rsync(1)` is an extremely capable, +widely used file copy tool, with detailed options to specify behavior and 25 +years of history dealing with copy weirdness. + +:code:`RSYNC` (also spelled :code:`NSYNC`) is a Charliecloud extension that +gives copying behavior identical to :code:`rsync(1)`, including remote +:code:`ssh:` and :code:`rsync:` transports. (In fact, Charliecloud’s current +literally calls the host’s :code:`rsync(1)` to perform the copy, though this +may change in the future.) + +:code:`RSYNC` takes the same arguments as :code:`rsync(1)`, so refer to its +`man page `_ for a +detailed explanation of all the options. Source arguments that are remote +(i.e., starting with :code:`ssh:` or :code:`rsync:`) are unchanged, while +local sources must be relative paths and are relative to the context +directory. Any globbed sources are processed by :code:`ch-image(1)` using +Python rules, i.e., :code:`rsync(1)` sees no wildcard sources. Destinations +must be local. Relative destinations are relative to the image’s current +working directory, while absolute destinations refer to the image’s root. For +example, + +.. code-block:: docker + + WORKDIR /foo + RSYNC --foo ssh:src1 ./src2 ./dst1 + RSYNC ./src3 /dst2 + +behaves as the following under the hood:: + + $ rsync --foo ssh:src1 /context/src2 /storage/imgroot/foo/dst2 + $ rsync /context/src3 /storage/imgroot/dst2 + +Argument :code:`--daemon` is an error, because running :code:`rsync(1)` in +daemon mode does not make sense for container build. For arguments that read +input from a file (e.g. :code:`--exclude-from` or :code:`--files-from`), +relative paths are relative to the context directory, absolute paths refer to +the image root, and :code:`-` (standard input) is an error. + +:code:`RSYNC` allows specifying a single option beginning with :code:`+` +(plus) that is shorthand for a group of :code:`rsync(1)` options. **These are +preferred** to :code:`rsync(1)`’s own :code:`-a`/:code:`--archive` option +because that may handle symlinks inappropriately for containers and omits some +metadata. This single option is one of: + + :code:`+` (bare plus) + Equivalent to all of: + + * :code:`-@=-1`: use nanosecond precision when comparing timestamps. + * :code:`-A`: preserve ACLs. + * :code:`-H`: preserve hard link groups. + * :code:`-S`: preserve file sparseness when possible. + * :code:`-X`: preserve xattrs in :code:`user.*` namespace. + * :code:`-p`: preserve permissions. + * :code:`-r`: recurse into directories. + + :code:`+L` + Equivalent to the options listed for :code:`+` and also: + + * :code:`-l`: process symlinks in the sources. + + * :code:`--copy-unsafe-links`: copy the target of symlinks that point + outside the top-of-transfer directory for each source. This is *not the + context directory*; see below for examples. + +Symlink handling is particularly fraught, so use care. In particular, you must +state explicitly what to do with symlinks in the source: both plain +:code:`RSYNC` and :code:`RSYNC +` ignore them with a warning. Your options are +:code:`+L` (which is a pretty reasonable default) and any valid combination of +:code:`rsync(1)`’s `normal symlink options +`_. +(:code:`COPY` isn’t any less fraught, and you have no choice about what to do +with symlinks.) + +The instruction is a cache hit if :code:`rsync(1)` would not change anything, +i.e. if :code:`-ni` prepended to the given options lists no changes, and a +miss otherwise. + +.. note:: + + :code:`rsync(1)` supports a configuration file :code:`~/.popt` that alters + its command line processing. Currently, this configuration is respected for + :code:`RSYNC` arguments, but that may change without notice. + +**FIXME**:: + + examples + context directory in full + to new directory + contents to / + something about no + vs + vs +L, trailing slash + absolute inside source dir + points to inside source dir + relative inside source dir + points to inside source dir directly + points to inside source dir with ../sourcedir + points to inside context dir but not source dir + source is symlink + relative + target in same directory + target in different directory + absolute + target in same directory + target in different directory + Examples -------- @@ -1367,4 +1488,5 @@ Environment variables .. LocalWords: tmpfs'es bigvendor AUTH auth bucache buc bigfile df rfc bae .. LocalWords: dlcache graphviz packfile packfiles bigFileThreshold fd Tpdf -.. LocalWords: pstats gprof chofile cffd cacdb ARGs +.. LocalWords: pstats gprof chofile cffd cacdb ARGs NSYNC dst imgroot popt +.. LocalWords: globbed ni From b2eb0a39ea214111a555d18d502b40a2a7d9d144 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Thu, 14 Sep 2023 17:17:32 -0600 Subject: [PATCH 02/20] work on tests a bit [skip ci] --- doc/ch-image.rst | 28 ++++++++-- test/build/50_rsync.bats | 115 +++++++++++++++++++++++++++++++++++++++ test/run/ch-convert.bats | 1 + 3 files changed, 138 insertions(+), 6 deletions(-) create mode 100644 test/build/50_rsync.bats diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 7627fe3cd..24093d81a 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1006,14 +1006,18 @@ New instructions :code:`RSYNC` ~~~~~~~~~~~~~ +.. warning:: + + This instruction is experimental and may change or be removed. + Copying files is often simple but has numerous difficult corner cases, e.g. when dealing with symbolic or hard links. The standard instruction :code:`COPY` deals with many of these corner cases differently from other UNIX -utilities, lacks documentation, and behaves inconsistently between different -Dockerfile interpreters (e.g., Docker’s legacy builder vs. BuildKit), as -detailed above. On the other hand, :code:`rsync(1)` is an extremely capable, -widely used file copy tool, with detailed options to specify behavior and 25 -years of history dealing with copy weirdness. +utilities, lacks complete documentation, and behaves inconsistently between +different Dockerfile interpreters (e.g., Docker’s legacy builder vs. +BuildKit), as detailed above. On the other hand, :code:`rsync(1)` is an +extremely capable, widely used file copy tool, with detailed options to +specify behavior and 25 years of history dealing with weirdness. :code:`RSYNC` (also spelled :code:`NSYNC`) is a Charliecloud extension that gives copying behavior identical to :code:`rsync(1)`, including remote @@ -1038,8 +1042,9 @@ example, RSYNC --foo ssh:src1 ./src2 ./dst1 RSYNC ./src3 /dst2 -behaves as the following under the hood:: +is translated to (the equivalent of):: + $ mkdir -p /foo $ rsync --foo ssh:src1 /context/src2 /storage/imgroot/foo/dst2 $ rsync /context/src3 /storage/imgroot/dst2 @@ -1065,6 +1070,9 @@ metadata. This single option is one of: * :code:`-X`: preserve xattrs in :code:`user.*` namespace. * :code:`-p`: preserve permissions. * :code:`-r`: recurse into directories. + * :code:`--info=progress2` (only if stderr is a terminal): show progress + meter (note `subtleties in interpretation + `_). :code:`+L` Equivalent to the options listed for :code:`+` and also: @@ -1094,6 +1102,12 @@ miss otherwise. its command line processing. Currently, this configuration is respected for :code:`RSYNC` arguments, but that may change without notice. +For example: + +**FIXME** + +See the file :code:`50_rsync.bats` in the source code for more examples, including many corner cases. + **FIXME**:: examples @@ -1114,6 +1128,8 @@ miss otherwise. absolute target in same directory target in different directory + ssh: transport + rsync: transport Examples -------- diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats new file mode 100644 index 000000000..a978a7c9c --- /dev/null +++ b/test/build/50_rsync.bats @@ -0,0 +1,115 @@ +load ../common + +# shellcheck disable=SC2034 +tag=RSYNC + +setup () { + scope standard + [[ $CH_TEST_BUILDER = ch-image ]] || skip 'ch-image only' + fixtures=$BATS_TMPDIR/rsync + context=$fixtures/ctx + context_out=$fixtures/ctx-out + if [[ -e $fixtures ]]; then + echo '## input ##' + ls_ "$fixtures" + fi +} + +ls_ () { + # ls(1)-alike but more predictable output and only the fields we want. See + # also “compare-ls” in ch-convert.bats. + ( + cd "$1" + find . -mindepth 1 -printf '%M %n %3s%y :%d:%f -> %l\n' \ + | sed -E -e 's|:1:||' \ + -e 's|:2:| |' \ + -e 's|:3:| |' \ + -e 's|:4:| |' \ + -e 's| -> $||' \ + -e 's|([0-9]+)[f]|\1|' \ + -e 's|([0-9]+ )[0-9 ]+[a-z] |\1 |' \ + -e "s|$1|/...|" + ) +} + + +@test "${tag}: set up fixtures" { + rm -Rf --one-file-system "$fixtures" + mkdir "$fixtures" + cd "$fixtures" + + # top level + mkdir "$context" + mkdir "$context_out" + ln -s "$context" "$fixtures"/ctx_direct + + # outside context + cd "$context_out" + echo file-out > file-out + mkdir dir-out + echo dir-out.file-out > dir-out/dir-out.file-out + ln -s file-out link.out2out + cd .. + + # inside context + cd "$context" + echo file-ctx > file-ctx + mkdir src + echo src.file > src/src.file + mkdir src/src.dir + echo src.dir.file > src/src.dir/src.dir.file + mkdir not-src + echo not-src.file > not-src.file + + # symlinks to inside source + cd src + # to file + ln -s src.file src.file_direct + ln -s ../src/src.file src.file_up_over + ln -s "$context"/src/src.file src.file_abs + # relative to directory + ln -s src.dir src.dir_direct + ln -s ../src/src.dir src.dir_upover + ln -s "$context"/src/src.dir src.dir_abs + + # symlinks to outside source but inside context + ln -s ../file-ctx file-ctx_up + ln -s "$context"/file-ctx file-ctx_abs + ln -s ../not-src not-src_up + ln -s "$context"/not-src not-src_abs + + # symlinks to outside context + ln -s ../../ctx-out/file-out file-out_rel + ln -s "$context_out"/file-out file-out_abs + ln -s ../../ctx-out/dir-out dir-out_rel + ln -s "$context_out"/dir-out dir-out_abs + + # hard links + echo hard1 > hard1 + ln hard1 hard2 + + # weird permissions + touch weird-perms-607 > weird-perms-607 + chmod 607 weird-perms-607 + + echo "## created fixtures ##" + ls_ "$fixtures" + false +} + +# no options +# + +# +L +# +L renamed +# +L with slash +# -rl --copy-unsafe-links +# single file +# file and directory + +# ssh: transport +# ssh -o batchmode=yes localhost true +# rsync: transport +# $ rsync --info=progress2 'rsync://archive.kernel.org/debian-archive/debian/dists/buzz/main/binary-i386/editors/e[de]*.deb' /tmp +# $ ls -lh /tmp/*.deb +# -rw-r----- 1 reidpr reidpr 98K Sep 14 14:17 /tmp/ed-0.2-11.deb +# -rw-r----- 1 reidpr reidpr 39K Sep 14 14:17 /tmp/ee-126.1.89-1.deb diff --git a/test/run/ch-convert.bats b/test/run/ch-convert.bats index 97f56c12e..9ac69b624 100644 --- a/test/run/ch-convert.bats +++ b/test/run/ch-convert.bats @@ -112,6 +112,7 @@ compare () { # # 4. Directory sizes also seem not to be stable. # +# See also “ls_” in 50_rsync.bats. compare-ls () { cd "$1" || exit # to make -path reasonable find . -mindepth 1 \ From 07c675088a7309e967cbe618617ae9ef04f02c8c Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Fri, 15 Sep 2023 15:22:34 -0600 Subject: [PATCH 03/20] parsing seems to work [skip ci] --- doc/ch-image.rst | 40 ++++++++++++++++---- lib/build.py | 95 +++++++++++++++++++++++++++++++++++++++++++++++- lib/image.py | 7 +++- 3 files changed, 131 insertions(+), 11 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 24093d81a..69e768f2c 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1023,7 +1023,7 @@ specify behavior and 25 years of history dealing with weirdness. gives copying behavior identical to :code:`rsync(1)`, including remote :code:`ssh:` and :code:`rsync:` transports. (In fact, Charliecloud’s current literally calls the host’s :code:`rsync(1)` to perform the copy, though this -may change in the future.) +may change in the future.) There is no list form of :code:`RSYNC`. :code:`RSYNC` takes the same arguments as :code:`rsync(1)`, so refer to its `man page `_ for a @@ -1048,11 +1048,35 @@ is translated to (the equivalent of):: $ rsync --foo ssh:src1 /context/src2 /storage/imgroot/foo/dst2 $ rsync /context/src3 /storage/imgroot/dst2 -Argument :code:`--daemon` is an error, because running :code:`rsync(1)` in -daemon mode does not make sense for container build. For arguments that read -input from a file (e.g. :code:`--exclude-from` or :code:`--files-from`), -relative paths are relative to the context directory, absolute paths refer to -the image root, and :code:`-` (standard input) is an error. +A small number of :code:`rsync(1)` features are actively disallowed: + + 1. The :code:`--daemon` flag is an error. Running :code:`rsync(1)` in daemon + mode does not make sense for container build. + + 2. :code:`rsync:` and :code:`ssh:` transports are an error. Charliecloud + needs access to the entire input to compute cache hit or miss, and these + transports make that impossible. It is possible these will become + available in the future (please let us know if that is your use case!). + For now, the workaround is to install :code:`rsync(1)` in the image and + use it in a :code:`RUN` instruction, though only the instruction text + will be considered for the cache. + + 3. Option arguments must be delimited with :code:`=` (equals). For example, + to set the block size to 4 MiB, you must say :code:`--block-size=4M` or + :code:`-B=4M`. :code:`-B4M` will be interpreted as the three arguments + :code:`-B`, :code:`-4`, and :code:`-M`; :code:`--block-size 4M` will be + interpreted as :code:`--block-size` with no argument and a copy source + named :code:`4M`. This is so Charliecloud can process :code:`rsync(1)` + options without knowing fully which ones take an argument. + +Note that there are other flags that don’t make sense and/or cause undesirable +behavior (for example, :code:`-n` / :code:`--dry-run`). We have not +characterized this problem. + +For arguments that read input from a file (e.g. :code:`--exclude-from` or +:code:`--files-from`), relative paths are relative to the context directory, +absolute paths refer to the image root, and :code:`-` (standard input) is an +error. :code:`RSYNC` allows specifying a single option beginning with :code:`+` (plus) that is shorthand for a group of :code:`rsync(1)` options. **These are @@ -1060,7 +1084,7 @@ preferred** to :code:`rsync(1)`’s own :code:`-a`/:code:`--archive` option because that may handle symlinks inappropriately for containers and omits some metadata. This single option is one of: - :code:`+` (bare plus) + :code:`+a` Equivalent to all of: * :code:`-@=-1`: use nanosecond precision when comparing timestamps. @@ -1074,7 +1098,7 @@ metadata. This single option is one of: meter (note `subtleties in interpretation `_). - :code:`+L` + :code:`+l` Equivalent to the options listed for :code:`+` and also: * :code:`-l`: process symlinks in the sources. diff --git a/lib/build.py b/lib/build.py index 3f9906927..dad08a430 100644 --- a/lib/build.py +++ b/lib/build.py @@ -391,7 +391,10 @@ def __str__(self): def announce_maybe(self): "Announce myself if I haven’t already been announced." if (not self.announced_p): - ch.INFO("%3s%s %s" % (self.lineno, self.status_char, self)) + self_ = str(self) + if (ch.user() == "qwofford" and sys.stderr.isatty()): + self_ = re.sub(r"^RSYNC", "NSYNC", self_) + ch.INFO("%3s%s %s" % (self.lineno, self.status_char, self_)) self.announced_p = True def chdir(self, path): @@ -660,7 +663,8 @@ class I_copy(Instruction): # # Note: The Dockerfile specification for COPY is complex, messy, # inexplicably different from cp(1), and incomplete. We try to be - # bug-compatible with Docker but probably are not 100%. See the FAQ. + # bug-compatible with Docker (legacy builder, not BuildKit -- yes, they are + # different) but probably are not 100%. See the FAQ. # # Because of these weird semantics, none of this abstracted into a general # copy function. I don’t want people calling it except from here. @@ -1147,6 +1151,93 @@ def execute(self): pass +class I_rsync(Instruction): + + __slots__ = ("dst", + "dst_raw", + "plus_option", + "rsync_options", + "srcs", + "srcs_raw") + + def __init__(self, *args): + super().__init__(*args) + st = self.tree.child("option_plus") + self.plus_option = None if st is None else st.terminal("OPTION_LETTER") + ch.VERBOSE(self.plus_option) + options_done = False + self.rsync_options = list() + self.srcs_raw = list() + for word in self.tree.terminals("WORDE"): + if (not options_done and word.startswith("-")): + # Option. See assumption in docs that makes parsing a lot easier. + if (word == "--"): # end of options + options_done = True + elif (word.startswith("--")): # long option + self.rsync_options.append(word) + else: # short option(s) + if (len(word) == 1): + ch.FATAL("RSYNC: invalid argument: %s" % word) + # Append options individually so we can process them more later. + for m in re.finditer(r"[^=]=.*$|[^=]", word[1:]): + self.rsync_options.append("-" + m[0]) + continue + # Not an option, so it must be a source or destination path. + self.srcs_raw.append(word) + if (len(self.srcs_raw) == 0): + ch.FATAL("RSYNC: source and destination missing") + elif (len(self.srcs_raw) == 1): + ch.FATAL("RSYNC: source or destination missing") + self.dst_raw = self.srcs_raw.pop() + + @property + def rsync_options_concise(self): + "Return self.rsync_options with short options coalesced." + # We don’t group short options with an argument even though we could + # because it seems confusing, e.g. “-ab=c” vs. “-a -b=c”. + def ship_out(): + nonlocal group + if (group != ""): + ret.append(group) + group = "" + ret = list() + group = "" + for o in self.rsync_options: + if (o.startswith("--")): # long option, not grouped + ship_out() + ret.append(o) + elif (len(o) > 2): # short option with argument, not grouped + ship_out() + ret.append(o) + else: # short option without argument, grouped + if (group == ""): + group = "-" + group += o[1:] # add to group + ship_out() + return ret + + @property + def sid_input(self): + return super().sid_input # FIXME - seems hard? + + @property + def str_(self): + ret = list() + if (self.plus_option is not None): + ret.append("+" + self.plus_option) + if (len(self.rsync_options_concise) > 0): + ret += self.rsync_options_concise + ret += self.srcs_raw + ret.append(self.dst_raw) + return " ".join(ret) + + #def prepare(self, miss_ct): + # ... + + def execute(self): + ... + + class Run(Instruction): __slots__ = ("cmd") diff --git a/lib/image.py b/lib/image.py index b6a00640d..262347f75 100644 --- a/lib/image.py +++ b/lib/image.py @@ -66,6 +66,7 @@ HEX_STRING: /[0-9A-Fa-f]+/ WORD: /[^ \t\n=]/+ +WORDE: /[^ \t\n]/+ IR_PATH_COMPONENT: /[a-z0-9_.-]+/ @@ -94,7 +95,7 @@ // First instruction must be ARG or FROM, but that is not a syntax error. dockerfile: _NEWLINES? ( arg_first | directive | comment )* ( instruction | comment )* -?instruction: _WS? ( arg | copy | env | from_ | label | run | shell | workdir | uns_forever | uns_yet ) +?instruction: _WS? ( arg | copy | env | from_ | label | rsync | run | shell | workdir | uns_forever | uns_yet ) directive.2: _WS? "#" _WS? DIRECTIVE_NAME "=" _line _NEWLINES DIRECTIVE_NAME: ( "escape" | "syntax" ) @@ -127,6 +128,8 @@ label_equalses: label_equals ( _WS label_equals )* label_equals: WORD "=" ( WORD | STRING_QUOTED ) +rsync: ( "RSYNC"i | "NSYNC"i ) ( _WS option_plus )? _WS WORDE ( _WS WORDE )+ _NEWLINES + run: "RUN"i _WS ( run_exec | run_shell ) _NEWLINES run_exec.2: _string_list run_shell: _line @@ -145,7 +148,9 @@ option: "--" OPTION_KEY "=" OPTION_VALUE option_keypair: "--" OPTION_KEY "=" OPTION_VAR "=" OPTION_VALUE +option_plus: "+" OPTION_LETTER OPTION_KEY: /[a-z]+/ +OPTION_LETTER: /[a-z]/ OPTION_VALUE: /[^= \t\n]+/ OPTION_VAR: /[a-z]+/ From 5a2259e28cb79e168e9a2c648f6ba31d48868352 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Fri, 15 Sep 2023 17:39:12 -0600 Subject: [PATCH 04/20] decent progress? [skip ci] --- lib/build.py | 45 +++++++++++++++++++++------------------------ lib/filesystem.py | 27 +++++++++++++++++++++++++++ 2 files changed, 48 insertions(+), 24 deletions(-) diff --git a/lib/build.py b/lib/build.py index dad08a430..7d8f8f17e 100644 --- a/lib/build.py +++ b/lib/build.py @@ -863,10 +863,6 @@ def execute(self): ch.FATAL("can’t COPY: unknown file type: %s" % src) def prepare(self, miss_ct): - def stat_bytes(path, links=False): - st = path.stat_(links) - return ( str(path).encode("UTF-8") - + struct.pack("=HQQ", st.st_mode, st.st_size, st.st_mtime_ns)) # Error checking. if (cli.context == "-" and self.from_ is None): ch.FATAL("no context because “-” given") @@ -918,16 +914,7 @@ def stat_bytes(path, links=False): # commonpath in pathlib ch.FATAL("can’t copy from outside context: %s" % src) # Gather metadata for hashing. - # FIXME: Locale issues related to sorting? - self.src_metadata = bytearray() - for src in self.srcs: - self.src_metadata += stat_bytes(src, links=True) - if (src.is_dir()): - for (dir_, dirs, files) in ch.walk(src): - self.src_metadata += stat_bytes(dir_) - for f in sorted(files): - self.src_metadata += stat_bytes(dir_ // f) - dirs.sort() + self.src_metadata = fs.Path.stat_bytes_all(self.srcs) # Pass on to superclass. return super().prepare(miss_ct) @@ -1157,14 +1144,15 @@ class I_rsync(Instruction): "dst_raw", "plus_option", "rsync_options", + "src_metadata", "srcs", "srcs_raw") def __init__(self, *args): super().__init__(*args) + line_no = self.tree.line st = self.tree.child("option_plus") self.plus_option = None if st is None else st.terminal("OPTION_LETTER") - ch.VERBOSE(self.plus_option) options_done = False self.rsync_options = list() self.srcs_raw = list() @@ -1177,7 +1165,7 @@ def __init__(self, *args): self.rsync_options.append(word) else: # short option(s) if (len(word) == 1): - ch.FATAL("RSYNC: invalid argument: %s" % word) + ch.FATAL("RSYNC: %d: invalid argument: %s" % (line_no, word)) # Append options individually so we can process them more later. for m in re.finditer(r"[^=]=.*$|[^=]", word[1:]): self.rsync_options.append("-" + m[0]) @@ -1185,9 +1173,9 @@ def __init__(self, *args): # Not an option, so it must be a source or destination path. self.srcs_raw.append(word) if (len(self.srcs_raw) == 0): - ch.FATAL("RSYNC: source and destination missing") + ch.FATAL("RSYNC: %d: source and destination missing" % line_no) elif (len(self.srcs_raw) == 1): - ch.FATAL("RSYNC: source or destination missing") + ch.FATAL("RSYNC: %d: source or destination missing" % line_no) self.dst_raw = self.srcs_raw.pop() @property @@ -1231,8 +1219,20 @@ def str_(self): ret.append(self.dst_raw) return " ".join(ret) - #def prepare(self, miss_ct): - # ... + def prepare(self, miss_ct): + ... + # Find the context directory. FIXME: This is pretty simple because we + # don’t yet support the equivalent of COPY --from. + if (cli.context == "-"): + ch.FATAL("no context because “-” given") + context = cli.context + # Expand sources. + # Expand destination. + # Expand --*-from options. + # Gather metadata for hashing. + self.src_metadata = fs.Path.stat_bytes_all(self.srcs) + # Pass on to superclass. + return super().prepare(miss_ct) def execute(self): ... @@ -1397,7 +1397,7 @@ class Environment: # image_i attributes? -## Supporting functions ### +## Supporting functions ## def unescape(sl): # FIXME: This is also ugly and should go in the grammar. @@ -1411,6 +1411,3 @@ def unescape(sl): sl = '"%s"' % sl assert (len(sl) >= 2 and sl[0] == '"' and sl[-1] == '"' and sl[-2:] != '\\"') return ast.literal_eval(sl) - - -# LocalWords: earley topdown iter lineno sid keypair dst srcs pathlib diff --git a/lib/filesystem.py b/lib/filesystem.py index 97487d276..897590f50 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -128,6 +128,14 @@ class method.""" else: ch.FATAL("can’t find path to gzip or pigz") + @staticmethod + def stat_bytes_all(paths): + "Return concatenation of metadata_bytes() on each given Path object." + md = bytearray() + for path in paths: + md += path.stat_bytes_recursive() + return md + def add_suffix(self, suff): """Returns the path object restulting from appending the specified suffix to the end of the path name. E.g. Path(foo).add_suffix(".txt") @@ -412,6 +420,25 @@ def stat_(self, links): return ch.ossafe(os.stat, "can’t stat: %s" % self, self, follow_symlinks=links) + def stat_bytes(self, links=False): + "Return self.stat_() encoded as an opaque bytearray." + st = self.stat_(links) + return ( str(self).encode("UTF-8") + + struct.pack("=HQQ", st.st_mode, st.st_size, st.st_mtime_ns)) + + def stat_bytes_recursive(self): + """Return concatenation of self.stat_() and all my children as an opaque + bytearray, in unspecified but consistent order.""" + # FIXME: Locale issues related to sorting? + md = self.stat_bytes(links=True) + if (self.is_dir()): + for (dir_, dirs, files) in ch.walk(src): + md += dir_.stat_bytes() + for f in sorted(files): + md += (dir_ // f).stat_bytes() + dirs.sort() + return md + def symlink(self, target, clobber=False): if (clobber and self.is_file()): self.unlink_() From fe56c5a06f3641dada8d68907a719bc31f8e4b25 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Mon, 18 Sep 2023 16:06:30 -0600 Subject: [PATCH 05/20] hey it seems to work [skip ci] --- doc/ch-image.rst | 25 +++-- lib/build.py | 226 ++++++++++++++++++++++++++------------------ lib/charliecloud.py | 3 +- lib/filesystem.py | 1 + 4 files changed, 155 insertions(+), 100 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 69e768f2c..53d517ca2 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1050,10 +1050,7 @@ is translated to (the equivalent of):: A small number of :code:`rsync(1)` features are actively disallowed: - 1. The :code:`--daemon` flag is an error. Running :code:`rsync(1)` in daemon - mode does not make sense for container build. - - 2. :code:`rsync:` and :code:`ssh:` transports are an error. Charliecloud + 1. :code:`rsync:` and :code:`ssh:` transports are an error. Charliecloud needs access to the entire input to compute cache hit or miss, and these transports make that impossible. It is possible these will become available in the future (please let us know if that is your use case!). @@ -1061,7 +1058,7 @@ A small number of :code:`rsync(1)` features are actively disallowed: use it in a :code:`RUN` instruction, though only the instruction text will be considered for the cache. - 3. Option arguments must be delimited with :code:`=` (equals). For example, + 2. Option arguments must be delimited with :code:`=` (equals). For example, to set the block size to 4 MiB, you must say :code:`--block-size=4M` or :code:`-B=4M`. :code:`-B4M` will be interpreted as the three arguments :code:`-B`, :code:`-4`, and :code:`-M`; :code:`--block-size 4M` will be @@ -1069,9 +1066,21 @@ A small number of :code:`rsync(1)` features are actively disallowed: named :code:`4M`. This is so Charliecloud can process :code:`rsync(1)` options without knowing fully which ones take an argument. -Note that there are other flags that don’t make sense and/or cause undesirable -behavior (for example, :code:`-n` / :code:`--dry-run`). We have not -characterized this problem. + 3. Invalid :code:`rsync(1)` options: + + :code:`--daemon` + Running :code:`rsync(1)` in daemon mode does not make sense for + container build. + + :code:`-n`, :code:`--dry-run` + This makes the copy a no-op, and Charliecloud may want to use it + internally in the future. + + :code:`--remove-source-files` + This would let the instruction alter the context directory. + +Note that there are likely other flags that don’t make sense and/or cause +undesirable behavior. We have not characterized this problem. For arguments that read input from a file (e.g. :code:`--exclude-from` or :code:`--files-from`), relative paths are relative to the context directory, diff --git a/lib/build.py b/lib/build.py index 7d8f8f17e..ab4eccdd3 100644 --- a/lib/build.py +++ b/lib/build.py @@ -2,13 +2,13 @@ import abc import ast +import enum import glob import json import os import os.path import re import shutil -import struct import sys import charliecloud as ch @@ -166,6 +166,14 @@ def build_arg_get(arg): global image_ct image_ct = sum(1 for i in tree.children_("from_")) + # If we use RSYNC, error out quickly if appropriate rsync(1) not present. + if (tree.child("rsync") is not None): + try: + ch.version_check(["rsync", "--version"], ch.RSYNC_MIN) + except ch.Fatal_Error: + ch.ERROR("Dockerfile uses RSYNC, so rsync(1) is required") + raise + # Traverse the tree and do what it says. # # We don’t actually care whether the tree is traversed breadth-first or @@ -552,6 +560,75 @@ def prepare(self, miss_ct): return miss_ct + int(self.miss) +class Copy(Instruction): + + # Superclass for instructions that do some flavor of file copying (ADD, + # COPY, RSYNC). + + __slots__ = ("dst", + "dst_raw", + "from_", + "src_metadata", + "srcs", + "srcs_base", + "srcs_raw") + + def expand_dest(self): + """Set self.dst from self.dst_raw with environment variables expanded + and image root prepended.""" + self.dst = self.image.unpack_path \ + // ch.variables_sub(self.dst_raw, self.env_build) + + def expand_sources(self): + """Set self.srcs from self.srcs_raw with environment variables and globs + expanded, absolute paths with appropriate base, and validate that + they are within the sources base.""" + if (cli.context == "-" and self.from_ is None): + ch.FATAL("no context because “-” given") + if (len(self.srcs_raw) < 1): + ch.FATAL("source or destination missing") + self.srcs_base_set() + self.srcs = list() + for src in (ch.variables_sub(i, self.env_build) for i in self.srcs_raw): + # glob can’t take Path + matches = [fs.Path(i) + for i in glob.glob("%s/%s" % (self.srcs_base, src))] + if (len(matches) == 0): + ch.FATAL("source not found: %s" % src) + for m in matches: + self.srcs.append(m) + ch.VERBOSE("source: %s" % m) + # Validate source is within context directory. (We need the source + # as given later, so don’t canonicalize persistently.) There is no + # clear subsitute for commonpath() in pathlib. + mc = m.resolve() + if (not os.path.commonpath([mc, self.srcs_base]) + .startswith(self.srcs_base)): + ch.FATAL("can’t copy from outside context: %s" % src) + + def srcs_base_set(self): + "Set self.srcs_base according to context and --from." + if (self.from_ is None): + self.srcs_base = cli.context + else: + if (self.from_ == self.image_i or self.from_ == self.image_alias): + ch.FATAL("--from: stage %s is the current stage" % self.from_) + if (not self.from_ in images): + # FIXME: Would be nice to also report if a named stage is below. + if (isinstance(self.from_, int) and self.from_ < image_ct): + if (self.from_ < 0): + ch.FATAL("--from: invalid negative stage index %d" + % self.from_) + else: + ch.FATAL("--from: stage %d does not exist yet" + % self.from_) + else: + ch.FATAL("--from: stage %s does not exist" % self.from_) + self.srcs_base = images[self.from_].unpack_path + self.srcs_base = os.path.realpath(self.srcs_base) + ch.VERBOSE("context: %s" % self.srcs_base) + + class Arg(Instruction): __slots__ = ("key", @@ -657,7 +734,7 @@ def value_default(self): return v -class I_copy(Instruction): +class I_copy(Copy): # ABANDON ALL HOPE YE WHO ENTER HERE # @@ -669,12 +746,7 @@ class I_copy(Instruction): # Because of these weird semantics, none of this abstracted into a general # copy function. I don’t want people calling it except from here. - __slots__ = ("dst", - "dst_raw", - "from_", - "src_metadata", - "srcs", - "srcs_raw") + __slots__ = () def __init__(self, *args): super().__init__(*args) @@ -863,56 +935,14 @@ def execute(self): ch.FATAL("can’t COPY: unknown file type: %s" % src) def prepare(self, miss_ct): - # Error checking. - if (cli.context == "-" and self.from_ is None): - ch.FATAL("no context because “-” given") - if (len(self.srcs_raw) < 1): - ch.FATAL("must specify at least one source") # Complain about unsupported stuff. if (self.options.pop("chown", False)): self.unsupported_forever_warn("--chown") # Any remaining options are invalid. self.options_assert_empty() - # Find the context directory. - if (self.from_ is None): - context = cli.context - else: - if (self.from_ == self.image_i or self.from_ == self.image_alias): - ch.FATAL("--from: stage %s is the current stage" % self.from_) - if (not self.from_ in images): - # FIXME: Would be nice to also report if a named stage is below. - if (isinstance(self.from_, int) and self.from_ < image_ct): - if (self.from_ < 0): - ch.FATAL("--from: invalid negative stage index %d" - % self.from_) - else: - ch.FATAL("--from: stage %d does not exist yet" - % self.from_) - else: - ch.FATAL("--from: stage %s does not exist" % self.from_) - context = images[self.from_].unpack_path - context_canon = os.path.realpath(context) - ch.VERBOSE("context: %s" % context) - # Expand sources. - self.srcs = list() - for src in (ch.variables_sub(i, self.env_build) for i in self.srcs_raw): - # glob can’t take Path - matches = [fs.Path(i) for i in glob.glob("%s/%s" % (context, src))] - if (len(matches) == 0): - ch.FATAL("source file not found: %s" % src) - for i in matches: - self.srcs.append(i) - ch.VERBOSE("source: %s" % i) - # Expand destination. - self.dst = ch.variables_sub(self.dst_raw, self.env_build) - # Validate sources are within context directory. (Can’t convert to - # canonical paths yet because we need the source path as given.) - for src in self.srcs: - src_canon = src.resolve() - if (not os.path.commonpath([src_canon, context_canon]) - .startswith(context_canon)): # no clear substitute for - # commonpath in pathlib - ch.FATAL("can’t copy from outside context: %s" % src) + # Expand operands. + self.expand_sources() + self.expand_dest() # Gather metadata for hashing. self.src_metadata = fs.Path.stat_bytes_all(self.srcs) # Pass on to superclass. @@ -1138,18 +1168,14 @@ def execute(self): pass -class I_rsync(Instruction): +class I_rsync(Copy): - __slots__ = ("dst", - "dst_raw", - "plus_option", - "rsync_options", - "src_metadata", - "srcs", - "srcs_raw") + __slots__ = ("plus_option", + "rsync_options") def __init__(self, *args): super().__init__(*args) + self.from_ = None # not supported yet line_no = self.tree.line st = self.tree.child("option_plus") self.plus_option = None if st is None else st.terminal("OPTION_LETTER") @@ -1174,8 +1200,6 @@ def __init__(self, *args): self.srcs_raw.append(word) if (len(self.srcs_raw) == 0): ch.FATAL("RSYNC: %d: source and destination missing" % line_no) - elif (len(self.srcs_raw) == 1): - ch.FATAL("RSYNC: %d: source or destination missing" % line_no) self.dst_raw = self.srcs_raw.pop() @property @@ -1219,23 +1243,60 @@ def str_(self): ret.append(self.dst_raw) return " ".join(ret) + def execute(self): + plus_options = list() + if (self.plus_option in ("a", "l")): + # see man page for explanations + plus_options = ["-@=-1", "-AHSXpr"] + if (sys.stderr.isatty()): + plus_options += ["--info=progress2"] + if (self.plus_option == "l"): + plus_options += ["-l", "--copy-unsafe-links"] + ch.cmd(["rsync"] + plus_options + self.rsync_options_concise + + self.srcs + [self.dst]) + + def expand_rsync_froms(self): + for i in range(len(self.rsync_options)): + o = self.rsync_options[i] + m = re.search("^--([a-z]+)-from=(.+)$", o) + if (m is not None): + key = m[1] + if (m[2] == "-"): + ch.FATAL("--*-from: can’t use standard input") + path = ch.Path(m[2]) + if (path.is_absolute()): + path = self.image.unpack_path // path + else: + path = self.srcs_base // path + self.rsync_options[i] = "--%s-from=%s" % (key, path) + def prepare(self, miss_ct): - ... - # Find the context directory. FIXME: This is pretty simple because we - # don’t yet support the equivalent of COPY --from. - if (cli.context == "-"): - ch.FATAL("no context because “-” given") - context = cli.context - # Expand sources. - # Expand destination. - # Expand --*-from options. + self.rsync_validate() + # Expand operands. + self.expand_sources() + self.expand_dest() + self.expand_rsync_froms() # Gather metadata for hashing. self.src_metadata = fs.Path.stat_bytes_all(self.srcs) # Pass on to superclass. return super().prepare(miss_ct) - def execute(self): - ... + def rsync_validate(self): + # Reject bad + options. + if (self.plus_option not in ("a", "l")): + ch.FATAL("invalid plus option: %s" % self.plus_option) + # Reject SSH and rsync transports. I *believe* simply the presence of + # “:” (colon) in the filename triggers this behavior. + for src in self.srcs_raw: + if (":" in src): + ch.FATAL("SSH and rsync transports not supported: %s" % src) + # Reject bad flags. + bad = { "--daemon", + "-n", "--dry-run", + "--remove-source-files" } + for o in self.rsync_options: + if (o in bad): + ch.FATAL("bad option: %s" % o) class Run(Instruction): @@ -1379,23 +1440,6 @@ class Environment: """The state we are in: environment variables, working directory, etc. Most of this is just passed through from the image metadata.""" - # FIXME: - # - problem: - # 1. COPY (at least) needs a valid build environment to figure out if it’s - # a hit or miss, which happens in prepare() - # 2. no files from the image are available in prepare(), so we can’t read - # image metadata then - # - could get it from Git if needed, but that seems complicated - # - valid during prepare() and execute() but not __init__() - # - in particular, don’t ch.variables_sub() in __init__() - # - instructions that update it need to change the env object in prepare() - # - WORKDIR SHELL ARG ENV - # - FROM - # - global images and image_i makes this harder because we need to read - # the metadata of image_i - 1 - # - solution: remove those two globals? instructions grow image and - # image_i attributes? - ## Supporting functions ## diff --git a/lib/charliecloud.py b/lib/charliecloud.py index 4b7626037..b8c6ba523 100644 --- a/lib/charliecloud.py +++ b/lib/charliecloud.py @@ -101,8 +101,9 @@ class Force_Mode(enum.Enum): # takes over. HTTP_CHUNK_SIZE = 256 * 1024 -# Minimum Python version. NOTE: Keep in sync with configure.ac. +# Minimum versions. NOTE: Keep in sync with configure.ac. PYTHON_MIN = (3,6) +RSYNC_MIN = (3,1,0) ## Globals ## diff --git a/lib/filesystem.py b/lib/filesystem.py index 897590f50..6c5a5391e 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -8,6 +8,7 @@ import pprint import shutil import stat +import struct import tarfile import charliecloud as ch From 31176f2b9b12ef9c6cb083778d44864e2b692818 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Mon, 18 Sep 2023 16:55:16 -0600 Subject: [PATCH 06/20] add configure test [skip ci] --- configure.ac | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/configure.ac b/configure.ac index 7fc8195f9..7a39dd87a 100644 --- a/configure.ac +++ b/configure.ac @@ -481,21 +481,27 @@ AS_IF([ test $enable_bundled_lark = yes \ && test $lark_status != bundled], [AC_MSG_ERROR([bundled Lark not found; see docs])]) -# Git -vmin_git=2.28.1 -AC_CHECK_PROG([GIT], [git], [git]) -CH_CHECK_VERSION([GIT], [$vmin_git], [--version | cut -d' ' -f3]) - # DOT vmin_dot=2.30.1 AC_CHECK_PROG([DOT], [dot], [dot]) CH_CHECK_VERSION([DOT], [$vmin_dot], [dot -V 2>&1 | cut -d' ' -f5]) +# Git +vmin_git=2.28.1 +AC_CHECK_PROG([GIT], [git], [git]) +CH_CHECK_VERSION([GIT], [$vmin_git], [--version | cut -d' ' -f3]) + # git2dot vmin_git2dot=0.8.3 AC_CHECK_PROG([GIT2DOT], [git2dot.py], [git2dot.py]) CH_CHECK_VERSION([GIT2DOT], [$vmin_git2dot], [--version | cut -d' ' -f3]) +# rsync +vmin_rsync=3.1.0 # NOTE: keep in sync with lib/charliecloud.py +AC_CHECK_PROG([RSYNC], [rsync], [rsync]) +CH_CHECK_VERSION([RSYNC], [$vmin_rsync], + [-V | sed -En 's/rsync +version (@<:@0-9@:>@+\.@<:@0-9@:>@+\.@<:@0-9@:>@+).*$/\1/p']) + ### Docs ##################################################################### @@ -727,6 +733,11 @@ AS_IF([ test $have_ch_image = yes \ [have_ch_image_bu=yes], [have_ch_image_bu=no]) +AS_IF([ test $have_ch_image = yes \ + && test -n "$RSYNC"], + [have_ch_image_rsync=yes], + [have_ch_image_rsync=no]) + AS_IF([ test -n "$DOCKER" \ && test -n "$MKTEMP"], [have_docker=yes], @@ -888,6 +899,10 @@ Building images ch-image(1): ... ${have_ch_image} Git ≥ $vmin_git ... ${GIT_VERSION_NOTE} + with ch-image(1) using RSYNC instruction: ${have_ch_image_rsync} + ch-image(1): ... ${have_ch_image} + rsync ≥ $vmin_rsync ... ${RSYNC_VERSION_NOTE} + with Docker: ${have_docker} Docker ≥ $vmin_docker ... ${DOCKER_VERSION_NOTE} mktemp(1) ... ${MKTEMP:-not found} From 22e0365dc3d4ef9e467ddaf06b13f4c62997f5ea Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Mon, 18 Sep 2023 17:42:52 -0600 Subject: [PATCH 07/20] twiddle docs [skip ci] --- doc/ch-image.rst | 73 ++++++++++++++++++++++++++++-------------------- doc/tutorial.rst | 8 ++++-- 2 files changed, 47 insertions(+), 34 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 53d517ca2..b0f304655 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1003,6 +1003,8 @@ Features we do not plan to support New instructions ---------------- +.. _ch-image_rsync: + :code:`RSYNC` ~~~~~~~~~~~~~ @@ -1039,14 +1041,50 @@ example, .. code-block:: docker WORKDIR /foo - RSYNC --foo ssh:src1 ./src2 ./dst1 - RSYNC ./src3 /dst2 + RSYNC --foo src1 src2 dst is translated to (the equivalent of):: $ mkdir -p /foo - $ rsync --foo ssh:src1 /context/src2 /storage/imgroot/foo/dst2 - $ rsync /context/src3 /storage/imgroot/dst2 + $ rsync -@=-1 -AHSXpr --info=progress2 -l --copy-unsafe-links \ + --foo /context/src1 /context/src2 /storage/imgroot/foo/dst2 + +Note the extensive default arguments to :code:`rsync(1)`. :code:`RSYNC` takes +a single instruction option beginning with :code:`+` (plus) that is shorthand +for a group of :code:`rsync(1)` options. This single option is one of: + + :code:`+m` + Preserves metadata. Equivalent to all of: + + * :code:`-@=-1`: use nanosecond precision when comparing timestamps. + * :code:`-A`: preserve ACLs. + * :code:`-H`: preserve hard link groups. + * :code:`-S`: preserve file sparseness when possible. + * :code:`-X`: preserve xattrs in :code:`user.*` namespace. + * :code:`-p`: preserve permissions. + * :code:`-r`: recurse into directories. + * :code:`--info=progress2` (only if stderr is a terminal): show progress + meter (note `subtleties in interpretation + `_). + + :code:`+l` (default) + Preserves metadata and symlinks (if within source directory). Equivalent + to the options listed for :code:`+m` and also: + + * :code:`-l`: Copy symlinks as symlinks if their target is within the + top-of-transfer directory. This is *not the context directory and often + not the source directory*; see below for examples. + + * :code:`--copy-unsafe-links`: copy the target of symlinks that point + outside the top-of-transfer directory for each source. + + :code:`+z` + + No default arguments. No metadata will be preserved, and both hard and + symbolic links will be ignored, except as specified by :code:`rsync(1)` + options starting with a hyphen. (Note that :code:`-a`/:code:`--archive` is + discouraged because it omits some metadata and handles symlinks + inappropriately for containers.) A small number of :code:`rsync(1)` features are actively disallowed: @@ -1087,34 +1125,7 @@ For arguments that read input from a file (e.g. :code:`--exclude-from` or absolute paths refer to the image root, and :code:`-` (standard input) is an error. -:code:`RSYNC` allows specifying a single option beginning with :code:`+` -(plus) that is shorthand for a group of :code:`rsync(1)` options. **These are -preferred** to :code:`rsync(1)`’s own :code:`-a`/:code:`--archive` option -because that may handle symlinks inappropriately for containers and omits some -metadata. This single option is one of: - - :code:`+a` - Equivalent to all of: - * :code:`-@=-1`: use nanosecond precision when comparing timestamps. - * :code:`-A`: preserve ACLs. - * :code:`-H`: preserve hard link groups. - * :code:`-S`: preserve file sparseness when possible. - * :code:`-X`: preserve xattrs in :code:`user.*` namespace. - * :code:`-p`: preserve permissions. - * :code:`-r`: recurse into directories. - * :code:`--info=progress2` (only if stderr is a terminal): show progress - meter (note `subtleties in interpretation - `_). - - :code:`+l` - Equivalent to the options listed for :code:`+` and also: - - * :code:`-l`: process symlinks in the sources. - - * :code:`--copy-unsafe-links`: copy the target of symlinks that point - outside the top-of-transfer directory for each source. This is *not the - context directory*; see below for examples. Symlink handling is particularly fraught, so use care. In particular, you must state explicitly what to do with symlinks in the source: both plain diff --git a/doc/tutorial.rst b/doc/tutorial.rst index 79aa2a3a4..60d77d162 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -265,7 +265,7 @@ recipe: FROM almalinux:8 RUN yum -y install python36 - COPY ./hello.py / + RSYNC ./hello.py / RUN chmod 755 /hello.py These four instructions say: @@ -275,9 +275,11 @@ These four instructions say: 2. :code:`RUN`: Install the :code:`python36` RPM package, which we need for our Hello World program. - 3. :code:`COPY`: Copy the file :code:`hello.py` we just made to the root + 3. :code:`RSYNC`: Copy the file :code:`hello.py` we just made to the root directory of the image. In the source argument, the path is relative to - the *context directory*, which we’ll see more of below. + the *context directory*, which we’ll see more of below. (This instruction + is a Charliecloud extension; see :ref:`its documentation ` + for details.) You can also use standard :code:`COPY` if you prefer.) 4. :code:`RUN`: Make that file executable. From 93d000071a7ad9d699cf23c12c05c54b2443c588 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Tue, 19 Sep 2023 17:42:51 -0600 Subject: [PATCH 08/20] more good progress [skip ci] --- doc/ch-image.rst | 11 ++-- lib/build.py | 13 ++-- lib/filesystem.py | 134 ++++++++++++++++++++++++--------------- test/build/50_rsync.bats | 76 ++++++++++++++++++++++ test/common.bash | 3 + 5 files changed, 176 insertions(+), 61 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index b0f304655..2bd269507 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1079,12 +1079,11 @@ for a group of :code:`rsync(1)` options. This single option is one of: outside the top-of-transfer directory for each source. :code:`+z` - - No default arguments. No metadata will be preserved, and both hard and - symbolic links will be ignored, except as specified by :code:`rsync(1)` - options starting with a hyphen. (Note that :code:`-a`/:code:`--archive` is - discouraged because it omits some metadata and handles symlinks - inappropriately for containers.) + No default arguments. Directories will not be descended, no metadata will + be preserved, and both hard and symbolic links will be ignored, except as + specified by :code:`rsync(1)` options starting with a hyphen. (Note that + :code:`-a`/:code:`--archive` is discouraged because it omits some metadata + and handles symlinks inappropriately for containers.) A small number of :code:`rsync(1)` features are actively disallowed: diff --git a/lib/build.py b/lib/build.py index ab4eccdd3..55e29f67c 100644 --- a/lib/build.py +++ b/lib/build.py @@ -565,11 +565,11 @@ class Copy(Instruction): # Superclass for instructions that do some flavor of file copying (ADD, # COPY, RSYNC). - __slots__ = ("dst", + __slots__ = ("dst", # string b/c trailing slash is significant "dst_raw", "from_", "src_metadata", - "srcs", + "srcs", # strings b/c trailing slashes are significant "srcs_base", "srcs_raw") @@ -590,6 +590,7 @@ def expand_sources(self): self.srcs_base_set() self.srcs = list() for src in (ch.variables_sub(i, self.env_build) for i in self.srcs_raw): + ts_p = src[1] == "/" # glob can’t take Path matches = [fs.Path(i) for i in glob.glob("%s/%s" % (self.srcs_base, src))] @@ -602,6 +603,8 @@ def expand_sources(self): # as given later, so don’t canonicalize persistently.) There is no # clear subsitute for commonpath() in pathlib. mc = m.resolve() + m.trailing_slash_p + mc.trailing_slash_p if (not os.path.commonpath([mc, self.srcs_base]) .startswith(self.srcs_base)): ch.FATAL("can’t copy from outside context: %s" % src) @@ -1178,7 +1181,7 @@ def __init__(self, *args): self.from_ = None # not supported yet line_no = self.tree.line st = self.tree.child("option_plus") - self.plus_option = None if st is None else st.terminal("OPTION_LETTER") + self.plus_option = "l" if st is None else st.terminal("OPTION_LETTER") options_done = False self.rsync_options = list() self.srcs_raw = list() @@ -1245,7 +1248,7 @@ def str_(self): def execute(self): plus_options = list() - if (self.plus_option in ("a", "l")): + if (self.plus_option in ("m", "l")): # no action needed for +z # see man page for explanations plus_options = ["-@=-1", "-AHSXpr"] if (sys.stderr.isatty()): @@ -1283,7 +1286,7 @@ def prepare(self, miss_ct): def rsync_validate(self): # Reject bad + options. - if (self.plus_option not in ("a", "l")): + if (self.plus_option not in ("m", "l", "z")): ch.FATAL("invalid plus option: %s" % self.plus_option) # Reject SSH and rsync transports. I *believe* simply the presence of # “:” (colon) in the filename triggers this behavior. diff --git a/lib/filesystem.py b/lib/filesystem.py index 6c5a5391e..72997876d 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -36,46 +36,62 @@ ## Classes ## class Path(pathlib.PosixPath): - """Stock Path objects have the very weird property that appending an - *absolute* path to an existing path ignores the left operand, leaving - only the absolute right operand: - - >>> import pathlib - >>> a = pathlib.Path("/foo/bar") - >>> a.joinpath("baz") - PosixPath('/foo/bar/baz') - >>> a.joinpath("/baz") - PosixPath('/baz') - - This is contrary to long-standing UNIX/POSIX, where extra slashes in a - path are ignored, e.g. the path "foo//bar" is equivalent to "foo/bar". - It seems to be inherited from os.path.join(). - - Even with the relatively limited use of Path objects so far, this has - caused quite a few bugs. IMO it’s too difficult and error-prone to - manually manage whether paths are absolute or relative. Thus, this - subclass introduces a new operator "//" which does the right thing, - i.e., if the right operand is absolute, that fact is ignored. E.g.: - - >>> a = Path("/foo/bar") - >>> a.joinpath_posix("baz") - Path('/foo/bar/baz') - >>> a.joinpath_posix("/baz") - Path('/foo/bar/baz') - >>> a // "/baz" - Path('/foo/bar/baz') - >>> "/baz" // a - Path('/baz/foo/bar') - - We introduce a new operator because it seemed like too subtle a change - to the existing operator "/" (which we disable to avoid getting burned - here in Charliecloud). An alternative was "+" like strings, but that led - to silently wrong results when the paths *were* strings (components - concatenated with no slash).""" + """Filesystem paths with two important differences from the stock Path: + + 1. This Path remembers whether a trailing slash is present, and appends + it when str() or repr(). Note that Path("/") is considered *not* to + have a trailing slash. + + 2. Stock Path objects have the very weird property that appending an + absolute path to an existing path ignores the left operand, leaving + only the absolute right operand: + + >>> import pathlib + >>> a = pathlib.Path("/foo/bar") + >>> a.joinpath("baz") + PosixPath('/foo/bar/baz') + >>> a.joinpath("/baz") + PosixPath('/baz') + + This is contrary to long-standing UNIX/POSIX, where extra slashes in a + path are ignored, e.g. the path "foo//bar" is equivalent to "foo/bar". + It seems to be inherited from os.path.join(). + + Even with the relatively limited use of Path objects so far, this has + caused quite a few bugs. IMO it’s too difficult and error-prone to + manually manage whether paths are absolute or relative. Thus, this + subclass introduces a new operator "//" which does the right thing, + i.e., if the right operand is absolute, that fact is ignored. E.g.: + + >>> a = Path("/foo/bar") + >>> a.joinpath_posix("baz") + Path('/foo/bar/baz') + >>> a.joinpath_posix("/baz") + Path('/foo/bar/baz') + >>> a // "/baz" + Path('/foo/bar/baz') + >>> "/baz" // a + Path('/baz/foo/bar') + + We introduce a new operator because it seemed like too subtle a change + to the existing operator "/" (which we disable to avoid getting burned + here in Charliecloud). An alternative was "+" like strings, but that + led to silently wrong results when the paths *were* strings + (components concatenated with no slash).""" + + __slots__ = ("trailing_slash_p",) # Name of the gzip(1) to use; set on first call of file_gzip(). gzip = None + def __init__(self, *args): + # WARNING: Something very weird is going on with the superclass [1] that + # means super().__init__ refers to object.__init__, not + # PosixPath.__init__. All seems well if we don’t call it but omg. + # [1]: https://stackoverflow.com/questions/61689391 + self.trailing_slash_p = ( len(args) > 0 and isinstance(args[-1], str) + and len(args[-1]) > 1 and args[-1][-1] == "/") + def __floordiv__(self, right): return self.joinpath_posix(right) @@ -89,6 +105,9 @@ def __rfloordiv__(self, left): def __rtruediv__(self, left): return NotImplemented + def __str__(self): + return super().__str__() + ("/" if self.trailing_slash_p else "") + def __truediv__(self, right): return NotImplemented @@ -307,22 +326,32 @@ def is_relative_to(self, *other): except ValueError: return False - def joinpath_posix(self, other): + def joinpath_posix(self, right): # This method is a hot spot, so the hairiness is due to optimizations. # It runs about 30% faster than the naïve verson below. - if (isinstance(other, Path)): - other_parts = other._parts - if (len(other_parts) > 0 and other_parts[0] == "/"): - other_parts = other_parts[1:] - elif (isinstance(other, str)): - other_parts = other.split("/") - if (len(other_parts) > 0 and len(other_parts[0]) == 0): - other_parts = other_parts[1:] + if (isinstance(right, Path)): + right_parts = right._parts + ts_p = right.trailing_slash_p + if (len(right_parts) > 0 and right_parts[0] == "/"): + right_parts = right_parts[1:] + elif (isinstance(right, str)): + right_parts = right.split("/") + if (len(right_parts) > 0 and len(right_parts[0]) == 0): + # absolute path, rm empty first component + right_parts = right_parts[1:] + if (len(right_parts) > 0 and len(right_parts[-1]) == 0): + # trailing slash, rm empty last component + right_parts = right_parts[:-1] + ts_p = True + else: + ts_p = False else: - ch.INFO(type(other)) - assert False, "unknown type" - return self._from_parsed_parts(self._drv, self._root, - self._parts + other_parts) + assert False, "unknown type: %s" % type(right) + ret = self._from_parsed_parts(self._drv, self._root, + self._parts + right_parts) + ret.trailing_slash_p = ts_p + return ret + # Naïve implementation for reference. #other = Path(other) #if (other.is_absolute()): @@ -433,7 +462,7 @@ def stat_bytes_recursive(self): # FIXME: Locale issues related to sorting? md = self.stat_bytes(links=True) if (self.is_dir()): - for (dir_, dirs, files) in ch.walk(src): + for (dir_, dirs, files) in ch.walk(self): md += dir_.stat_bytes() for f in sorted(files): md += (dir_ // f).stat_bytes() @@ -462,6 +491,11 @@ def unlink_(self, missing_ok=False): return ch.ossafe(super().unlink, "can’t unlink: %s" % self.name) + def resolve(self, *args, **kwargs): + ret = super().resolve(*args, **kwargs) + ret.trailing_slash_p = self.trailing_slash_p + return ret + class Storage: diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index a978a7c9c..5e64c7ca9 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -9,6 +9,7 @@ setup () { fixtures=$BATS_TMPDIR/rsync context=$fixtures/ctx context_out=$fixtures/ctx-out + context_doc=$fixtures/doc if [[ -e $fixtures ]]; then echo '## input ##' ls_ "$fixtures" @@ -41,6 +42,7 @@ ls_ () { # top level mkdir "$context" mkdir "$context_out" + mkdir "$context_doc" ln -s "$context" "$fixtures"/ctx_direct # outside context @@ -92,8 +94,81 @@ ls_ () { touch weird-perms-607 > weird-perms-607 chmod 607 weird-perms-607 + # simpler example for the docs + cd "$context_doc" + mkdir src1 + echo file-src1 > src1/file-src1 + chmod 607 src1/file-src1 + mkdir src2 + echo file-src2 > src2/file-src2 + cd src1 + ln -s file-src1 link_file-src1 + ln -s ../src2/file-src2 link_file-src2 + echo "## created fixtures ##" ls_ "$fixtures" +} + +@test "${tag}: doc examples" { + # This generates copy-paste source for ch-image(1) man page. We still + # validate it, though. + + cat < "$ch_tmpimg_df" +FROM alpine:3.17 + +# copy single file +RSYNC /src1/file-src1 / +# ... renamed +RSYNC /src1/file-src1 /file-src1_renamed +# ... without metadata +RSYNC +z /src1/file-src1 /file-src1_nom +# ... with trailing slash on *destination* +RSYNC /src1/file-src1 /file-src1_slash/ + +# copy directory +RSYNC /src1 / +# ... renamed? +RSYNC /src1 /dst2 +# ... renamed +RSYNC /src1/ /dst3 +# ... destination trailing slash has no effect for directory sources +RSYNC /src1 /dst2b/ +RSYNC /src1/ /dst3b/ + +# copy two directories separately +RUN mkdir /dst4 && echo file-dst4 > /dst4/file-dst4 +RSYNC /src1 /src2 /dst4 +# ... with wildcards +RUN mkdir /dst4b && echo file-dst4b > /dst4/file-dst4b +RSYNC /src* /dst4b + +# ... with trailing slashes +RUN mkdir /dst5 && echo file-dst5 > /dst5/file-dst5 +# ... with trailing slashes and wildcards +# ... with one trailing slash and one not + +EOF + + ch-image build --rebuild -v -f "$ch_tmpimg_df" "$context_doc" + + # +z permissions don't come across + + # copy both src1 and src2 + + # merge directories + + # top of transfer with just a file + + # symlink stuff? + # symlink between src1 and src2 + + # FIXME YOU ARE HERE -- doc examples become comprehensive? or maybe split into symlinks and not symlinks? + + cd "$CH_IMAGE_STORAGE/img/tmpimg" + ls -lh file-src1* + ls -lhR src1 + ls -lhR dst* + false } @@ -104,6 +179,7 @@ ls_ () { # +L with slash # -rl --copy-unsafe-links # single file +# single file with trailing slash on *destination* # file and directory # ssh: transport diff --git a/test/common.bash b/test/common.bash index d8be01f02..f4c4f0744 100644 --- a/test/common.bash +++ b/test/common.bash @@ -19,6 +19,9 @@ if false; then status= fi +# Some defaults +ch_tmpimg_df="$BATS_TMPDIR"/tmpimg.df + arch_exclude () { # Skip the test if architecture (from “uname -m”) matches $1. [[ $(uname -m) != "$1" ]] || skip "arch ${1}" From 714035fac71e3b10669bbbaac977a338b985c6bd Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Thu, 21 Sep 2023 16:27:14 -0600 Subject: [PATCH 09/20] some tests [skip ci] --- lib/filesystem.py | 16 ++- test/build/50_rsync.bats | 287 +++++++++++++++++++++++---------------- 2 files changed, 178 insertions(+), 125 deletions(-) diff --git a/lib/filesystem.py b/lib/filesystem.py index 72997876d..8a0bfc6db 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -131,6 +131,12 @@ def git_incompatible_p(self): "Return True if I can’t be stored in Git because of my name." return self.name.startswith(".git") + @property + def parent(self): + ret = super().parent + ret.trailing_slash_p = False # trailing slash in removed part + return ret + @classmethod def gzip_set(cls): """Set gzip class attribute on first call to file_gzip(). @@ -423,6 +429,11 @@ def rename_(self, name_new): "can’t rename: %s -> %s" % (self.name, name_new), name_new) + def resolve(self, *args, **kwargs): + ret = super().resolve(*args, **kwargs) + ret.trailing_slash_p = self.trailing_slash_p + return ret + def rmdir_(self): ch.ossafe(super().rmdir, "can’t rmdir: %s" % self.name) @@ -491,11 +502,6 @@ def unlink_(self, missing_ok=False): return ch.ossafe(super().unlink, "can’t unlink: %s" % self.name) - def resolve(self, *args, **kwargs): - ret = super().resolve(*args, **kwargs) - ret.trailing_slash_p = self.trailing_slash_p - return ret - class Storage: diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index 5e64c7ca9..d06f3112b 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -1,5 +1,7 @@ load ../common +# NOTE: ls(1) output is not checked; this is for copy-paste into docs + # shellcheck disable=SC2034 tag=RSYNC @@ -21,11 +23,9 @@ ls_ () { # also “compare-ls” in ch-convert.bats. ( cd "$1" - find . -mindepth 1 -printf '%M %n %3s%y :%d:%f -> %l\n' \ - | sed -E -e 's|:1:||' \ - -e 's|:2:| |' \ - -e 's|:3:| |' \ - -e 's|:4:| |' \ + find . -mindepth 1 -printf '%M %n %3s%y %P -> %l\n' \ + | LC_ALL=C sort -k4 \ + | sed -E -e 's|\w+/| |g' \ -e 's| -> $||' \ -e 's|([0-9]+)[f]|\1|' \ -e 's|([0-9]+ )[0-9 ]+[a-z] |\1 |' \ @@ -41,137 +41,191 @@ ls_ () { # top level mkdir "$context" - mkdir "$context_out" - mkdir "$context_doc" - ln -s "$context" "$fixtures"/ctx_direct - - # outside context - cd "$context_out" - echo file-out > file-out - mkdir dir-out - echo dir-out.file-out > dir-out/dir-out.file-out - ln -s file-out link.out2out - cd .. - - # inside context + echo file-top > file-top + + # basic example cd "$context" - echo file-ctx > file-ctx - mkdir src - echo src.file > src/src.file - mkdir src/src.dir - echo src.dir.file > src/src.dir/src.dir.file - mkdir not-src - echo not-src.file > not-src.file - - # symlinks to inside source - cd src - # to file - ln -s src.file src.file_direct - ln -s ../src/src.file src.file_up_over - ln -s "$context"/src/src.file src.file_abs - # relative to directory - ln -s src.dir src.dir_direct - ln -s ../src/src.dir src.dir_upover - ln -s "$context"/src/src.dir src.dir_abs - - # symlinks to outside source but inside context - ln -s ../file-ctx file-ctx_up - ln -s "$context"/file-ctx file-ctx_abs - ln -s ../not-src not-src_up - ln -s "$context"/not-src not-src_abs - - # symlinks to outside context - ln -s ../../ctx-out/file-out file-out_rel - ln -s "$context_out"/file-out file-out_abs - ln -s ../../ctx-out/dir-out dir-out_rel - ln -s "$context_out"/dir-out dir-out_abs - - # hard links - echo hard1 > hard1 - ln hard1 hard2 - - # weird permissions - touch weird-perms-607 > weird-perms-607 - chmod 607 weird-perms-607 - - # simpler example for the docs - cd "$context_doc" - mkdir src1 - echo file-src1 > src1/file-src1 - chmod 607 src1/file-src1 - mkdir src2 - echo file-src2 > src2/file-src2 - cd src1 - ln -s file-src1 link_file-src1 - ln -s ../src2/file-src2 link_file-src2 + mkdir basic1 + echo file-basic1 > basic1/file-basic1 + chmod 607 basic1/file-basic1 # weird permissions + mkdir basic2 + echo file-basic2 > basic2/file-basic2 + + # # outside context + # cd "$context_out" + # echo file-out > file-out + # mkdir dir-out + # echo dir-out.file-out > dir-out/dir-out.file-out + # ln -s file-out link.out2out + # cd .. + + # # inside context + # cd "$context" + # echo file-ctx > file-ctx + # mkdir src + # echo src.file > src/src.file + # mkdir src/src.dir + # echo src.dir.file > src/src.dir/src.dir.file + # mkdir not-src + # echo not-src.file > not-src.file + + # # symlinks to inside source + # cd src + # # to file + # ln -s src.file src.file_direct + # ln -s ../src/src.file src.file_up_over + # ln -s "$context"/src/src.file src.file_abs + # # relative to directory + # ln -s src.dir src.dir_direct + # ln -s ../src/src.dir src.dir_upover + # ln -s "$context"/src/src.dir src.dir_abs + + # # symlinks to outside source but inside context + # ln -s ../file-ctx file-ctx_up + # ln -s "$context"/file-ctx file-ctx_abs + # ln -s ../not-src not-src_up + # ln -s "$context"/not-src not-src_abs + + # # symlinks to outside context + # ln -s ../../ctx-out/file-out file-out_rel + # ln -s "$context_out"/file-out file-out_abs + # ln -s ../../ctx-out/dir-out dir-out_rel + # ln -s "$context_out"/dir-out dir-out_abs echo "## created fixtures ##" ls_ "$fixtures" } -@test "${tag}: doc examples" { - # This generates copy-paste source for ch-image(1) man page. We still - # validate it, though. - +@test "${tag}: basic examples" { + # files cat < "$ch_tmpimg_df" FROM alpine:3.17 +RUN mkdir /dst -# copy single file -RSYNC /src1/file-src1 / +# single file +RSYNC /basic1/file-basic1 /dst # ... renamed -RSYNC /src1/file-src1 /file-src1_renamed +RSYNC /basic1/file-basic1 /dst/file-basic1_renamed # ... without metadata -RSYNC +z /src1/file-src1 /file-src1_nom +RSYNC +z /basic1/file-basic1 /dst/file-basic1_nom # ... with trailing slash on *destination* -RSYNC /src1/file-src1 /file-src1_slash/ +RSYNC /basic1/file-basic1 /dst/new/ +EOF + ch-image build -f "$ch_tmpimg_df" "$context" + ( cd "$CH_IMAGE_STORAGE/img/tmpimg/dst" && ls -lhR . ) + run ls_ "$CH_IMAGE_STORAGE/img/tmpimg/dst" + echo "$output" + [[ $status -eq -0 ]] +cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst -# copy directory -RSYNC /src1 / +# directory +RSYNC /basic1 /dst # ... renamed? -RSYNC /src1 /dst2 -# ... renamed -RSYNC /src1/ /dst3 +RSYNC /basic1 /dst/basic1_new +# ... renamed (trailing slash on source) +RSYNC /basic1/ /dst/basic1_renamed # ... destination trailing slash has no effect for directory sources -RSYNC /src1 /dst2b/ -RSYNC /src1/ /dst3b/ +RSYNC /basic1 /dst/basic1_newB +RSYNC /basic1/ /dst/basic1_renamedB/ +# ... with +z (no-op!!) +RSYNC +z /basic1 /dst/basic1_newC +# ... need -r at least +RSYNC +z -r /basic1/ /dst/basic1_newD +EOF + ch-image build -f "$ch_tmpimg_df" "$context" + ( cd "$CH_IMAGE_STORAGE/img/tmpimg/dst" && ls -lhR . ) + run ls_ "$CH_IMAGE_STORAGE/img/tmpimg/dst" + echo "$output" + [[ $status -eq -0 ]] +cat < /dst4/file-dst4 -RSYNC /src1 /src2 /dst4 -# ... with wildcards -RUN mkdir /dst4b && echo file-dst4b > /dst4/file-dst4b -RSYNC /src* /dst4b + # multiple directories + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +# two directories explicitly +RUN mkdir /dst/dstB && echo file-dstB > /dst/dstB/file-dstB +RSYNC /basic1 /basic2 /dst/dstB +# ... with wildcards +RUN mkdir /dst/dstC && echo file-dstC > /dst/dstC/file-dstC +RSYNC /basic* /dst/dstC # ... with trailing slashes -RUN mkdir /dst5 && echo file-dst5 > /dst5/file-dst5 +RUN mkdir /dst/dstD && echo file-dstD > /dst/dstD/file-dstD +RSYNC /basic1/ /basic2/ /dst/dstD # ... with trailing slashes and wildcards +RUN mkdir /dst/dstE && echo file-dstE > /dst/dstE/file-dstE +RSYNC /basic*/ /dst/dstE # ... with one trailing slash and one not - +RUN mkdir /dst/dstF && echo file-dstF > /dst/dstF/file-dstF +RSYNC /basic1 /basic2/ /dst/dstF +EOF + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + ( cd "$CH_IMAGE_STORAGE/img/tmpimg/dst" && ls -lhR . ) + run ls_ "$CH_IMAGE_STORAGE/img/tmpimg/dst" + echo "$output" + [[ $status -eq -0 ]] +cat < Date: Mon, 2 Oct 2023 17:30:19 -0600 Subject: [PATCH 10/20] twiddle docs [skip ci] --- doc/ch-image.rst | 56 ++++++++++++++++++++++++++---------------------- doc/conf.py | 15 +++++++++++++ 2 files changed, 45 insertions(+), 26 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 2bd269507..ab0865c7f 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1022,26 +1022,25 @@ extremely capable, widely used file copy tool, with detailed options to specify behavior and 25 years of history dealing with weirdness. :code:`RSYNC` (also spelled :code:`NSYNC`) is a Charliecloud extension that -gives copying behavior identical to :code:`rsync(1)`, including remote -:code:`ssh:` and :code:`rsync:` transports. (In fact, Charliecloud’s current -literally calls the host’s :code:`rsync(1)` to perform the copy, though this -may change in the future.) There is no list form of :code:`RSYNC`. +gives copying behavior identical to :code:`rsync(1)`. In fact, Charliecloud’s +current implementation literally calls the host’s :code:`rsync(1)` to do the +copy, though this may change in the future. There is no list form of +:code:`RSYNC`. :code:`RSYNC` takes the same arguments as :code:`rsync(1)`, so refer to its `man page `_ for a -detailed explanation of all the options. Source arguments that are remote -(i.e., starting with :code:`ssh:` or :code:`rsync:`) are unchanged, while -local sources must be relative paths and are relative to the context -directory. Any globbed sources are processed by :code:`ch-image(1)` using -Python rules, i.e., :code:`rsync(1)` sees no wildcard sources. Destinations -must be local. Relative destinations are relative to the image’s current -working directory, while absolute destinations refer to the image’s root. For -example, +detailed explanation of all the options. Sources are relative to the context +directory even if they look absolute with a leading slash. Any globbed sources +are processed by :code:`ch-image(1)` using Python rules, i.e., +:code:`rsync(1)` sees the expanded sources with no wildcards. Relative +destinations are relative to the image’s current working directory, while +absolute destinations refer to the image’s root. For example, .. code-block:: docker WORKDIR /foo RSYNC --foo src1 src2 dst + COPY --foo src1 src2 dst is translated to (the equivalent of):: @@ -1101,7 +1100,7 @@ A small number of :code:`rsync(1)` features are actively disallowed: :code:`-B`, :code:`-4`, and :code:`-M`; :code:`--block-size 4M` will be interpreted as :code:`--block-size` with no argument and a copy source named :code:`4M`. This is so Charliecloud can process :code:`rsync(1)` - options without knowing fully which ones take an argument. + options without knowing which ones take an argument. 3. Invalid :code:`rsync(1)` options: @@ -1124,20 +1123,25 @@ For arguments that read input from a file (e.g. :code:`--exclude-from` or absolute paths refer to the image root, and :code:`-` (standard input) is an error. - - -Symlink handling is particularly fraught, so use care. In particular, you must -state explicitly what to do with symlinks in the source: both plain -:code:`RSYNC` and :code:`RSYNC +` ignore them with a warning. Your options are -:code:`+L` (which is a pretty reasonable default) and any valid combination of +Symlink handling is particularly fraught, so use care. The default handling +(see above) seemed reasonable to us, but you may want something different. See :code:`rsync(1)`’s `normal symlink options `_. -(:code:`COPY` isn’t any less fraught, and you have no choice about what to do -with symlinks.) - -The instruction is a cache hit if :code:`rsync(1)` would not change anything, -i.e. if :code:`-ni` prepended to the given options lists no changes, and a -miss otherwise. +Importantly, :code:`COPY` is not any less fraught, and you have no choice +about what to do with symlinks. + +The instruction is a cache hit if the metadata of all source files is +unchanged (specifically: filename, file type and permissions, xattrs, size, +and last modified time). Unlike Docker, Charliecloud does not use file +contents. This has two implications. First, it is possible to fool the cache +by manually restoring the last-modified time. Second, :code:`RSYNC` is +I/O-intensive even when it hits, because it must :code:`stat(2)` every source +file before checking the cache. However, this is still less I/O than reading +the file content too. + +Notably, Charliecloud’s cache ignores :code:`rsync(1)`’s own internal notion +of whether anything would be transferred (e.g., :code:`rsync -ni`). This may +change in the future. .. note:: @@ -1610,4 +1614,4 @@ Environment variables .. LocalWords: tmpfs'es bigvendor AUTH auth bucache buc bigfile df rfc bae .. LocalWords: dlcache graphviz packfile packfiles bigFileThreshold fd Tpdf .. LocalWords: pstats gprof chofile cffd cacdb ARGs NSYNC dst imgroot popt -.. LocalWords: globbed ni +.. LocalWords: globbed ni AHSXpr diff --git a/doc/conf.py b/doc/conf.py index 6402c4f1b..4434bf919 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -36,6 +36,21 @@ except ImportError: pass +# Monkey-patch RSYNC keyword into Pygments. Ignore any problems. print() +# statements here show up in the make(1) chatter. +# +# See: https://github.com/pygments/pygments/blob/e2cb7c9/pygments/lexers/configs.py#L667 +try: + import pygments.lexers.configs as plc + import re + for (i, tok) in enumerate(plc.DockerLexer.tokens["root"]): + m = re.search(r"^(.*COPY.*)\)\)$", tok[0]) + if (m is not None): + re_new = m[1] + "|RSYNC))" + plc.DockerLexer.tokens["root"][i] = (re_new, tok[1]) +except Exception: + pass + # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] From 5d9f3946d3a7d8303a33697a64bb95acc4fa72b7 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Mon, 9 Oct 2023 16:44:31 -0600 Subject: [PATCH 11/20] one bazillion tests [skip ci] --- doc/ch-image.rst | 47 +++- lib/build.py | 6 +- test/build/50_rsync.bats | 512 ++++++++++++++++++++++++++++++++------- 3 files changed, 469 insertions(+), 96 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index ab0865c7f..354a5fcba 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1040,7 +1040,6 @@ absolute destinations refer to the image’s root. For example, WORKDIR /foo RSYNC --foo src1 src2 dst - COPY --foo src1 src2 dst is translated to (the equivalent of):: @@ -1053,7 +1052,8 @@ a single instruction option beginning with :code:`+` (plus) that is shorthand for a group of :code:`rsync(1)` options. This single option is one of: :code:`+m` - Preserves metadata. Equivalent to all of: + Preserves metadata and directory structure. Symlinks are skipped *with a + warning*. Equivalent to all of: * :code:`-@=-1`: use nanosecond precision when comparing timestamps. * :code:`-A`: preserve ACLs. @@ -1067,22 +1067,43 @@ for a group of :code:`rsync(1)` options. This single option is one of: `_). :code:`+l` (default) - Preserves metadata and symlinks (if within source directory). Equivalent - to the options listed for :code:`+m` and also: + Like :code:`+u`, but *replaces* with their target “unsafe” symlinks whose + target is outside the top-of-transfer directory. Preserves: - * :code:`-l`: Copy symlinks as symlinks if their target is within the - top-of-transfer directory. This is *not the context directory and often - not the source directory*; see below for examples. + * Metadata. - * :code:`--copy-unsafe-links`: copy the target of symlinks that point - outside the top-of-transfer directory for each source. + * Directory structure. + + * Symlinks, if a link’s target is within the “top-of-transfer directory”. + *This is not the context directory and often not the source either.* + Also, this creates broken symlinks if the target is not within the + source but is within the top-of-transfer. See below for examples. + + Equivalent to the :code:`rsync(1)` options listed for :code:`+m` plus + :code:`--links` (copy symlinks as symlinks unless otherwise specified) and + :code:`--copy-unsafe-links` (copy the target of unsafe symlinks). + + :code:`+u` + Like :code:`+l`, but *silently skips* “unsafe” symlinks whose target is + outside the top-of-transfer directory. Preserves: + + * Metadata. + + * Directory structure. + + * Symlinks, if a link’s target is within the “top-of-transfer directory”. + *See important caveats above* under :code:`+l`. + + Equivalent to the :code:`rsync(1)` options listed for :code:`+m` plus + :code:`--links` (copy symlinks as symlinks unless otherwise specified) and + :code:`--safe-links` (silently skip unsafe symlinks). :code:`+z` No default arguments. Directories will not be descended, no metadata will be preserved, and both hard and symbolic links will be ignored, except as - specified by :code:`rsync(1)` options starting with a hyphen. (Note that - :code:`-a`/:code:`--archive` is discouraged because it omits some metadata - and handles symlinks inappropriately for containers.) + otherwise specified by :code:`rsync(1)` options starting with a hyphen. + (Note that :code:`-a`/:code:`--archive` is discouraged because it omits + some metadata and handles symlinks inappropriately for containers.) A small number of :code:`rsync(1)` features are actively disallowed: @@ -1153,6 +1174,8 @@ For example: **FIXME** +main gotchas: trailing slash, symlinks; rsync’s notion of unsafe symlinks + See the file :code:`50_rsync.bats` in the source code for more examples, including many corner cases. **FIXME**:: diff --git a/lib/build.py b/lib/build.py index 55e29f67c..c158c0d92 100644 --- a/lib/build.py +++ b/lib/build.py @@ -1248,13 +1248,15 @@ def str_(self): def execute(self): plus_options = list() - if (self.plus_option in ("m", "l")): # no action needed for +z + if (self.plus_option in "lmu"): # no action needed for +z # see man page for explanations plus_options = ["-@=-1", "-AHSXpr"] if (sys.stderr.isatty()): plus_options += ["--info=progress2"] if (self.plus_option == "l"): plus_options += ["-l", "--copy-unsafe-links"] + elif (self.plus_option == "u"): + plus_options += ["-l", "--safe-links"] ch.cmd(["rsync"] + plus_options + self.rsync_options_concise + self.srcs + [self.dst]) @@ -1286,7 +1288,7 @@ def prepare(self, miss_ct): def rsync_validate(self): # Reject bad + options. - if (self.plus_option not in ("m", "l", "z")): + if (self.plus_option not in ("mluz")): ch.FATAL("invalid plus option: %s" % self.plus_option) # Reject SSH and rsync transports. I *believe* simply the presence of # “:” (colon) in the filename triggers this behavior. diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index d06f3112b..4e8f036f4 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -8,14 +8,10 @@ tag=RSYNC setup () { scope standard [[ $CH_TEST_BUILDER = ch-image ]] || skip 'ch-image only' + umask 0007 fixtures=$BATS_TMPDIR/rsync context=$fixtures/ctx - context_out=$fixtures/ctx-out - context_doc=$fixtures/doc - if [[ -e $fixtures ]]; then - echo '## input ##' - ls_ "$fixtures" - fi + dst=$CH_IMAGE_STORAGE/img/tmpimg/dst } ls_ () { @@ -25,79 +21,96 @@ ls_ () { cd "$1" find . -mindepth 1 -printf '%M %n %3s%y %P -> %l\n' \ | LC_ALL=C sort -k4 \ - | sed -E -e 's|\w+/| |g' \ - -e 's| -> $||' \ - -e 's|([0-9]+)[f]|\1|' \ - -e 's|([0-9]+ )[0-9 ]+[a-z] |\1 |' \ - -e "s|$1|/...|" + | sed -E -e 's# ([[:alnum:]._-]+/){4}# #' \ + -e 's# ([[:alnum:]._-]+/){3}# #' \ + -e 's# ([[:alnum:]._-]+/){2}# #' \ + -e 's# ([[:alnum:]._-]+/){1}# #' \ + -e 's# -> $##' \ + -e 's#([0-9]+)[f]#\1#' \ + -e 's#([0-9]+ )[0-9 ]+[a-z] #\1 #' \ + -e "s#$1#/...#" ) } +ls_dump () { + target=$1 + out_basename=$2 + + ( cd "$target" && ls -oghR . > "${BATS_TMPDIR}/rsync_${out_basename}" ) +} + @test "${tag}: set up fixtures" { rm -Rf --one-file-system "$fixtures" mkdir "$fixtures" cd "$fixtures" - - # top level mkdir "$context" - echo file-top > file-top - # basic example + # outside context + echo file-out > file-out + mkdir dir-out + echo dir-out.file > dir-out/dir-out.file + + # top level of context cd "$context" + echo file-top > file-top + mkdir dir-top + echo dir-top.file > dir-top/dir-top.file + + # plain files and directories mkdir basic1 echo file-basic1 > basic1/file-basic1 chmod 607 basic1/file-basic1 # weird permissions mkdir basic2 echo file-basic2 > basic2/file-basic2 - # # outside context - # cd "$context_out" - # echo file-out > file-out - # mkdir dir-out - # echo dir-out.file-out > dir-out/dir-out.file-out - # ln -s file-out link.out2out - # cd .. - - # # inside context - # cd "$context" - # echo file-ctx > file-ctx - # mkdir src - # echo src.file > src/src.file - # mkdir src/src.dir - # echo src.dir.file > src/src.dir/src.dir.file - # mkdir not-src - # echo not-src.file > not-src.file - - # # symlinks to inside source - # cd src - # # to file - # ln -s src.file src.file_direct - # ln -s ../src/src.file src.file_up_over - # ln -s "$context"/src/src.file src.file_abs - # # relative to directory - # ln -s src.dir src.dir_direct - # ln -s ../src/src.dir src.dir_upover - # ln -s "$context"/src/src.dir src.dir_abs - - # # symlinks to outside source but inside context - # ln -s ../file-ctx file-ctx_up - # ln -s "$context"/file-ctx file-ctx_abs - # ln -s ../not-src not-src_up - # ln -s "$context"/not-src not-src_abs - - # # symlinks to outside context - # ln -s ../../ctx-out/file-out file-out_rel - # ln -s "$context_out"/file-out file-out_abs - # ln -s ../../ctx-out/dir-out dir-out_rel - # ln -s "$context_out"/dir-out dir-out_abs + # symlinks + cd "$context" + mkdir sym2 + echo file-sym2 > sym2/file-sym2 + mkdir sym1 + cd sym1 + echo file-sym1 > file-sym1 + mkdir dir-sym1 + echo dir-sym1.file > dir-sym1/dir-sym1.file + # target outside context + ln -s "$fixtures"/file-out file-out_abs + ln -s ../../file-out file-out_rel + ln -s ../../dir-out dir-out_rel + # target outside source (but inside context) + ln -s "$context"/file-top file-top_abs + ln -s ../file-top file-top_rel + ln -s ../dir-top dir-top_rel + # target inside source + ln -s "$context"/sym1/file-sym1 file-sym1_abs + ln -s file-sym1 file-sym1_direct + ln -s ../sym1/file-sym1 file-sym1_upover + ln -s dir-sym1 dir-sym1_direct + # target inside other source + ln -s "$context"/sym2/file-sym2 file-sym2_abs + ln -s ../sym2/file-sym2 file-sym2_upover + + # broken symlink + cd "$context" + mkdir sym-broken + cd sym-broken + ln -s doesnotexist doesnotexist_broken_direct + + # hard links + cd "$context" + mkdir hard + cd hard + echo hard-file > hard-file1 + ln hard-file1 hard-file2 echo "## created fixtures ##" ls_ "$fixtures" + ls_ "$fixtures" > $BATS_TMPDIR/rsync_fixtures-ls_ + ls_dump "$fixtures" fixtures } -@test "${tag}: basic examples" { - # files + +@test "${tag}: source: file(s)" { cat < "$ch_tmpimg_df" FROM alpine:3.17 RUN mkdir /dst @@ -112,19 +125,21 @@ RSYNC +z /basic1/file-basic1 /dst/file-basic1_nom RSYNC /basic1/file-basic1 /dst/new/ EOF ch-image build -f "$ch_tmpimg_df" "$context" - ( cd "$CH_IMAGE_STORAGE/img/tmpimg/dst" && ls -lhR . ) - run ls_ "$CH_IMAGE_STORAGE/img/tmpimg/dst" + ls_dump "$dst" files + run ls_ "$dst" echo "$output" [[ $status -eq -0 ]] -cat < "$ch_tmpimg_df" FROM alpine:3.17 RUN mkdir /dst @@ -144,11 +159,11 @@ RSYNC +z /basic1 /dst/basic1_newC RSYNC +z -r /basic1/ /dst/basic1_newD EOF ch-image build -f "$ch_tmpimg_df" "$context" - ( cd "$CH_IMAGE_STORAGE/img/tmpimg/dst" && ls -lhR . ) - run ls_ "$CH_IMAGE_STORAGE/img/tmpimg/dst" + ls_dump "$dst" dir1 + run ls_ "$dst" echo "$output" [[ $status -eq -0 ]] -cat < "$ch_tmpimg_df" FROM alpine:3.17 RUN mkdir /dst @@ -185,13 +202,16 @@ RSYNC /basic*/ /dst/dstE # ... with one trailing slash and one not RUN mkdir /dst/dstF && echo file-dstF > /dst/dstF/file-dstF RSYNC /basic1 /basic2/ /dst/dstF +# ... replace (do not merge with) existing contents +RUN mkdir /dst/dstG && echo file-dstG > /dst/dstG/file-dstG +RSYNC --delete /basic*/ /dst/dstG EOF ch-image build --rebuild -f "$ch_tmpimg_df" "$context" - ( cd "$CH_IMAGE_STORAGE/img/tmpimg/dst" && ls -lhR . ) - run ls_ "$CH_IMAGE_STORAGE/img/tmpimg/dst" + ls_dump "$dst" dir2 + run ls_ "$dst" echo "$output" [[ $status -eq -0 ]] -cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /sym1 /dst +EOF + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + ls_dump "$dst" sym-default + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-sym1 +lrwxrwxrwx 1 dir-top_rel -> ../dir-top +-rw-rw---- 1 9 file-out_abs +-rw-rw---- 1 9 file-out_rel +-rw-rw---- 1 10 file-sym1 +-rw-rw---- 1 10 file-sym1_abs +lrwxrwxrwx 1 file-sym1_direct -> file-sym1 +lrwxrwxrwx 1 file-sym1_upover -> ../sym1/file-sym1 +-rw-rw---- 1 10 file-sym2_abs +lrwxrwxrwx 1 file-sym2_upover -> ../sym2/file-sym2 +-rw-rw---- 1 9 file-top_abs +lrwxrwxrwx 1 file-top_rel -> ../file-top +EOF +} + + +@test "${tag}: symlinks: default, source trailing slash" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /sym1/ /dst/sym1 +EOF + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + ls_dump "$dst" sym-slashed + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-sym1 +drwxrwx--- 1 dir-top_rel +-rw-rw---- 1 13 dir-top.file +-rw-rw---- 1 9 file-out_abs +-rw-rw---- 1 9 file-out_rel +-rw-rw---- 1 10 file-sym1 +-rw-rw---- 1 10 file-sym1_abs +lrwxrwxrwx 1 file-sym1_direct -> file-sym1 +-rw-rw---- 1 10 file-sym1_upover +-rw-rw---- 1 10 file-sym2_abs +-rw-rw---- 1 10 file-sym2_upover +-rw-rw---- 1 9 file-top_abs +-rw-rw---- 1 9 file-top_rel +EOF +} + + +@test "${tag}: symlinks: +m" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC +m /sym1/ /dst/sym1 +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + [[ 12 -eq $(echo "$output" | grep -F 'skipping non-regular file' | wc -l) ]] + ls_dump "$dst" sym-m + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC +u /sym1/ /dst/sym1 +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + [[ 0 -eq $(echo "$output" | grep -F 'skipping non-regular file' | wc -l) ]] + ls_dump "$dst" sym-u + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-sym1 +-rw-rw---- 1 10 file-sym1 +lrwxrwxrwx 1 file-sym1_direct -> file-sym1 +EOF +} + + +@test "${tag}: symlinks: between sources" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /sym1 /sym2 /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-between + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-sym1 +lrwxrwxrwx 1 dir-top_rel -> ../dir-top +-rw-rw---- 1 9 file-out_abs +-rw-rw---- 1 9 file-out_rel +-rw-rw---- 1 10 file-sym1 +-rw-rw---- 1 10 file-sym1_abs +lrwxrwxrwx 1 file-sym1_direct -> file-sym1 +lrwxrwxrwx 1 file-sym1_upover -> ../sym1/file-sym1 +-rw-rw---- 1 10 file-sym2_abs +lrwxrwxrwx 1 file-sym2_upover -> ../sym2/file-sym2 +-rw-rw---- 1 9 file-top_abs +lrwxrwxrwx 1 file-top_rel -> ../file-top +drwxrwx--- 1 sym2 +-rw-rw---- 1 10 file-sym2 +EOF +} + + +@test "${tag}: symlinks: sources are symlinks to file" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /sym1/file-sym1_direct /sym1/file-sym1_upover /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-to-file + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < file-sym1 +-rw-rw---- 1 10 file-sym1_upover +EOF +} + + +@test "${tag}: symlinks: source is symlink to directory" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /sym1/dir-sym1_direct /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-to-dir + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-sym1 +EOF +} + + +@test "${tag}: symlinks: source is symlink to directory (trailing slash)" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /sym1/dir-sym1_direct/ /dst/dir-sym1_direct +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-to-dir-slashed + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /sym-broken /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-broken + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < doesnotexist +EOF +} + + +@test "${tag}: symlinks: broken (--copy-links)" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC +m --copy-links /sym-broken /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 1 ]] + [[ $output = *"symlink has no referent: \"${context}/sym-broken/doesnotexist_broken_direct\""* ]] +} + + +@test "${tag}: symlinks: src file, dst symlink to file" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst && touch /dst/file-dst && ln -s file-dst /dst/file-dst_direct +RUN ls -lh /dst +RSYNC /file-top /dst/file-dst_direct +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-dst-symlink-file + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst \ + && mkdir /dst/dir-dst \ + && ln -s dir-dst /dst/dir-dst_direct \ + && ls -lh /dst +RSYNC /dir-top /dst/dir-dst_direct +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-dst-symlink-dir + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-dst +EOF +} + + +@test "${tag}: symlinks: src dir (slashed), dst symlink to dir" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst \ + && mkdir /dst/dir-dst \ + && ln -s dir-dst /dst/dir-dst_direct \ + && ls -lh /dst +RSYNC /dir-top/ /dst/dir-dst_direct +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-dst-symlink-dir-slashed + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-dst +EOF +} + + +@test "${tag}: hard links" { + inode_src=$(stat -c %i "$context"/hard/hard-file1) + [[ $inode_src -eq $(stat -c %i "$context"/hard/hard-file2) ]] + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC /hard /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" hard + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < Date: Wed, 11 Oct 2023 17:55:24 -0600 Subject: [PATCH 12/20] whole bunch more docs, make --safe-links the default [skip ci] --- doc/ch-image.rst | 585 ++++++++++++++++++++++++++++++++++----- lib/build.py | 11 +- lib/filesystem.py | 6 +- test/build/50_rsync.bats | 192 ++++++++----- 4 files changed, 653 insertions(+), 141 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 354a5fcba..1585a7446 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1000,18 +1000,18 @@ Features we do not plan to support focus on that and will not introduce the volume management features that Docker has. -New instructions ----------------- - .. _ch-image_rsync: -:code:`RSYNC` -~~~~~~~~~~~~~ +:code:`RSYNC` (Dockerfile extension) +------------------------------------ .. warning:: This instruction is experimental and may change or be removed. +Overview +~~~~~~~~ + Copying files is often simple but has numerous difficult corner cases, e.g. when dealing with symbolic or hard links. The standard instruction :code:`COPY` deals with many of these corner cases differently from other UNIX @@ -1027,14 +1027,33 @@ current implementation literally calls the host’s :code:`rsync(1)` to do the copy, though this may change in the future. There is no list form of :code:`RSYNC`. +The two key usage challenges are trailing slashes on paths and symlink +handling. In particular, the default symlink handling seemed reasonable to us, +but you may want something different. See the arguments and examples below. +Importantly, :code:`COPY` is not any less fraught, and you have no choice +about what to do with symlinks. + + +Arguments +~~~~~~~~~ + :code:`RSYNC` takes the same arguments as :code:`rsync(1)`, so refer to its `man page `_ for a -detailed explanation of all the options. Sources are relative to the context -directory even if they look absolute with a leading slash. Any globbed sources -are processed by :code:`ch-image(1)` using Python rules, i.e., -:code:`rsync(1)` sees the expanded sources with no wildcards. Relative -destinations are relative to the image’s current working directory, while -absolute destinations refer to the image’s root. For example, +detailed explanation of all the options (with possible emphasis on its +`symlink options +`_). +Sources are relative to the context directory even if they look absolute with +a leading slash. Any globbed sources are processed by :code:`ch-image(1)` +using Python rules, i.e., :code:`rsync(1)` sees the expanded sources with no +wildcards. Relative destinations are relative to the image’s current working +directory, while absolute destinations refer to the image’s root. + +For arguments that read input from a file (e.g. :code:`--exclude-from` or +:code:`--files-from`), relative paths are relative to the context directory, +absolute paths refer to the image root, and :code:`-` (standard input) is an +error. + +For example, .. code-block:: docker @@ -1044,7 +1063,7 @@ absolute destinations refer to the image’s root. For example, is translated to (the equivalent of):: $ mkdir -p /foo - $ rsync -@=-1 -AHSXpr --info=progress2 -l --copy-unsafe-links \ + $ rsync -@=-1 -AHSXpr --info=progress2 -l --safe-links \ --foo /context/src1 /context/src2 /storage/imgroot/foo/dst2 Note the extensive default arguments to :code:`rsync(1)`. :code:`RSYNC` takes @@ -1067,36 +1086,39 @@ for a group of :code:`rsync(1)` options. This single option is one of: `_). :code:`+l` (default) - Like :code:`+u`, but *replaces* with their target “unsafe” symlinks whose - target is outside the top-of-transfer directory. Preserves: + Like :code:`+u`, but *silently skips* “unsafe” symlinks whose target is + outside the top-of-transfer directory. Preserves: * Metadata. * Directory structure. * Symlinks, if a link’s target is within the “top-of-transfer directory”. - *This is not the context directory and often not the source either.* - Also, this creates broken symlinks if the target is not within the - source but is within the top-of-transfer. See below for examples. + This is not the context directory and often not the source either. Also, + this creates broken symlinks if the target is not within the source but + is within the top-of-transfer. See examples below. Equivalent to the :code:`rsync(1)` options listed for :code:`+m` plus :code:`--links` (copy symlinks as symlinks unless otherwise specified) and - :code:`--copy-unsafe-links` (copy the target of unsafe symlinks). + :code:`--safe-links` (silently skip unsafe symlinks). :code:`+u` - Like :code:`+l`, but *silently skips* “unsafe” symlinks whose target is - outside the top-of-transfer directory. Preserves: + Like :code:`+l`, but *replaces* with their target “unsafe” symlinks whose + target is outside the top-of-transfer directory, and thus *can copy data + outside the context directory into the image*. Preserves: * Metadata. * Directory structure. * Symlinks, if a link’s target is within the “top-of-transfer directory”. - *See important caveats above* under :code:`+l`. + This is not the context directory and often not the source either. Also, + this creates broken symlinks if the target is not within the source but + is within the top-of-transfer. See examples below. Equivalent to the :code:`rsync(1)` options listed for :code:`+m` plus :code:`--links` (copy symlinks as symlinks unless otherwise specified) and - :code:`--safe-links` (silently skip unsafe symlinks). + :code:`--copy-unsafe-links` (copy the target of unsafe symlinks). :code:`+z` No default arguments. Directories will not be descended, no metadata will @@ -1105,6 +1127,15 @@ for a group of :code:`rsync(1)` options. This single option is one of: (Note that :code:`-a`/:code:`--archive` is discouraged because it omits some metadata and handles symlinks inappropriately for containers.) +.. note:: + + :code:`rsync(1)` supports a configuration file :code:`~/.popt` that alters + its command line processing. Currently, this configuration is respected for + :code:`RSYNC` arguments, but that may change without notice. + +Disallowed :code:`rsync(1)` features +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + A small number of :code:`rsync(1)` features are actively disallowed: 1. :code:`rsync:` and :code:`ssh:` transports are an error. Charliecloud @@ -1139,17 +1170,8 @@ A small number of :code:`rsync(1)` features are actively disallowed: Note that there are likely other flags that don’t make sense and/or cause undesirable behavior. We have not characterized this problem. -For arguments that read input from a file (e.g. :code:`--exclude-from` or -:code:`--files-from`), relative paths are relative to the context directory, -absolute paths refer to the image root, and :code:`-` (standard input) is an -error. - -Symlink handling is particularly fraught, so use care. The default handling -(see above) seemed reasonable to us, but you may want something different. See -:code:`rsync(1)`’s `normal symlink options -`_. -Importantly, :code:`COPY` is not any less fraught, and you have no choice -about what to do with symlinks. +Build cache +~~~~~~~~~~~ The instruction is a cache hit if the metadata of all source files is unchanged (specifically: filename, file type and permissions, xattrs, size, @@ -1164,42 +1186,468 @@ Notably, Charliecloud’s cache ignores :code:`rsync(1)`’s own internal notion of whether anything would be transferred (e.g., :code:`rsync -ni`). This may change in the future. -.. note:: +Examples and tutorial +~~~~~~~~~~~~~~~~~~~~~ - :code:`rsync(1)` supports a configuration file :code:`~/.popt` that alters - its command line processing. Currently, this configuration is respected for - :code:`RSYNC` arguments, but that may change without notice. +All of these examples use the same input, whose content will be introduced +gradually, using edited output of :code:`ls -oghR` (which is like :code:`ls +-lhR` but omits user and group). Examples assume a umask of :code:`0007`. The +Dockerfile instructions listed also assume a preceding: + +.. code-block:: docker + + FROM alpine:3.17 + RUN mkdir /dst + +i.e., a simple base image containing a top-level directory :code:`dst`. + +Many additional examples are available in the source code in the file +:code:`test/build/50_rsync.bats`. + +We begin by copying regular files. The context directory :code:`ctx` contains, +in part, two directories containing one regular file each. Note that one of +these files (:code:`file-basic1`) and one of the directories (:code:`basic1`) +have strange permissions. + +:: + + ./ctx: + drwx---r-x 2 60 Oct 11 13:20 basic1 + drwxrwx--- 2 60 Oct 11 13:20 basic2 + + ./ctx/basic1: + -rw----r-- 1 12 Oct 11 13:20 file-basic1 + + ./ctx/basic2: + -rw-rw---- 1 12 Oct 11 13:20 file-basic2 + +The simplest form of :code:`RSYNC` is to copy a single file into a specified +directory: + +.. code-block:: docker + + RSYNC /basic1/file-basic1 /dst + +resulting in:: + + $ ls -oghR dst + dst: + -rw----r-- 1 12 Oct 11 13:26 file-basic1 + +Note that :code:`file-basic1`’s metadata — here its odd permissions — are +preserved. :code:`1` is the number of hard links to the file, and :code:`12` +is the file size. + +One can also rename the destination by specifying a new file name, and with +:code:`+z`, not copy metadata (from here on the :code:`ls` command is omitted +for brevity): + +.. code-block:: docker + + RSYNC +z /basic1/file-basic1 /dst/file-basic1_nom + +:: + + dst: + -rw------- 1 12 Sep 21 15:51 file-basic1_nom + +A trailing slash on the destination creates a new directory and places the +source file within: + +.. code-block:: docker + + RSYNC /basic1/file-basic1 /dst/new/ + +:: + + dst: + drwxrwx--- 1 22 Oct 11 13:26 new + + dst/new: + -rw----r-- 1 12 Oct 11 13:26 file-basic1 + +With multiple source files, the destination trailing slash is optional: + +.. code-block:: docker + + RSYNC /basic1/file-basic1 /basic2/file-basic2 /dst/newB + +:: + + dst: + drwxrwx--- 1 44 Oct 11 13:26 newB + + dst/newB: + -rw----r-- 1 12 Oct 11 13:26 file-basic1 + -rw-rw---- 1 12 Oct 11 13:26 file-basic2 + +For directory sources, the presence or absence of a trailing slash is highly +significant. Without one, the directory itself is placed in the destination +(recall that this would rename a source *file*): + +.. code-block:: docker + + RSYNC /basic1 /dst/basic1_new + +:: + + dst: + drwxrwx--- 1 12 Oct 11 13:28 basic1_new + + dst/basic1_new: + drwx---r-x 1 22 Oct 11 13:28 basic1 + + dst/basic1_new/basic1: + -rw----r-- 1 12 Oct 11 13:28 file-basic1 + +A source trailing slash means copy the *contents of* a directory rather than +the directory itself. Importantly, however, the directory’s metadata is copied +to the destination directory. + +.. code-block:: docker + + RSYNC /basic1/ /dst/basic1_renamed + +:: + + dst: + drwx---r-x 1 22 Oct 11 13:28 basic1_renamed + + dst/basic1_renamed: + -rw----r-- 1 12 Oct 11 13:28 file-basic1 + +One gotcha is that :code:`RSYNC +z` is a no-op if the source is a directory: + +.. code-block:: docker + + RSYNC +z /basic1 /dst/basic1_newC + +:: + + dst: + +At least :code:`-r` is needed with :code:`+z` in this case: + +.. code-block:: docker + + RSYNC +z -r /basic1/ /dst/basic1_newD + +:: + + dst: + drwx------ 1 22 Oct 11 13:28 basic1_newD + + dst/basic1_newD: + -rw------- 1 12 Oct 11 13:28 file-basic1 + +Multiple source directories can be specified, including with wildcards. This +example also illustrates that copies files are by default merged with content +already existing in the image. + +.. code-block:: docker + + RUN mkdir /dst/dstC && echo file-dstC > /dst/dstC/file-dstC + RSYNC /basic* /dst/dstC + +:: + + dst: + drwxrwx--- 1 42 Oct 11 13:33 dstC + + dst/dstC: + drwx---r-x 1 22 Oct 11 13:33 basic1 + drwxrwx--- 1 22 Oct 11 13:33 basic2 + -rw-rw---- 1 10 Oct 11 13:33 file-dstC + + dst/dstC/basic1: + -rw----r-- 1 12 Oct 11 13:33 file-basic1 + + dst/dstC/basic2: + -rw-rw---- 1 12 Oct 11 13:33 file-basic2 + +Trailing slashes can be specified independently for each source: + +.. code-block:: docker + + RUN mkdir /dst/dstF && echo file-dstF > /dst/dstF/file-dstF + RSYNC /basic1 /basic2/ /dst/dstF + +:: + + dst: + drwxrwx--- 1 52 Oct 11 13:33 dstF + + dst/dstF: + drwx---r-x 1 22 Oct 11 13:33 basic1 + -rw-rw---- 1 12 Oct 11 13:33 file-basic2 + -rw-rw---- 1 10 Oct 11 13:33 file-dstF + + dst/dstF/basic1: + -rw----r-- 1 12 Oct 11 13:33 file-basic1 + +Bare :code:`/` (i.e., the entire context directory) is considered to have a +trailing slash: + +.. code-block:: docker + + RSYNC / /dst + +:: + + dst: + drwx---r-x 1 22 Oct 11 13:33 basic1 + drwxrwx--- 1 22 Oct 11 13:33 basic2 + + dst/basic1: + -rw----r-- 1 12 Oct 11 13:33 file-basic1 + + dst/basic2: + -rw-rw---- 1 12 Oct 11 13:33 file-basic2 + +To *replace* (rather than merge with) existing content, use :code:`--delete`. +Note also that wildcards can be combined with trailing slashes and that the +directory gets the metadata of the *first* slashed directory. + +.. code-block:: docker + + RUN mkdir /dst/dstG && echo file-dstG > /dst/dstG/file-dstG + RSYNC --delete /basic*/ /dst/dstG + +:: + + dst: + drwx---r-x 1 44 Oct 11 14:00 dstG + + dst/dstG: + -rw----r-- 1 12 Oct 11 14:00 file-basic1 + -rw-rw---- 1 12 Oct 11 14:00 file-basic2 + +Symbolic links in the source(s) add significant complexity. Like +:code:`rsync(1)`, :code:`RSYNC` can do one of three things with a given +symlink: + +1. Ignore it, silently or with a warning. + +2. Preserve it: copy as a symlink, with the same target. + +3. Dereference it: copy the target instead. + +These actions are selected independently for *safe symlinks* and *unsafe +symlinks*. Safe symlinks are those which point to a target within the *top of +transfer*, which is the deepest directory in the source path with a trailing +slash. For example, :code:`/foo/bar`’s top-of-transfer is :code:`/foo` +(regardless of whether :code:`bar` is a directory or file), while +:code:`/foo/bar/`’s top-of-transfer is :code:`/foo/bar`. + +For the symlink examples, the context contains two sub-directories with a +variety of symlinks, as well as a sibling file and directory outside the +context. All of these links are valid on the host. In this listing, the +absolute path to the parent of the context directory is replaced with +:code:`/...`. + +:: + + .: + drwxrwx--- 9 200 Oct 11 14:00 ctx + drwxrwx--- 2 60 Oct 11 14:00 dir-out + -rw-rw---- 1 9 Oct 11 14:00 file-out + + ./ctx: + drwxrwx--- 3 320 Oct 11 14:00 sym1 + + ./ctx/sym1: + lrwxrwxrwx 1 13 Oct 11 14:00 dir-out_rel -> ../../dir-out + drwxrwx--- 2 60 Oct 11 14:00 dir-sym1 + lrwxrwxrwx 1 8 Oct 11 14:00 dir-sym1_direct -> dir-sym1 + lrwxrwxrwx 1 10 Oct 11 14:00 dir-top_rel -> ../dir-top + lrwxrwxrwx 1 47 Oct 11 14:00 file-out_abs -> /.../file-out + lrwxrwxrwx 1 14 Oct 11 14:00 file-out_rel -> ../../file-out + -rw-rw---- 1 10 Oct 11 14:00 file-sym1 + lrwxrwxrwx 1 57 Oct 11 14:00 file-sym1_abs -> /.../ctx/sym1/file-sym1 + lrwxrwxrwx 1 9 Oct 11 14:00 file-sym1_direct -> file-sym1 + lrwxrwxrwx 1 17 Oct 11 14:00 file-sym1_upover -> ../sym1/file-sym1 + lrwxrwxrwx 1 51 Oct 11 14:00 file-top_abs -> /.../ctx/file-top + lrwxrwxrwx 1 11 Oct 11 14:00 file-top_rel -> ../file-top + + ./ctx/sym1/dir-sym1: + -rw-rw---- 1 14 Oct 11 14:00 dir-sym1.file + + ./dir-out: + -rw-rw---- 1 13 Oct 11 14:00 dir-out.file + +By default, safe symlinks are preserved while unsafe symlinks are silently +ignored: + +.. code-block:: docker + + RSYNC /sym1 /dst + +:: + + dst: + drwxrwx--- 1 206 Oct 11 17:10 sym1 + + dst/sym1: + drwxrwx--- 1 26 Oct 11 17:10 dir-sym1 + lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1 + lrwxrwxrwx 1 10 Oct 11 17:10 dir-top_rel -> ../dir-top + -rw-rw---- 1 10 Oct 11 17:10 file-sym1 + lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1 + lrwxrwxrwx 1 17 Oct 11 17:10 file-sym1_upover -> ../sym1/file-sym1 + lrwxrwxrwx 1 17 Oct 11 17:10 file-sym2_upover -> ../sym2/file-sym2 + lrwxrwxrwx 1 11 Oct 11 17:10 file-top_rel -> ../file-top + + dst/sym1/dir-sym1: + -rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file + +The source files have four rough fates: + +1. Regular files and directories (:code:`file-sym1` and :code:`dir-sym1`). + These are copied into the image unchanged, including metadata. + +2. Safe symlinks, now broken. This is one of the gotchas of :code:`RSYNC`’s + top-of-transfer directory (here host path :code:`./ctx`, image path + :code:`/`) differing from the source directory (:code:`./ctx/sym1`, + :code:`/sym1`), because the latter lacks a trailing slash. + :code:`dir-top_rel`, :code:`file-sym2_upover`, and :code:`file-top_rel` all + ascend only as high as :code:`./ctx` (host path, :code:`/` image) before + re-descending. This is within the top-of-transfer, so the symlinks are safe + and thus copied unchanged, but their targets were not included in the copy. + +3. Safe symlinks, still valid. + + 1. :code:`dir-sym1_direct` and :code:`file-sym1_direct` point directly to + files in the same directory. + + 2. :code:`dir-sym1_upover` and :code:`file-sym1_upover` point to files in + the same directory, but by first ascending into their parent — within + the top-of-transfer, so they are safe — and then re-descending. If + :code:`sym1` were renamed during the copy, these links would break. + +4. Unsafe symlinks, which are ignored by the copy and do not appear in the + image. + + 1. Absolute symlinks are always unsafe (:code:`*_abs`). + + 2. :code:`dir-out_rel` and :code:`file-out_rel` are relative symlinks that + ascend above the top-of-transfer, in this case to targets outside the + context, and are thus unsafe. + +The top-of-transfer can be changed to :code:`sym1` with a trailing slash. This +also adds :code:`sym1` to the destination so the resulting directory structure +is the same. + +.. code-block:: docker + + RSYNC /sym1/ /dst/sym1 + +:: + + dst: + drwxrwx--- 1 96 Oct 11 17:10 sym1 + + dst/sym1: + drwxrwx--- 1 26 Oct 11 17:10 dir-sym1 + lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1 + -rw-rw---- 1 10 Oct 11 17:10 file-sym1 + lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1 + + dst/sym1/dir-sym1: + -rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file + +:code:`*_upover` and :code:`*-out_rel` are now unsafe and replaced with their +targets. + +Another common use case is to follow unsafe symlinks and copy their targets in +place of the links. This is accomplished with :code:`+u`: + +.. code-block:: docker + + RSYNC +u /sym1/ /dst/sym1 + +:: + + dst: + drwxrwx--- 1 352 Oct 11 17:10 sym1 + + dst/sym1: + drwxrwx--- 1 24 Oct 11 17:10 dir-out_rel + drwxrwx--- 1 26 Oct 11 17:10 dir-sym1 + lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1 + drwxrwx--- 1 24 Oct 11 17:10 dir-top_rel + -rw-rw---- 1 9 Oct 11 17:10 file-out_abs + -rw-rw---- 1 9 Oct 11 17:10 file-out_rel + -rw-rw---- 1 10 Oct 11 17:10 file-sym1 + -rw-rw---- 1 10 Oct 11 17:10 file-sym1_abs + lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1 + -rw-rw---- 1 10 Oct 11 17:10 file-sym1_upover + -rw-rw---- 1 10 Oct 11 17:10 file-sym2_abs + -rw-rw---- 1 10 Oct 11 17:10 file-sym2_upover + -rw-rw---- 1 9 Oct 11 17:10 file-top_abs + -rw-rw---- 1 9 Oct 11 17:10 file-top_rel + + dst/sym1/dir-out_rel: + -rw-rw---- 1 13 Oct 11 17:10 dir-out.file + + dst/sym1/dir-sym1: + -rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file + + dst/sym1/dir-top_rel: + -rw-rw---- 1 13 Oct 11 17:10 dir-top.file + +Now all the unsafe symlinks noted above are present in the image, but they +have changed to the normal files and directories pointed to. + +.. warning:: + + This feature lets you copy files outside the context into the image, unlike + other container builders where :code:`COPY` can never access anything + outside the context. + +The sources themselves, if symlinks, do not get special treatment: + +.. code-block:: + + RSYNC /sym1/file-sym1_direct /sym1/file-sym1_upover /dst + +:: + + dst: + lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1 + +Note that :code:`file-sym1_upover` does not appear in the image, despite being +named explicitly in the instruction, because it is an unsafe symlink. + +If the *destination* is a symlink to a file, and the source is a file, the +link is replaced and the target is unchanged. (If the source is a directory, +that is an error.) + +.. code-block:: docker + + RUN touch /dst/file-dst && ln -s file-dst /dst/file-dst_direct + RSYNC /file-top /dst/file-dst_direct + +:: + + dst: + -rw-rw---- 1 0 Oct 11 17:42 file-dst + -rw-rw---- 1 9 Oct 11 17:42 file-dst_direct + +If the destination is a symlink to a directory, the link is followed: + +.. code-block:: docker + + RUN mkdir /dst/dir-dst && ln -s dir-dst /dst/dir-dst_direct + RSYNC /file-top /dst/dir-dst_direct + +:: + + dst: + drwxrwx--- 1 16 Oct 11 17:50 dir-dst + lrwxrwxrwx 1 7 Oct 11 17:50 dir-dst_direct -> dir-dst -For example: - -**FIXME** - -main gotchas: trailing slash, symlinks; rsync’s notion of unsafe symlinks - -See the file :code:`50_rsync.bats` in the source code for more examples, including many corner cases. - -**FIXME**:: - - examples - context directory in full - to new directory - contents to / - something about no + vs + vs +L, trailing slash - absolute inside source dir - points to inside source dir - relative inside source dir - points to inside source dir directly - points to inside source dir with ../sourcedir - points to inside context dir but not source dir - source is symlink - relative - target in same directory - target in different directory - absolute - target in same directory - target in different directory - ssh: transport - rsync: transport + dst/dir-dst: + -rw-rw---- 1 9 Oct 11 17:50 file-top Examples -------- @@ -1637,4 +2085,5 @@ Environment variables .. LocalWords: tmpfs'es bigvendor AUTH auth bucache buc bigfile df rfc bae .. LocalWords: dlcache graphviz packfile packfiles bigFileThreshold fd Tpdf .. LocalWords: pstats gprof chofile cffd cacdb ARGs NSYNC dst imgroot popt -.. LocalWords: globbed ni AHSXpr +.. LocalWords: globbed ni AHSXpr drwxrwx ctx sym nom newB newC newD dstC +.. LocalWords: dstB dstF dstG upover drwx diff --git a/lib/build.py b/lib/build.py index c158c0d92..53e39d8ce 100644 --- a/lib/build.py +++ b/lib/build.py @@ -590,10 +590,9 @@ def expand_sources(self): self.srcs_base_set() self.srcs = list() for src in (ch.variables_sub(i, self.env_build) for i in self.srcs_raw): - ts_p = src[1] == "/" # glob can’t take Path - matches = [fs.Path(i) - for i in glob.glob("%s/%s" % (self.srcs_base, src))] + matches = sorted(fs.Path(i) + for i in glob.glob("%s/%s" % (self.srcs_base, src))) if (len(matches) == 0): ch.FATAL("source not found: %s" % src) for m in matches: @@ -603,8 +602,6 @@ def expand_sources(self): # as given later, so don’t canonicalize persistently.) There is no # clear subsitute for commonpath() in pathlib. mc = m.resolve() - m.trailing_slash_p - mc.trailing_slash_p if (not os.path.commonpath([mc, self.srcs_base]) .startswith(self.srcs_base)): ch.FATAL("can’t copy from outside context: %s" % src) @@ -1254,9 +1251,9 @@ def execute(self): if (sys.stderr.isatty()): plus_options += ["--info=progress2"] if (self.plus_option == "l"): - plus_options += ["-l", "--copy-unsafe-links"] - elif (self.plus_option == "u"): plus_options += ["-l", "--safe-links"] + elif (self.plus_option == "u"): + plus_options += ["-l", "--copy-unsafe-links"] ch.cmd(["rsync"] + plus_options + self.rsync_options_concise + self.srcs + [self.dst]) diff --git a/lib/filesystem.py b/lib/filesystem.py index 8a0bfc6db..fb1bf8474 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -39,8 +39,10 @@ class Path(pathlib.PosixPath): """Filesystem paths with two important differences from the stock Path: 1. This Path remembers whether a trailing slash is present, and appends - it when str() or repr(). Note that Path("/") is considered *not* to - have a trailing slash. + it when str() or repr(). Note that while Path("/") is considered + internally *not* to have a trailing slash, when stringified (as "/") + and passed to rsync(1) on its command line, rsync(1) *does* interpret + it as trailing-slashed. 2. Stock Path objects have the very weird property that appending an absolute path to an existing path ignores the left operand, leaving diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index 4e8f036f4..f41fd433b 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -34,9 +34,12 @@ ls_ () { ls_dump () { target=$1 + target_basename=$(basename "$target") + target_parent=$(dirname "$target") out_basename=$2 - ( cd "$target" && ls -oghR . > "${BATS_TMPDIR}/rsync_${out_basename}" ) + ( cd "$target_parent" \ + && ls -oghR "$target_basename" > "${BATS_TMPDIR}/rsync_${out_basename}" ) } @@ -59,8 +62,9 @@ ls_dump () { # plain files and directories mkdir basic1 + chmod 705 basic1 # weird permissions echo file-basic1 > basic1/file-basic1 - chmod 607 basic1/file-basic1 # weird permissions + chmod 604 basic1/file-basic1 # weird permissions mkdir basic2 echo file-basic2 > basic2/file-basic2 @@ -123,18 +127,23 @@ RSYNC /basic1/file-basic1 /dst/file-basic1_renamed RSYNC +z /basic1/file-basic1 /dst/file-basic1_nom # ... with trailing slash on *destination* RSYNC /basic1/file-basic1 /dst/new/ +# multiple files +RSYNC /basic1/file-basic1 /basic2/file-basic2 /dst/newB EOF - ch-image build -f "$ch_tmpimg_df" "$context" + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" ls_dump "$dst" files run ls_ "$dst" echo "$output" [[ $status -eq -0 ]] cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst + +RSYNC / /dst +EOF + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + ls_dump "$dst" dir-root + run ls_ "$dst" + echo "$output" + [[ $status -eq -0 ]] + cat < doesnotexist +drwxrwx--- 1 sym1 +drwxrwx--- 1 dir-sym1 +-rw-rw---- 1 14 dir-sym1.file +lrwxrwxrwx 1 dir-sym1_direct -> dir-sym1 +lrwxrwxrwx 1 dir-top_rel -> ../dir-top +-rw-rw---- 1 10 file-sym1 +lrwxrwxrwx 1 file-sym1_direct -> file-sym1 +lrwxrwxrwx 1 file-sym1_upover -> ../sym1/file-sym1 +lrwxrwxrwx 1 file-sym2_upover -> ../sym2/file-sym2 +lrwxrwxrwx 1 file-top_rel -> ../file-top +drwxrwx--- 1 sym2 +-rw-rw---- 1 10 file-sym2 +EOF +} + + @test "${tag}: symlinks: default" { cat < "$ch_tmpimg_df" FROM alpine:3.17 RUN mkdir /dst RSYNC /sym1 /dst EOF - ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + [[ 0 -eq $(echo "$output" | grep -F 'skipping non-regular file' | wc -l) ]] ls_dump "$dst" sym-default run ls_ "$dst" echo "$output" [[ $status -eq 0 ]] cat < dir-sym1 lrwxrwxrwx 1 dir-top_rel -> ../dir-top --rw-rw---- 1 9 file-out_abs --rw-rw---- 1 9 file-out_rel -rw-rw---- 1 10 file-sym1 --rw-rw---- 1 10 file-sym1_abs lrwxrwxrwx 1 file-sym1_direct -> file-sym1 lrwxrwxrwx 1 file-sym1_upover -> ../sym1/file-sym1 --rw-rw---- 1 10 file-sym2_abs lrwxrwxrwx 1 file-sym2_upover -> ../sym2/file-sym2 --rw-rw---- 1 9 file-top_abs lrwxrwxrwx 1 file-top_rel -> ../file-top EOF } @@ -290,23 +336,11 @@ EOF [[ $status -eq 0 ]] cat < dir-sym1 -drwxrwx--- 1 dir-top_rel --rw-rw---- 1 13 dir-top.file --rw-rw---- 1 9 file-out_abs --rw-rw---- 1 9 file-out_rel -rw-rw---- 1 10 file-sym1 --rw-rw---- 1 10 file-sym1_abs lrwxrwxrwx 1 file-sym1_direct -> file-sym1 --rw-rw---- 1 10 file-sym1_upover --rw-rw---- 1 10 file-sym2_abs --rw-rw---- 1 10 file-sym2_upover --rw-rw---- 1 9 file-top_abs --rw-rw---- 1 9 file-top_rel EOF } @@ -340,21 +374,30 @@ FROM alpine:3.17 RUN mkdir /dst RSYNC +u /sym1/ /dst/sym1 EOF - run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" - echo "$output" - [[ $status -eq 0 ]] - [[ 0 -eq $(echo "$output" | grep -F 'skipping non-regular file' | wc -l) ]] + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" ls_dump "$dst" sym-u run ls_ "$dst" echo "$output" [[ $status -eq 0 ]] cat < dir-sym1 +drwxrwx--- 1 dir-top_rel +-rw-rw---- 1 13 dir-top.file +-rw-rw---- 1 9 file-out_abs +-rw-rw---- 1 9 file-out_rel -rw-rw---- 1 10 file-sym1 +-rw-rw---- 1 10 file-sym1_abs lrwxrwxrwx 1 file-sym1_direct -> file-sym1 +-rw-rw---- 1 10 file-sym1_upover +-rw-rw---- 1 10 file-sym2_abs +-rw-rw---- 1 10 file-sym2_upover +-rw-rw---- 1 9 file-top_abs +-rw-rw---- 1 9 file-top_rel EOF } @@ -374,21 +417,14 @@ EOF [[ $status -eq 0 ]] cat < dir-sym1 lrwxrwxrwx 1 dir-top_rel -> ../dir-top --rw-rw---- 1 9 file-out_abs --rw-rw---- 1 9 file-out_rel -rw-rw---- 1 10 file-sym1 --rw-rw---- 1 10 file-sym1_abs lrwxrwxrwx 1 file-sym1_direct -> file-sym1 lrwxrwxrwx 1 file-sym1_upover -> ../sym1/file-sym1 --rw-rw---- 1 10 file-sym2_abs lrwxrwxrwx 1 file-sym2_upover -> ../sym2/file-sym2 --rw-rw---- 1 9 file-top_abs lrwxrwxrwx 1 file-top_rel -> ../file-top drwxrwx--- 1 sym2 -rw-rw---- 1 10 file-sym2 @@ -411,7 +447,6 @@ EOF [[ $status -eq 0 ]] cat < file-sym1 --rw-rw---- 1 10 file-sym1_upover EOF } @@ -491,8 +526,10 @@ EOF @test "${tag}: symlinks: src file, dst symlink to file" { cat < "$ch_tmpimg_df" FROM alpine:3.17 -RUN mkdir /dst && touch /dst/file-dst && ln -s file-dst /dst/file-dst_direct -RUN ls -lh /dst +RUN mkdir /dst \ + && touch /dst/file-dst \ + && ln -s file-dst /dst/file-dst_direct \ + && ls -lh /dst RSYNC /file-top /dst/file-dst_direct EOF run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" @@ -509,6 +546,30 @@ EOF } +@test "${tag}: symlinks: src file, dst symlink to dir" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst \ + && mkdir /dst/dir-dst \ + && ln -s dir-dst /dst/dir-dst_direct \ + && ls -lh /dst +RSYNC /file-top /dst/dir-dst_direct +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 0 ]] + ls_dump "$dst" sym-dst-symlink-dir-src-file + run ls_ "$dst" + echo "$output" + [[ $status -eq 0 ]] + cat < dir-dst +EOF +} + + @test "${tag}: symlinks: src dir, dst symlink to dir" { cat < "$ch_tmpimg_df" FROM alpine:3.17 @@ -584,3 +645,6 @@ EOF [[ $inode_src -ne $inode_dst ]] } + +#FIXME: relative source path +#FIXME: relative dest path From 203fcd399157df688f691cba1ad0b91fdd1a15aa Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Thu, 12 Oct 2023 15:44:29 -0600 Subject: [PATCH 13/20] fix relative paths [skip ci] --- lib/build.py | 9 +++++++-- test/build/50_rsync.bats | 17 +++++++++++++++-- 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/lib/build.py b/lib/build.py index 53e39d8ce..5bca77d3a 100644 --- a/lib/build.py +++ b/lib/build.py @@ -576,8 +576,13 @@ class Copy(Instruction): def expand_dest(self): """Set self.dst from self.dst_raw with environment variables expanded and image root prepended.""" - self.dst = self.image.unpack_path \ - // ch.variables_sub(self.dst_raw, self.env_build) + dst_raw = ch.variables_sub(self.dst_raw, self.env_build) + if (len(dst_raw) < 1): + ch.FATAL("destination is empty after expansion: %s" % self.dst_raw) + base = self.image.unpack_path + if (dst_raw[0] != "/"): + base //= self.workdir + self.dst = base // ch.variables_sub(self.dst_raw, self.env_build) def expand_sources(self): """Set self.srcs from self.srcs_raw with environment variables and globs diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index f41fd433b..71ae90f56 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -646,5 +646,18 @@ EOF } -#FIXME: relative source path -#FIXME: relative dest path +@test "${tag}: relative paths" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +WORKDIR /dst +RSYNC file-basic1 . +EOF + ch-image build -v --rebuild -f "$ch_tmpimg_df" "$context"/basic1 + ls_dump "$dst" files + run ls_ "$dst" + echo "$output" + [[ $status -eq -0 ]] + cat < Date: Fri, 13 Oct 2023 13:25:20 -0600 Subject: [PATCH 14/20] more tests [skip ci] --- lib/build.py | 4 +- test/build/50_rsync.bats | 121 ++++++++++++++++++++++++++++++++++++--- 2 files changed, 117 insertions(+), 8 deletions(-) diff --git a/lib/build.py b/lib/build.py index 5bca77d3a..291145f7d 100644 --- a/lib/build.py +++ b/lib/build.py @@ -1270,6 +1270,8 @@ def expand_rsync_froms(self): key = m[1] if (m[2] == "-"): ch.FATAL("--*-from: can’t use standard input") + elif (":" in m[2]): + ch.FATAL("--*-from: can’t use remote hosts (colon in path)") path = ch.Path(m[2]) if (path.is_absolute()): path = self.image.unpack_path // path @@ -1303,7 +1305,7 @@ def rsync_validate(self): "--remove-source-files" } for o in self.rsync_options: if (o in bad): - ch.FATAL("bad option: %s" % o) + ch.FATAL("disallowed option: %s" % o) class Run(Instruction): diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index 71ae90f56..3ec215818 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -56,7 +56,7 @@ ls_dump () { # top level of context cd "$context" - echo file-top > file-top + printf 'basic1/file-basic1\nbasic2\n' > file-top # also list of files mkdir dir-top echo dir-top.file > dir-top/dir-top.file @@ -272,7 +272,7 @@ drwxrwx--- 1 basic2 -rw-rw---- 1 12 file-basic2 drwxrwx--- 1 dir-top -rw-rw---- 1 13 dir-top.file --rw-rw---- 1 9 file-top +-rw-rw---- 1 26 file-top drwxrwx--- 1 hard -rw-rw---- 2 10 hard-file1 -rw-rw---- 2 10 hard-file2 @@ -396,8 +396,8 @@ lrwxrwxrwx 1 file-sym1_direct -> file-sym1 -rw-rw---- 1 10 file-sym1_upover -rw-rw---- 1 10 file-sym2_abs -rw-rw---- 1 10 file-sym2_upover --rw-rw---- 1 9 file-top_abs --rw-rw---- 1 9 file-top_rel +-rw-rw---- 1 26 file-top_abs +-rw-rw---- 1 26 file-top_rel EOF } @@ -541,7 +541,7 @@ EOF [[ $status -eq 0 ]] cat < dir-dst EOF } @@ -652,12 +652,119 @@ FROM alpine:3.17 WORKDIR /dst RSYNC file-basic1 . EOF - ch-image build -v --rebuild -f "$ch_tmpimg_df" "$context"/basic1 + ch-image build --rebuild -f "$ch_tmpimg_df" "$context"/basic1 ls_dump "$dst" files run ls_ "$dst" echo "$output" [[ $status -eq -0 ]] cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RSYNC foo bar +EOF + run ch-image build -t tmpimg - < "$ch_tmpimg_df" + echo "$output" + [[ $status -eq 1 ]] + [[ $output = *'error: no context'* ]] +} + + +@test "${tag}: bad + option" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RSYNC +y foo bar +EOF + run ch-image build -t tmpimg - < "$ch_tmpimg_df" + echo "$output" + [[ $status -eq 1 ]] + [[ $output = *'error: invalid plus option: y'* ]] +} + + +@test "${tag}: remote transports" { + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RSYNC foo://bar baz +EOF + run ch-image build -t tmpimg - < "$ch_tmpimg_df" + echo "$output" + [[ $status -eq 1 ]] + [[ $output = *'error: SSH and rsync transports not supported'* ]] +} + + +@test "${tag}: excluded options" { + # We only test one of them, for DRY, though I did pick the one that seemed + # most dangerous. + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RSYNC --remove-source-files foo bar +EOF + run ch-image build -t tmpimg - < "$ch_tmpimg_df" + echo "$output" + [[ $status -eq 1 ]] + [[ $output = *'error: disallowed option: --remove-source-files'* ]] + +} + +@test "${tag}: --*-from translation" { + # relative (context) + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RSYNC --files-from=./file-top / /dst +EOF + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + ls_dump "$dst" files + run ls_ "$dst" + echo "$output" + [[ $status -eq -0 ]] + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RUN mkdir /dst +RUN printf 'file-top\n' > /fls +RSYNC --files-from=/fls / /dst +EOF + ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + ls_dump "$dst" files + run ls_ "$dst" + echo "$output" + [[ $status -eq -0 ]] + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RSYNC --files-from=- / /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 1 ]] + [[ $output = *'error: --*-from: can'?'t use standard input'* ]] + + # colon disallowed + cat < "$ch_tmpimg_df" +FROM alpine:3.17 +RSYNC --files-from=foo:bar / /dst +EOF + run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" + echo "$output" + [[ $status -eq 1 ]] + [[ $output = *'error: --*-from: can'?'t use remote hosts'* ]] } From f66af9c9391a9553c495ce72779604fa8a16150e Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Fri, 13 Oct 2023 14:35:39 -0600 Subject: [PATCH 15/20] fix RSYNC caching [skip ci] --- lib/build.py | 14 ++---- lib/filesystem.py | 5 ++ test/Makefile.am | 1 + test/bucache/rsync.df | 3 ++ test/build/55_cache.bats | 98 ++++++++++++++++++++++++++++++++++++++++ 5 files changed, 112 insertions(+), 9 deletions(-) create mode 100644 test/bucache/rsync.df diff --git a/lib/build.py b/lib/build.py index 291145f7d..3497596c6 100644 --- a/lib/build.py +++ b/lib/build.py @@ -573,6 +573,10 @@ class Copy(Instruction): "srcs_base", "srcs_raw") + @property + def sid_input(self): + return super().sid_input + self.src_metadata + def expand_dest(self): """Set self.dst from self.dst_raw with environment variables expanded and image root prepended.""" @@ -773,10 +777,6 @@ def __init__(self, *args): self.srcs_raw = args[:-1] self.dst_raw = args[-1] - @property - def sid_input(self): - return super().sid_input + self.src_metadata - @property def str_(self): dst = repr(self.dst) if hasattr(self, "dst") else self.dst_raw @@ -947,7 +947,7 @@ def prepare(self, miss_ct): self.options_assert_empty() # Expand operands. self.expand_sources() - self.expand_dest() + self.dst = ch.variables_sub(self.dst_raw, self.env_build) # Gather metadata for hashing. self.src_metadata = fs.Path.stat_bytes_all(self.srcs) # Pass on to superclass. @@ -1233,10 +1233,6 @@ def ship_out(): ship_out() return ret - @property - def sid_input(self): - return super().sid_input # FIXME - seems hard? - @property def str_(self): ret = list() diff --git a/lib/filesystem.py b/lib/filesystem.py index fb1bf8474..d221c60cc 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -424,6 +424,11 @@ def open_(self, mode, *args, **kwargs): "can’t open for %s: %s" % (mode, self.name), mode, *args, **kwargs) + def relative_to(self, other): # FIXME: does not support component-wise + ret = super().relative_to(other) + ret.trailing_slash_p = other.trailing_slash_p + return ret + def rename_(self, name_new): if (Path(name_new).exists()): ch.FATAL("can’t rename: destination exists: %s" % name_new) diff --git a/test/Makefile.am b/test/Makefile.am index 9fd715ead..3ae062095 100644 --- a/test/Makefile.am +++ b/test/Makefile.am @@ -17,6 +17,7 @@ bucache/copy.df \ bucache/difficult.df \ bucache/force.df \ bucache/from.df \ +bucache/rsync.df \ build/10_sanity.bats \ build/40_pull.bats \ build/50_ch-image.bats \ diff --git a/test/bucache/rsync.df b/test/bucache/rsync.df new file mode 100644 index 000000000..d4b562199 --- /dev/null +++ b/test/bucache/rsync.df @@ -0,0 +1,3 @@ +# Context directory must be fixtures prepared in 55_cache.bats:RSYNC. +FROM alpine:3.17 +RSYNC /* / diff --git a/test/build/55_cache.bats b/test/build/55_cache.bats index f334146c7..4837c5ddc 100644 --- a/test/build/55_cache.bats +++ b/test/build/55_cache.bats @@ -900,6 +900,104 @@ EOF } +@test "${tag}: RSYNC" { + ch-image build-cache --reset + + # Prepare fixtures. These need various manipulating during the test, which + # is why they’re built here on the fly. + fixtures=${BATS_TMPDIR}/rsync-cache + mkdir -p "$fixtures" + echo hello > "$fixtures"/file1 + rm -f "$fixtures"/file1a + mkdir -p "$fixtures"/dir1 + touch "$fixtures"/dir1/file1 "$fixtures"/dir1/file2 + + printf '\n*** Build; all misses.\n\n' + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'. FROM'* ]] + [[ $output = *'. RSYNC'* ]] + + printf '\n*** Re-build; all hits.\n\n' + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'* RSYNC'* ]] + + printf '\n*** Add remove file in directory; should miss b/c dir mtime.\n\n' + touch "$fixtures"/dir1/file2 + rm "$fixtures"/dir1/file2 + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'. RSYNC'* ]] + + printf '\n*** Re-build; all hits.\n\n' + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'* RSYNC'* ]] + + printf '\n*** Touch file; should miss because file mtime.\n\n' + touch "$fixtures"/file1 + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'. RSYNC'* ]] + + printf '\n*** Re-build; all hits.\n\n' + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'* RSYNC'* ]] + + printf '\n*** Rename file; should miss because filename.\n\n' + mv "$fixtures"/file1 "$fixtures"/file1a + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'. RSYNC'* ]] + + printf '\n*** Re-build; all hits.\n\n' + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'* RSYNC'* ]] + + printf '\n*** Rename file back; all hits.\n\n' + stat "$fixtures"/file1a + mv "$fixtures"/file1a "$fixtures"/file1 + stat "$fixtures"/file1 + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'* RSYNC'* ]] + + printf '\n*** Update content, same length, reset mtime; all hits.\n\n' + cat "$fixtures"/file1 + stat "$fixtures"/file1 + mtime=$(stat -c %y "$fixtures"/file1) + echo world > "$fixtures"/file1 + touch -d "$mtime" "$fixtures"/file1 + cat "$fixtures"/file1 + stat "$fixtures"/file1 + run ch-image build -f ./bucache/rsync.df "$fixtures" + echo "$output" + [[ $status -eq 0 ]] + [[ $output = *'* FROM'* ]] + [[ $output = *'* RSYNC'* ]] +} + + @test "${tag}: all hits, new name" { ch-image build-cache --reset From 6007360c06a11ac1483c9d6f9b8ddf5b095f39ea Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Fri, 13 Oct 2023 16:16:42 -0600 Subject: [PATCH 16/20] fix a couple tests [skip ci] --- examples/Dockerfile.almalinux_8ch | 23 +++-------------------- lib/filesystem.py | 6 +++++- test/build/50_rsync.bats | 6 +++--- 3 files changed, 11 insertions(+), 24 deletions(-) diff --git a/examples/Dockerfile.almalinux_8ch b/examples/Dockerfile.almalinux_8ch index c3b3200d1..ba4aa5789 100644 --- a/examples/Dockerfile.almalinux_8ch +++ b/examples/Dockerfile.almalinux_8ch @@ -16,32 +16,15 @@ FROM almalinux:8 # # 4. Issue #1103: Install libarchive to resolve cmake bug # +# FIXME: This instruction seems to be running afoul of #1679. I’ve re-wrapped +# it to have fewer continuations in hopes we trigger that less. RUN dnf install -y --setopt=install_weak_deps=false \ epel-release \ 'dnf-command(config-manager)' \ && dnf config-manager --enable powertools \ && dnf install -y --setopt=install_weak_deps=false \ dnf-plugin-ovl \ - autoconf \ - automake \ - gcc \ - git \ - libarchive \ - libpng-devel \ - make \ - python3 \ - python3-devel \ - python3-lark-parser \ - python3-requests \ - python3-sphinx \ - python3-sphinx_rtd_theme \ - rpm-build \ - rpmlint \ - rsync \ - squashfs-tools \ - squashfuse \ - wget \ - which \ + autoconf automake gcc git libarchive libpng-devel make python3 python3-devel python3-lark-parser python3-requests python3-sphinx python3-sphinx_rtd_theme rpm-build rpmlint rsync squashfs-tools squashfuse wget which \ && dnf clean all # Need wheel to install bundled Lark, and the RPM version doesn’t work. diff --git a/lib/filesystem.py b/lib/filesystem.py index d221c60cc..76eab702f 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -108,7 +108,11 @@ def __rtruediv__(self, left): return NotImplemented def __str__(self): - return super().__str__() + ("/" if self.trailing_slash_p else "") + # Because subclassing pathlib.Path is so bonkers, we often get instances + # that don’t have attribute trailing_slash_p. Don’t crash in this case, + # and cross our fingers we haven’t just lost the trailing slash. + return super().__str__() + ("/" if ( hasattr(self, "trailing_slash_p") + and self.trailing_slash_p) else "") def __truediv__(self, right): return NotImplemented diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index 3ec215818..66df3d933 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -109,7 +109,7 @@ ls_dump () { echo "## created fixtures ##" ls_ "$fixtures" - ls_ "$fixtures" > $BATS_TMPDIR/rsync_fixtures-ls_ + ls_ "$fixtures" > "$BATS_TMPDIR"/rsync_fixtures-ls_ ls_dump "$fixtures" fixtures } @@ -303,7 +303,7 @@ EOF run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" echo "$output" [[ $status -eq 0 ]] - [[ 0 -eq $(echo "$output" | grep -F 'skipping non-regular file' | wc -l) ]] + [[ 0 -eq $(echo "$output" | grep -Fc 'skipping non-regular file') ]] ls_dump "$dst" sym-default run ls_ "$dst" echo "$output" @@ -354,7 +354,7 @@ EOF run ch-image build --rebuild -f "$ch_tmpimg_df" "$context" echo "$output" [[ $status -eq 0 ]] - [[ 12 -eq $(echo "$output" | grep -F 'skipping non-regular file' | wc -l) ]] + [[ 12 -eq $(echo "$output" | grep -Fc 'skipping non-regular file') ]] ls_dump "$dst" sym-m run ls_ "$dst" echo "$output" From 526a2333699d94b1a51229728bd13d5804998683 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Fri, 13 Oct 2023 16:59:30 -0600 Subject: [PATCH 17/20] fix tests? --- test/build/50_dockerfile.bats | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/build/50_dockerfile.bats b/test/build/50_dockerfile.bats index 23745bd48..2b1c5a0be 100644 --- a/test/build/50_dockerfile.bats +++ b/test/build/50_dockerfile.bats @@ -1128,7 +1128,7 @@ EOF echo "$output" [[ $status -ne 0 ]] if [[ $CH_TEST_BUILDER = ch-image ]]; then - [[ $output = *"error: must specify at least one source"* ]] + [[ $output = *'error: source or destination missing'* ]] else [[ $output = *'COPY requires at least two arguments'* ]] fi From b0b38a26226e4797058c55ad2bb2e0dd573b08a2 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky Date: Fri, 13 Oct 2023 18:04:13 -0600 Subject: [PATCH 18/20] fix tests? --- doc/ch-image.rst | 2 +- test/build/50_rsync.bats | 103 ++++++++++++++++++++------------------- 2 files changed, 53 insertions(+), 52 deletions(-) diff --git a/doc/ch-image.rst b/doc/ch-image.rst index 4f7f8edab..170c07867 100644 --- a/doc/ch-image.rst +++ b/doc/ch-image.rst @@ -1616,7 +1616,7 @@ have changed to the normal files and directories pointed to. The sources themselves, if symlinks, do not get special treatment: -.. code-block:: +.. code-block:: docker RSYNC /sym1/file-sym1_direct /sym1/file-sym1_upover /dst diff --git a/test/build/50_rsync.bats b/test/build/50_rsync.bats index 66df3d933..00b66ffb4 100644 --- a/test/build/50_rsync.bats +++ b/test/build/50_rsync.bats @@ -28,6 +28,7 @@ ls_ () { -e 's# -> $##' \ -e 's#([0-9]+)[f]#\1#' \ -e 's#([0-9]+ )[0-9 ]+[a-z] #\1 #' \ + -e 's#^(d[rwx-]{9}) [0-9]#\1 .#' \ -e "s#$1#/...#" ) } @@ -139,9 +140,9 @@ EOF -rw----r-- 1 12 file-basic1 -rw------- 1 12 file-basic1_nom -rw----r-- 1 12 file-basic1_renamed -drwxrwx--- 1 new +drwxrwx--- . new -rw----r-- 1 12 file-basic1 -drwxrwx--- 1 newB +drwxrwx--- . newB -rw----r-- 1 12 file-basic1 -rw-rw---- 1 12 file-basic2 EOF @@ -173,19 +174,19 @@ EOF echo "$output" [[ $status -eq -0 ]] cat < doesnotexist -drwxrwx--- 1 sym1 -drwxrwx--- 1 dir-sym1 +drwxrwx--- . sym1 +drwxrwx--- . dir-sym1 -rw-rw---- 1 14 dir-sym1.file lrwxrwxrwx 1 dir-sym1_direct -> dir-sym1 lrwxrwxrwx 1 dir-top_rel -> ../dir-top @@ -288,7 +289,7 @@ lrwxrwxrwx 1 file-sym1_direct -> file-sym1 lrwxrwxrwx 1 file-sym1_upover -> ../sym1/file-sym1 lrwxrwxrwx 1 file-sym2_upover -> ../sym2/file-sym2 lrwxrwxrwx 1 file-top_rel -> ../file-top -drwxrwx--- 1 sym2 +drwxrwx--- . sym2 -rw-rw---- 1 10 file-sym2 EOF } @@ -309,8 +310,8 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < dir-sym1 lrwxrwxrwx 1 dir-top_rel -> ../dir-top @@ -335,8 +336,8 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < dir-sym1 -rw-rw---- 1 10 file-sym1 @@ -360,8 +361,8 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < dir-sym1 -drwxrwx--- 1 dir-top_rel +drwxrwx--- . dir-top_rel -rw-rw---- 1 13 dir-top.file -rw-rw---- 1 9 file-out_abs -rw-rw---- 1 9 file-out_rel @@ -416,8 +417,8 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < dir-sym1 lrwxrwxrwx 1 dir-top_rel -> ../dir-top @@ -426,7 +427,7 @@ lrwxrwxrwx 1 file-sym1_direct -> file-sym1 lrwxrwxrwx 1 file-sym1_upover -> ../sym1/file-sym1 lrwxrwxrwx 1 file-sym2_upover -> ../sym2/file-sym2 lrwxrwxrwx 1 file-top_rel -> ../file-top -drwxrwx--- 1 sym2 +drwxrwx--- . sym2 -rw-rw---- 1 10 file-sym2 EOF } @@ -484,7 +485,7 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < doesnotexist EOF } @@ -563,7 +564,7 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < dir-dst EOF @@ -587,8 +588,8 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < dir-dst EOF @@ -612,7 +613,7 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < dir-dst EOF @@ -635,7 +636,7 @@ EOF echo "$output" [[ $status -eq 0 ]] cat < Date: Thu, 19 Oct 2023 10:13:01 -0600 Subject: [PATCH 19/20] respond to feedback --- doc/tutorial.rst | 14 +++++++++----- lib/filesystem.py | 2 +- test/build/55_cache.bats | 6 +++--- 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/doc/tutorial.rst b/doc/tutorial.rst index 60d77d162..77a4b77fa 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -265,7 +265,7 @@ recipe: FROM almalinux:8 RUN yum -y install python36 - RSYNC ./hello.py / + COPY ./hello.py / RUN chmod 755 /hello.py These four instructions say: @@ -275,14 +275,18 @@ These four instructions say: 2. :code:`RUN`: Install the :code:`python36` RPM package, which we need for our Hello World program. - 3. :code:`RSYNC`: Copy the file :code:`hello.py` we just made to the root + 3. :code:`COPY`: Copy the file :code:`hello.py` we just made to the root directory of the image. In the source argument, the path is relative to - the *context directory*, which we’ll see more of below. (This instruction - is a Charliecloud extension; see :ref:`its documentation ` - for details.) You can also use standard :code:`COPY` if you prefer.) + the *context directory*, which we’ll see more of below. 4. :code:`RUN`: Make that file executable. +.. note:: + + :code:`COPY` is a standard instruction but has a number of disadvantages in + its corner cases. Charliecloud also has :code:`RSYNC`, which addresses + these; see :ref:`its documentation ` for details. + Let’s build this image:: $ ch-image build -t hello -f Dockerfile . diff --git a/lib/filesystem.py b/lib/filesystem.py index 0ce485922..8a4d0cfc4 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -340,7 +340,7 @@ def is_relative_to(self, *other): def joinpath_posix(self, right): # This method is a hot spot, so the hairiness is due to optimizations. - # It runs about 30% faster than the naïve verson below. + # It runs about 30% faster than the naïve verson referenced below. if (isinstance(right, Path)): right_parts = right._parts ts_p = right.trailing_slash_p diff --git a/test/build/55_cache.bats b/test/build/55_cache.bats index 5f9c04e6c..dd1473e66 100644 --- a/test/build/55_cache.bats +++ b/test/build/55_cache.bats @@ -906,10 +906,10 @@ EOF # Prepare fixtures. These need various manipulating during the test, which # is why they’re built here on the fly. fixtures=${BATS_TMPDIR}/rsync-cache - mkdir -p "$fixtures" + rm -Rf --one-file-system "$fixtures" + mkdir "$fixtures" echo hello > "$fixtures"/file1 - rm -f "$fixtures"/file1a - mkdir -p "$fixtures"/dir1 + mkdir "$fixtures"/dir1 touch "$fixtures"/dir1/file1 "$fixtures"/dir1/file2 printf '\n*** Build; all misses.\n\n' From 2aa8cb311d76939c06278e4b6534b4b427de7981 Mon Sep 17 00:00:00 2001 From: Reid Priedhorsky <1682574+reidpr@users.noreply.github.com> Date: Mon, 23 Oct 2023 20:01:18 -0600 Subject: [PATCH 20/20] typo [skip ci] Co-authored-by: Jordan Ogas --- lib/filesystem.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/filesystem.py b/lib/filesystem.py index f1fb6e925..dc70a67c5 100644 --- a/lib/filesystem.py +++ b/lib/filesystem.py @@ -365,7 +365,7 @@ def is_relative_to(self, *other): def joinpath_posix(self, right): # This method is a hot spot, so the hairiness is due to optimizations. - # It runs about 30% faster than the naïve verson referenced below. + # It runs about 30% faster than the naïve version referenced below. if (isinstance(right, Path)): right_parts = right._parts ts_p = right.trailing_slash_p