Consider deduplicating digests for hardlinked/symlinked outputs #24365
Labels: P2, team-Performance, type: feature request
Currently, when an action copies inputs to outputs by hardlinking or symlinking them, we recalculate digests for the output files, which can impose a significant performance cost on builds that copy lots of files around. This could in theory be avoided if Bazel were able to recognize that it has already seen the target of the link and that the target hasn't been modified since.
For hardlinks, this could be as simple as using `(st_dev, st_ino, st_mtime, st_ctime, st_size)` as the `DigestUtils` cache key (instead of the current `(path, st_ino, st_mtime, st_ctime, st_size)`). Symlinks would require a bit more work, but are still doable. We'd have to think very carefully about the correctness implications, though.
cc @woody77 @fangism
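To make the hardlink idea concrete, here is a minimal sketch in Python (not Bazel's Java code) of a digest cache keyed by file identity rather than by path. `InodeDigestCache`, `key_for`, and `demo` are hypothetical names invented for illustration; they are not part of `DigestUtils`. The sketch covers only the hardlink case and ignores the correctness questions raised above.

```python
import hashlib
import os
import tempfile


class InodeDigestCache:
    """Hypothetical digest cache keyed by file identity instead of path."""

    def __init__(self):
        self._cache = {}
        self.hashes_computed = 0  # counts real hashing work, for illustration

    @staticmethod
    def key_for(path):
        # os.stat follows symlinks; only the hardlink case is considered here.
        st = os.stat(path)
        # st_dev replaces the path in the key, so every hardlink to the same
        # inode maps to the same cache entry.
        return (st.st_dev, st.st_ino, st.st_mtime_ns, st.st_ctime_ns, st.st_size)

    def digest(self, path):
        key = self.key_for(path)
        cached = self._cache.get(key)
        if cached is None:
            with open(path, "rb") as f:
                cached = hashlib.sha256(f.read()).hexdigest()
            self.hashes_computed += 1
            self._cache[key] = cached
        return cached


def demo():
    cache = InodeDigestCache()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.txt")
        dst = os.path.join(tmp, "output.txt")
        with open(src, "wb") as f:
            f.write(b"hello")
        # On POSIX, creating a link updates the inode's ctime, so this demo
        # only starts hashing once the link already exists.
        os.link(src, dst)
        first = cache.digest(src)
        second = cache.digest(dst)  # same inode, so this is a cache hit
        return first == second, cache.hashes_computed
```

Calling `demo()` hashes the file contents once even though two paths are digested. The ctime comment points at one of the subtleties a real implementation would have to handle: a key recorded before the link was created would no longer match afterwards.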