
Consider deduplicating digests for hardlinked/symlinked outputs #24365

Open
tjgq opened this issue Nov 18, 2024 · 0 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Performance Issues for Performance teams type: feature request

Comments

@tjgq
Contributor

tjgq commented Nov 18, 2024

Currently, when an action copies inputs to outputs by hardlinking or symlinking them, we recalculate digests for the output files, which can take a significant performance toll on builds that copy lots of files around. This could in theory be avoided if Bazel were able to recognize that it has seen the target of the link before (and that it hasn't been modified since).

For hardlinks, this could be as simple as using (st_dev, st_ino, st_mtime, st_ctime, st_size) as the DigestUtils cache key (instead of the current (path, st_ino, st_mtime, st_ctime, st_size)). Symlinks would require a bit more work but are still doable. We'd have to think very carefully about the correctness implications, though.
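
For illustration, here's a minimal sketch of a digest cache keyed on file identity (device/inode) rather than path, so hardlinked copies of the same inode reuse one cache entry. The class and method names are hypothetical and do not correspond to Bazel's actual DigestUtils; ctime is omitted because BasicFileAttributes doesn't expose it directly, and symlinks would additionally need to be resolved to their target before the lookup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; not Bazel's actual DigestUtils cache. */
final class LinkAwareDigestCache {

  /** Key on file identity (dev/inode via fileKey), mtime, and size instead of path. */
  private record FileIdentityKey(Object fileKey, long mtimeMillis, long size) {}

  private final ConcurrentHashMap<FileIdentityKey, byte[]> cache = new ConcurrentHashMap<>();

  byte[] getOrComputeDigest(Path path) throws IOException {
    BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
    // fileKey() encapsulates (st_dev, st_ino) on POSIX; it may be null on some filesystems.
    Object fileKey = attrs.fileKey();
    if (fileKey == null) {
      return computeDigest(path); // no stable identity; compute directly
    }
    FileIdentityKey key =
        new FileIdentityKey(fileKey, attrs.lastModifiedTime().toMillis(), attrs.size());
    byte[] cached = cache.get(key);
    if (cached != null) {
      return cached; // hardlink to an already-digested inode: skip recomputation
    }
    byte[] digest = computeDigest(path);
    cache.put(key, digest);
    return digest;
  }

  private static byte[] computeDigest(Path path) throws IOException {
    // Stand-in for the real digest function (SHA-256 over the file contents).
    try {
      return java.security.MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(path));
    } catch (java.security.NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}
```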

cc @woody77 @fangism

@tjgq tjgq added team-Performance Issues for Performance teams type: feature request untriaged P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Nov 18, 2024