-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide recommendation to counter xz utils style attack #560
base: main
Are you sure you want to change the base?
Conversation
The malicious attack on the xz utils slipped through many defenses because the "source" package included pre-generated malicious code. This meant that review of the source code (e.g., as seen by git) couldn't find the problem. This proposes a best practices to counter it. The text is longer than I'd like, but it's hard to make it short, and this was a worrying attack so I think it's reasonable to say this. We'll probably need to renumber this proposal if we also add the proposed text to counter attacks like polyfill.io: #559 ... but I think that's okay! Signed-off-by: David A. Wheeler <[email protected]>
Signed-off-by: David A. Wheeler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
+1 |
Thinking about this a bit more. Neither published artifacts nor source repositories are completely safe from supply chain attacks, but are we confident that the surface area for risk is absolutely less with source repos? And for all ecosystems and package registries? If the xz attack had been via the source rather than the artifact would we be having a different conversation? One potential protection that (at least some) package registries give you is that usually once an artifact is published, it cannot be tampered with. For example, if I But if I Notably, if the project was compromised, a rogue maintainer could poison the source repository and in this example, version Therefore, there are scenarios in which relying on the published package is safer. There is also something to be said for non-malicious situations, where maintainers are correcting tags or simply making mistakes. For example, I have deleted erroneous tags in projects before. As an example, some projects use a tag format like Clearly there is not a perfect answer. The question is, do we have a recommendation that we can confidently say is the better option? I'm not convinced at this point that there is, and I there is some nuance depending on which ecosystems and registries we are talking about. |
Indeed in any language with a package repository, git is for development, not consumption. I continue to think the only true solution here is to use the git repo, replicate the build process, and measure how close the result is to the published, immutable artifact - post publish. |
flowchart
0[Install tarball from registry]
1[Rejoice]
a{
Concerned
about
compromised
tarballs?
}
b(Build from source)
c{
Concerned
about
compromised
source?
}
0 --> a
a --> |YES| b
b --> c --> |YES| 0
a --> |NO| 1
c --> |NO| 1
|
Indeed, because then you get to rely on the immutable tarball, and you get to verify its contents based on the repo independently without any unpaid work from maintainers in a way that anyone can audit and replicate, including for historical packages whose maintainers are no longer around. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using this suggestion, if I want version 1.2.3
of a package, I have to:
- determine if a
1.2.3
branch or tag exists in the source repo, and get the name of that branch/tag (it may or may not be1.2.3
exactly) - if the branch/tag exists, I must trust that it matches the released artifact's substantive content
- if I can't trust that it matches, then what do I do?
- if I can't find a matching branch/tag, or don't trust it, or don't want to use it, then what do I do?
There is also no guarantee that the published artifact at version 1.2.3
matches what the repo says via branch or tag is 1.2.3
. The published artifact is not associated with a commit id or anything (and even if it were, that commit id may not exist anymore).
If source repos are clean and well-maintained, then maybe this works. But they are, quite often, far from being clean and well-maintained. Often tags and branches are missing entirely, or they stopped pushing tags seven years ago, or even completely different versions in repo tags vs the package registry. e.g. registry has only 1.0,1.2,1.4, and the repo has only 1.1,1.3. I have actually seen this in production packages with many downloads.
And this gets even murkier when we get into vulnerability management. If I can't properly associate what I'm using with an exact version, then I can't accurately know what CVEs I am impacted by.
There are also other downsides to consuming the source repo directly that are orthogonal to supply chain attack risk.
This recommendation seems to boil down to "don't use package registries/artifactories; don't use published artifacts; use only source code repositories". I think it relies too much on assumptions that we cannot make about source repos. The source repo is the canonical truth, but the published artifact is what is meant to be consumed. For these reasons, and the fact that using published artifacts actually offers some protection that the source repo cannot offer, I find it difficult to recommend this as currently written.
The xz attack inserted malicious source into the source package (the tarball of source code) that was not in the repository. If the attack had been visible in the repository it would have been immediately visible to all others who checked the repository. The history would be immediately visible, since git (and any other vcs) can track changes. But since it was hidden in a tarball, it wasn't obvious... it's historically expected behavior that generated files in the tarball aren't in the VCS, and since they are often different when regenerated somewhere else, there's no easy way to check them. I'll try to rewrite this to make it clearer. |
Signed-off-by: David A. Wheeler <[email protected]>
Signed-off-by: David A. Wheeler <[email protected]>
No, not at all. The word "package" has multiple meanings. Source packages aren't built packages. A package registry, like npm, usually only deals with built packages. However, in the system package world, there's often a distinction between a source package (a package of source code) and a built package. The built packages are built from the source packages, because there are often many builders. However, it's been conventional to take what's in the VCS and add stuff into the source package... which led to the xz utils attack. If the source package is only a copy of the VCS, then it's easy to verify that they're the same. If there are mysterious additions, it's impractical to notice malicious additions. |
Signed-off-by: David A. Wheeler <[email protected]>
Signed-off-by: David A. Wheeler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly I'd like to say a recommendation along these lines would be nice so +1 🙂
I spotted a merge conflict header was still included.
Also I wrote some suggestions and thoughts on the topic. Apologies for it being quite wordy. I hope it can trigger some interesting discussion 🙂
RIP: it wasn't posted 😢 I'll try to re-write it
Co-authored-by: j-k <[email protected]> Signed-off-by: David A. Wheeler <[email protected]>
I believe all the issues listed above have been addressed. By starting with "If a source code (unbuilt) package is released...", it makes it clear that we are not talking about built packages. The "if" also makes it clear that there might not be such a thing, which makes sense, because not every ecosystem has this distinction. The merge conflict has also been fixed. Ready to merge? |
@@ -33,6 +33,8 @@ Here is a concise guide for all software developers for secure software developm | |||
24. **Continuously improve**. Improve scores, look for tips, & apply as appropriate. | |||
25. **Manage succession**. Have clear governance & work to add active, trustworthy maintainer(s). | |||
26. **Prefer memory-safe languages**. Many vulnerabilities involve memory safety. Where practical, use memory-safe programming languages (most are) and keep memory safety enabled. Otherwise, use mechanisms like extra tools and peer review to reduce risk. | |||
27. **Ensure production websites only load assets from your own domains**. _Linking_ to other domains is fine, but where practical, don't directly load assets such as JavaScript, CSS, and media (including images) from domains you do not control. If you do, your site might be subverted if that other domain is subverted, so investigate the risks before doing so. See the [subverted polyfill.io revelation in 2024](https://blog.qualys.com/vulnerabilities-threat-research/2024/06/28/polyfill-io-supply-chain-attack). | |||
27. **If a source code (unbuilt) package is released, it should only include version-controlled source, and users should rebuild its contents to create production (built) package(s)**. E.g., if autotools is used, if a source package is released it should _not_ include a generated `configure` file, while recipients should ignore pre-generated files and instead rebuild from source (e.g., with `autoreconf`). This eliminates a malware-hiding mechanism, as illustrated by an attack on [xz utils](https://access.redhat.com/security/cve/CVE-2024-3094). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This document is aimed at a developer trying to be secure but this recommendation is almost positioned as if the dev was intending on doing an xz utils style attack.
With a bit of tweaking it could come across more as a recommendation to avoid actions which can be suspicious/malicious.
Also I feel like there are two separate aspects here to fully address xz utils style attacks.
- Files not in version-control included in "source release bundles"
- This addresses the uncommitted macro part.
- Auto-generated or non-source files committed to the repository (part 1 would catch the same files uncommitted)
- This addresses the committed obfuscated code "test data" part.
The first recommendation can clearly explain: When providing a source code (unbuilt) package, a source code bundle, whatever you wish to call it, you should only include that which is committed. If you pre-generate content which is not committed to the original source that can look like an xz utils style attack rather than anything helpful.
It could also be raised that omitting source files which have been committed could also look suspicious or cause builds to behave differently, potentially something a malicious actor could leverage.
With two it's a bit more nuanced.
I think with the example of autoreconf
it's quite clear cut, don't commit them: these scripts can be generated deterministically and there's also the expectation you're going to need those files you're doing some kind of build process anyway so there's no excuse not to also execute configure
to build those files.
But with something like a package-lock.json this is a generated file yes but it holds important information. It's not super human readable but it can be audited, etc, so this is something you can commit.
If it's super well known for the toolchain and environment it's probably not worth writing an explanation why this generated file was committed.
With something like code generated from a proto definition you could argue it's not source code and should be generated from scratch on-demand. But also you might want the code generated and committed to use as a library in the case of golang libraries. In this case you could maybe look over the code, generate the code from multiple machines, or intermittently regenerate the code to check for anything weird.
There are arguments for and against but if there was a clear identifier these files are pre-generated, explanation of the steps you've taken to ensure those files are good, and/or explain how to generate those yourself I think that's fine.
Then there is committed assets like tar.gz/zip or at worse executable binaries. In the vast majority of cases I'd say to not commit any binaries to version control and maybe that's a recommendation we can make. There will be very rare exceptions to this like the tiny hex0-seed
or maybe some larger binaries in the cases of more rudimentary bootstrapping but these rare exceptions should require exceptional reasons and appropriate documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files not in version-control included in "source release bundles"
This addresses the uncommitted macro part.
Right. Maybe we should emphasize that the source package (what you're calling bundles) should only include what's in the VCS. It's okay to remove some things (that often happens with vendored stuff), those are easy to identify, it's the subtle additions that can be sneaky.
Auto-generated or non-source files committed to the repository (part 1 would catch the same files uncommitted)
This addresses the committed obfuscated code "test data" part.
I think that's too nuanced to practically address. I think we shouldn't try to do that in this point. It's not practical to say things like "you can't have images in a VCS" - especially if your software must process images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a tweak to the text to hopefully clarify it's focused on point 1, and doesn't say anything about point 2 (from your list of points above).
Is that better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scorecard already forbids inclusion of (executable) binaries in a VCS. We could add that too, but I suspect that's better addressed as a separate point in a separate PR. By itself that wouldn't have countered an xz utils style attack. Requiring only readable text in a VCS would have made the attack harder, but that's impractical. Let's discuss this topic in a different PR later.
The source package should be a copy or subset of the VCS materials. Signed-off-by: David A. Wheeler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀 😄
@@ -33,6 +33,7 @@ Here is a concise guide for all software developers for secure software developm | |||
24. **Continuously improve**. Improve scores, look for tips, & apply as appropriate. | |||
25. **Manage succession**. Have clear governance & work to add active, trustworthy maintainer(s). | |||
26. **Prefer memory-safe languages**. Many vulnerabilities involve memory safety. Where practical, use memory-safe programming languages (most are) and keep memory safety enabled. Otherwise, use mechanisms like extra tools and peer review to reduce risk. | |||
27. **Ensure production websites only load assets from your own domains**. _Linking_ to other domains is fine, but where practical, don't directly load assets such as JavaScript, CSS, and media (including images) from domains you do not control. If you do, your site might be subverted if that other domain is subverted, so investigate the risks before doing so. See the [subverted polyfill.io revelation in 2024](https://blog.qualys.com/vulnerabilities-threat-research/2024/06/28/polyfill-io-supply-chain-attack). | |||
27. **If a source code (unbuilt) package is released, it should only include content from the version control system (VCS), and source package users should rebuild to create production (built) package(s)**. E.g., if autotools is used, if a source package is released it should _not_ include a generated `configure` file, while recipients should ignore pre-generated files like `configure` and instead rebuild from source (e.g., with `autoreconf`). This eliminates a malware-hiding mechanism, as illustrated by an attack on [xz utils](https://access.redhat.com/security/cve/CVE-2024-3094). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unbuilt packages are also production packages, for scripting languages - is there another way to phrase this?
27. **If a source code (unbuilt) package is released, it should only include content from the version control system (VCS), and source package users should rebuild to create production (built) package(s)**. E.g., if autotools is used, if a source package is released it should _not_ include a generated `configure` file, while recipients should ignore pre-generated files like `configure` and instead rebuild from source (e.g., with `autoreconf`). This eliminates a malware-hiding mechanism, as illustrated by an attack on [xz utils](https://access.redhat.com/security/cve/CVE-2024-3094). | |
27. **If a source code (unbuilt) package is released, it should only include content from the version control system (VCS), and source package users should rebuild, if needed, to create production (built) package(s)**. E.g., if autotools is used, if a source package is released it should _not_ include a generated `configure` file, while recipients should ignore pre-generated files like `configure` and instead rebuild from source (e.g., with `autoreconf`). This eliminates a malware-hiding mechanism, as illustrated by an attack on [xz utils](https://access.redhat.com/security/cve/CVE-2024-3094). |
The malicious attack on the xz utils slipped through many defenses because the "source" package included pre-generated malicious code. This meant that review of the source code (e.g., as seen by git) couldn't find the problem.
This proposes a best practices to counter it. The text is longer than I'd like, but it's hard to make it short, and this was a worrying attack so I think it's reasonable to say this.
We'll probably need to renumber this proposal if we also add the proposed text to counter attacks like polyfill.io: #559 ... but I think that's okay!