Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checksums to enforce immutability #250

Conversation

avri-schneider
Copy link
Contributor

@avri-schneider avri-schneider commented Oct 31, 2024

This PR enhances the TFLint setup action by allowing multiple expected SHA256 hash values for verifying the downloaded binary. With this update:

  • Multiple Hash Verification: The expected_sha256 input now accepts a comma-separated list of SHA256 hashes. This enables the action to validate the integrity of the TFLint binary against multiple valid hashes, adding flexibility when downloading from different sources or mirrors.
  • Documentation Updates:
    • Updated action.yml to reflect that expected_sha256 supports multiple hashes.
    • Revised README.md to include instructions for providing multiple hashes, with an example in the Usage section.
  • Usage Example: In the README, users can now see how to pass multiple expected hashes to the action.

This improvement ensures that the setup action can handle variations in binary hash values across sources while maintaining integrity checks. Please review the changes and confirm if any additional updates are needed.

I think this should address #16.

avri-schneider and others added 7 commits October 31, 2024 10:50
- Added support for SHA256 hash verification to ensure the integrity of downloaded binaries.
- Introduced `expected_sha256` as an input parameter for users to supply the expected hash.
- Implemented hash computation and comparison; download proceeds only if hashes match.
- Improves security by preventing execution of potentially tampered files.

This enhancement secures the installation process by verifying file integrity.
…tion

- Introduced `expected_sha256` as a new input parameter in `action.yml`.
- Allows users to specify an expected SHA256 hash for the downloaded TFLint binary.
- If `expected_sha256` is provided, the action will compute and compare the hash, proceeding only if there's a match.
- Enhances security by preventing execution of tampered or corrupted files.

This update ensures that downloaded binaries meet integrity requirements before installation.
… feature

- Updated version to 2.1.0 to reflect the new SHA256 verification feature for downloaded binaries.
- Enhanced description and keywords to include SHA256 verification.
- This update improves security by allowing users to verify the integrity of downloaded files with an expected SHA256 hash.

Version bump ensures semantic versioning alignment with the new feature addition.
- Revised `README.md` to describe `expected_sha256` input as accepting a comma-separated list of SHA256 hashes.
- Provided example usage in the `Usage` section showing how to specify multiple expected hash values.
- Clarified that this change allows integrity verification against multiple valid hashes, adding flexibility for binaries from different sources or mirrors.
Copy link
Member

@bendrucker bendrucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you cite prior art for supporting multiple hashes for a given version?

README.md Outdated Show resolved Hide resolved
Updated documentation for the expected_sha256 parameter to clarify that multiple SHA256 hashes may be provided to support platform-specific binaries (e.g., macOS, Linux, Windows) for the same TFLint version. This ensures integrity checks are compatible across platforms in matrix workflows.
@avri-schneider
Copy link
Contributor Author

prior art for supporting multiple hashes for a given version

Consider these examples of prior art for handling multiple hashes per version across platforms:

  1. PyPI: Provides platform-specific hashes for each Python package distribution (e.g., .whl files).

  2. Homebrew: Stores unique sha256 hashes per OS and CPU architecture in its formulae.

  3. Docker: Uses manifest lists to assign unique hashes to platform-specific images under a single tag.

- Refactored handling of `expected_sha256` input to avoid errors when input is undefined.
- Ensured `expectedHashes` is set to an array of hashes or `undefined` based on the presence of `expected_sha256`.
- This fix prevents unexpected exceptions during execution if `expected_sha256` is not provided, allowing for optional integrity checks.

This resolves issues with optional SHA256 verification in the `downloadCLI` function.
Copy link
Member

@bendrucker bendrucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some minor comments. The correction from mirrors serving the same binary to multiple binaries for different OS/arch pairs is 👍🏻.

Still, we have two modern approaches available to verification that don't require hardcoding trusted checksums:

It's fine to implement explicit checksums but it would tend to benefit vastly more users to implement automatic provenance verification versus manual checksum verification. I am somewhat hesitant to ship the manual feature first.

Are you up for taking on this first, as a separate PR? Otherwise we can hold onto this and I can try to find time to implement the GH attestation API.

action.yml Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/setup-tflint.js Outdated Show resolved Hide resolved
- Replaced `expected_sha256` input with `checksums` for clarity.
- Updated `checksums` input to use newline-separated values instead of commas for better readability.
- Added a dedicated example section for `checksums` usage in the documentation to guide users on its proper implementation.
@avri-schneider
Copy link
Contributor Author

avri-schneider commented Nov 1, 2024

Made some minor comments. The correction from mirrors serving the same binary to multiple binaries for different OS/arch pairs is 👍🏻.

Still, we have two modern approaches available to verification that don't require hardcoding trusted checksums:

It's fine to implement explicit checksums but it would tend to benefit vastly more users to implement automatic provenance verification versus manual checksum verification. I am somewhat hesitant to ship the manual feature first.

Are you up for taking on this first, as a separate PR? Otherwise we can hold onto this and I can try to find time to implement the GH attestation API.

Thank you for the details on "gh attestation" and "cosign verify-blob"—I wasn't aware of these methods and appreciate the insight.

I have a follow-up question about scenarios where companies need strict control over unvetted third-party code in CI. Pinning specific checksums of reviewed binaries provides security by ensuring that only a pre-approved, stable binary is executed for a given version, OS, and architecture.

To achieve the same security level with certificate-based verification, users would still need to fetch and verify these certificates or checksums independently and incorporate them into the workflow. This added layer could complicate workflows, as it would require manual management of certificates within each CI environment—whereas direct checksum verification offers straightforward control and security.

Otherwise, it would make sense to release the explicit checksum feature independently, as it provides immediate benefits that automatic provenance verification would not cover, even if implemented in the future.

Would you suggest any streamlined method for integrating certificate or attestation verification that would still allow the user to maintain strict control over which binaries execute in CI?

@avri-schneider
Copy link
Contributor Author

It's fine to implement explicit checksums but it would tend to benefit vastly more users to implement automatic provenance verification versus manual checksum verification. I am somewhat hesitant to ship the manual feature first.

If automatic provenance verification offers a similar security level, explicit verification might not be necessary for most users. However, if the manual checksum addresses a distinct threat model, releasing it first makes sense, as it’s already implemented and could meet immediate security needs. Since the manual checksum feature is already in my PR, would there be any specific concerns about releasing it first?

remove irrelevant text
@avri-schneider
Copy link
Contributor Author

@bendrucker As a side-note, users of this feature would most likely pin the action to the commit SHA - do you think the README.md should be updated accordingly?:

    - uses: terraform-linters/setup-tflint@ad6bd969c3aad663d980d8fa7d083aad8e7e758d
      name: Setup TFLint with Checksums
      with:
        tflint_version: v0.52.0
        checksums: |
          40f7ee2dbeb8e7cbd5ab7b10912f60eb14aa4fbff62603eeb67fdb5f7cbb794a
          bf758ff29b607b3fbc4a3630ea3b39df4afafe3cdb80c6d71fe528feeac2c58e
          fed6ff15ee10db34a23044ac0d4da8fdc1f2f3663b32ec85d388374dd95670aa

@bendrucker
Copy link
Member

My hesitation stems from the fact that software supply chain security is complicated and most developers have a limited to bare understanding. "Simple" features can do harm by inviting users down the wrong path.

The unambiguously safe posture/tooling for an organization that wants to securely and reliably use third party software at scale is forks/mirrors. You control the source code, the builds, and the hosting of the artifacts.

reviewed binaries provides security

This is the sticking point. You can't review a binary.

If you don't have that control, the checksums only prevent the binary from changing to another binary after you adopt it. In order to trust a binary, you need to be able to prove that it originated from a specific source code revision. That's what attestation is all about.

Take this scenario:

  1. We publish TFLint v0.100.0, with hash abc123, to GitHub Releases
  2. A malicious actor gains access to the repository releases
  3. The actor builds malicious TFLint binaries locally and generates new hashes
  4. They upload the new artifacts, overwriting v0.100.0

Checksums only protect you if you adopt the version before 4 happens. Otherwise, you're liable to adopt a version poisoned with hidden malware which steals source code, credentials, etc. in the background while dutifully recording its checksum.

Signing techniques like attestation mitigate this by detecting step 4, where artifacts are published but were provably not signed in the trusted build environment. An attacker now needs to upload their malicious revision, tag it, and re-run the release workflow. I believe you can delete Action workflow run logs as a repository administrator but I don't think it's possible to scrub the metadata about the job run, which should immediately point to a compromise once examined.

Checksums are helpful but can't sufficiently secure binaries on their own. Take Terraform (Registry) for example. Providers ship as binaries. Terraform's lockfile prevents those binaries from changing on future downloads. But the Registry also requires provider authors to sign every binary (with a GPG key) when publishing to GitHub Releases. So you can at least be sure that the provider was built by someone with that GPG private key, and not just access to the author's Terraform Registry account. Same ideas but just not implemented as nicely because of the dependence on passing secrets to CI (a private key).

All that said, checksums have their place, they just suffer from especially poor usability in this context because we can't rely on a CLI to write them automatically.

users of this feature would most likely pin the action to the commit SHA

Probably right, but they should resolve the SHA for the latest version at install time. They can use https://github.com/sethvargo/ratchet or any of the other tools that handle auto-pinning in Actions workflows.

I'll find time to land this sometime soon with documentation revisions. Mostly want to emphasize in the docs that checksums guarantee immutability only, not integrity, trust, attribution, etc.

@avri-schneider
Copy link
Contributor Author

The unambiguously safe posture/tooling for an organization that wants to securely and reliably use third-party software at scale is forks...

(Side note:) While GitHub forks allow organizations to control the source code and artifact hosting, they carry a risk. Forks of public repositories cannot be made private, so any secret inadvertently committed would be publicly exposed and difficult to remediate. This makes forks a double-edged sword for organizations aiming to avoid accidental exposure while managing third-party software.

This is the sticking point. You can't review a binary.

Organizations would indeed review the source code, ensure the binary was generated from that reviewed source, and then pin the binary’s checksum. For those that may not fully trust the upstream developer, the developer’s signing of binaries only partially mitigates supply-chain risks.

The safest approach remains for organizations to compile from reviewed source and store binaries in a private artifact repository. Still, some organizations might consider the likelihood of a malicious binary being undetected in the public repository to be a lower risk than allowing upstream code to run arbitrary builds in their CI environment.

For such organizations, a practical route could be to leverage provenance verification methods as you suggested, confirming that reviewed source code hashes match binaries signed by the developer, and then pinning those validated hashes. Since the process of pinning is inherently manual, there may be no real need for additional implementation. Instead, a note in the documentation advising users to verify hashes before pinning—using the methods you recommended—could suffice. Although this workflow requires manual verification, it balances security with practical CI use, as organizations can pin hashes after reviewing and approving the source and performing local provenance verification.

- Updated the README to include a note advising users to verify binary hashes independently before pinning.
  - Suggested using methods such as GitHub’s attestation or cosign for hash verification to ensure that only approved binaries are used in workflows.
  - Emphasized that the `checksums` input requires manual pinning and that users should validate hashes as an added security measure.
- Updated the `checksums` section to clarify that checksums guarantee only the immutability of the downloaded binary by verifying it has not been altered, without addressing integrity, trust, attribution, or authenticity.
- Specified that multiple hashes enable cross-platform compatibility for workflows running on different platforms, architectures, or operating systems, where each may produce a unique hash.
- Added guidance for users to verify the binary's source and authenticity independently using methods like GitHub’s Artifact Attestations or cosign before adding hashes to workflows.
@avri-schneider
Copy link
Contributor Author

Probably right, but they should resolve the SHA for the latest version at install time. They can use https://github.com/sethvargo/ratchet or any of the other tools that handle auto-pinning in Actions workflows.

Pinning should follow a thorough code review of the reusable action to ensure no unpinned binaries, untrusted external resources, or vulnerabilities to injection attacks are present. During my review of your action, I noticed it was pulling binaries from GitHub, which required hardening. This led to my PR contribution to address these security considerations.

For organizations, maintaining an allow-list of pre-approved actions, each pinned to a specific commit SHA, enables developers to use only those actions vetted by the security team.

GitHub’s security model has a limitation where it resolves hashes to commits in forks, even if those commits haven’t been merged into the upstream repository. This allows a developer to potentially bypass security controls by creating a malicious commit in a fork. For instance, a SHA pin like terraform-linters/setup-tflint@b246abd5a435d1528cd70b4cda3542488db8f807 would technically work, even though my PR hasn’t been merged yet.

This limitation matters for organizations that might want to allow actions from a trusted organization using wildcard patterns, as it introduces a risk that unmerged, potentially harmful commits in forks could be resolved. For robust security, each action should be explicitly pinned and reviewed to avoid this potential bypass.

@bendrucker
Copy link
Member

bendrucker commented Nov 6, 2024

ensure the binary was generated from that reviewed source

My general hangup here is that I'm convinced virtually no one does this. Instead they copy a checksum that they never verify.

It's sort of like unverified SSL in that encryption is better than nothing but it's still not fully secure.

Still, some organizations might consider the likelihood of a malicious binary being undetected in the public repository to be a lower risk than allowing upstream code to run arbitrary builds in their CI environment.

This is the other half, it shouldn't be arbitrary, because you reviewed all the code. But likewise, I don't believe that most people are actually reviewing the content before pinning its hash. It feels secure (hashes!) but avoids any "hard" things like... reading code 🫢.

In any case, these are just my personal biases/priors on checksums, and why I'm a little hesitant to "push" users towards doing this unless they're sure they need it. Thanks for bearing with me! Made some small copy edits and ended up removing the example as it took up too much space for an "advanced" feature. Otherwise LGTM, will fix the conflicts and merge this momentarily.

@bendrucker
Copy link
Member

Separately I'll try to make some time to work on automatic attestation. Attestation will return the exact commit SHA when verifying a checksum file. Users could then specify a commit SHA for verification. The action would download the specified version and verify it was built from that commit.

You can prove that the binaries came from the right repository, with a matching tag to the version, and the exact content that tag pointed to at build time. The only way the actual binary could ever change would be to re-run a previous release job. And even then, the workflow cannot change, so some upstream, unpinned dependency would have to be compromised before the re-run could compromise a build.

@avri-schneider
Copy link
Contributor Author

The only way the actual binary could ever change would be to re-run a previous release job. And even then, the workflow cannot change, so some upstream, unpinned dependency would have to be compromised before the re-run could compromise a build.

What about self-hosted runners?

Thanks for bearing with me!

Likewise! :)

@bendrucker
Copy link
Member

What about self-hosted runners?

Yes, the runner environment could be compromised. Since the workflow file must specify self-hosted runner labels in runs-on, you know for a given SHA whether the workflow would run on a GitHub runner or a self-hosted one.

@avri-schneider
Copy link
Contributor Author

That is a very strong and undocumented constraint.

@avri-schneider
Copy link
Contributor Author

Also, what prevents a maintainer from creating a self hosted runner labeled "ubuntu-latest"?

@bendrucker
Copy link
Member

It's a bit hard to tell from the docs but it seems like self-hosted is a required label when passing a list of labels to runs-on. But when using runner groups, you don't need labels:

https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/managing-access-to-self-hosted-runners-using-groups

But if it's indeed possible to label your self-hosted as ubuntu-latest and magically start taking jobs away from the hosted runners, that's bad behavior and definitely something to report to GitHub.

@bendrucker bendrucker merged commit 8093687 into terraform-linters:master Nov 7, 2024
11 checks passed
@bendrucker bendrucker changed the title Add support for multiple SHA256 hash values for TFLint download verification Add checksums to enforce immutability Nov 7, 2024
@avri-schneider
Copy link
Contributor Author

It's a bit hard to tell from the docs but it seems like self-hosted is a required label when passing a list of labels to runs-on. But when using runner groups, you don't need labels:

https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/managing-access-to-self-hosted-runners-using-groups

But if it's indeed possible to label your self-hosted as ubuntu-latest and magically start taking jobs away from the hosted runners, that's bad behavior and definitely something to report to GitHub.

tested it here - it executed the job on my self-hosted runner, which I labeled ubuntu-latest. This is a re-run of a previously successful workflow job that was executed on a github hosted runner

@bendrucker
Copy link
Member

Interesting. Sloppy on GH's part but impossible to undo at this point given that someone could be depending on it.

I took a closer look at the attestation response and it looks you just need to assert against the runner environment if you want to enforce it.

gh attestation verify ~/Downloads/checksums.txt --repo terraform-linters/tflint --format json --jq '.[].verificationResult.signature.certificate.runnerEnvironment'
Loaded digest sha256:c22862d98642e3894687d0732632fe301e8053f581fb43405ceddd533a74f605 for file:///Users/ben/Downloads/checksums.txt
Loaded 1 attestation from GitHub API
✓ Verification succeeded!

github-hosted

@avri-schneider
Copy link
Contributor Author

Interesting indeed, but for self-hosted runner, isn't the value of verificationResult.signature.certificate.runnerEnvironment controlled by the admin of the self-hosted runner?

@bendrucker
Copy link
Member

Shouldn't be any more falsifiable than the SHA or any other attested attribute. I don't know if/where GitHub validates what was signed.

@avri-schneider
Copy link
Contributor Author

I suspect this is a property that the runner provides to GitHub and they blindly sign it.

@avri-schneider
Copy link
Contributor Author

Probably done here - this, as far as I can tell runs on the self-hosted runner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants