Deterministic and verifiable operation #60
Comments
FWIW I've spent the better part of 2 hours trying to figure out how to bootstrap the Python packaging tools in a secure, deterministic, and reproducible manner with the Windows embeddable zip distribution (this distribution lacks [...]). But looking at the [...]. And I am pretty familiar with a lot of low-level Python packaging details. If I can't figure this out, there's little hope for most Python users.
get-pip.py is not deterministic, nor does it validate content hashes when downloading files from the Internet. See pypa/get-pip#60. This makes naive usage inappropriate for PyOxidizer, which wants to ensure that downstream consumers can achieve determinism and that it isn't the weak link in the security chain. Way too much effort was spent developing this commit and figuring out how to get the packaging tools to install securely and deterministically. See the long comment in packaging_tool.rs for details.
The commit referenced above has a detailed comment about how I finally got this working and what didn't work. tl;dr I had to modify [...]
#73 describes the hack that you need to do this right now.
`get-pip.py` doesn't pin its dependencies nor their SHA-256 hashes. At run-time, it does the equivalent of `pip install --upgrade pip setuptools wheel`, which will pull in the latest versions of these packages (plus dependencies, if any) without hash checking, only relying on TLS x509 certificate checking. So despite claims that `python get-pip.py` is secure, it is only as secure as the trusted root CA system. You are still vulnerable to a MitM attack by any server with a valid x509 certificate chaining up to a trusted root CA. And since the common root CA lists contain some CAs associated with governments with... questionable practices, this trust only goes so far. (If I were one of these questionable governments or a nefarious actor, I could MitM PyPI and inject malware into pip, setuptools, and wheel because `get-pip.py` doesn't verify the SHA-256 hashes of files it downloads off the Internet at run time. That would be a very attractive attack target given the sheer volume of machines that would run the poisoned code within minutes and the potential to spread malware by infecting packages built with a poisoned version of pip/setuptools.)

If you don't buy into the tin foil hat arguments, another issue with the current approach is that it isn't deterministic over time. Even if I download a specific version of `get-pip.py` today and verify its SHA-256 is a trusted value, the results from running it today could be different from running it tomorrow because a new version of pip, setuptools, or wheel is published on PyPI. This lack of reproducibility can be extremely annoying. For example, I try to enforce deterministic and reproducible tests and CI in my projects to the maximal extent possible. If I'm using pip 20 in commit X, I want tests/CI to use pip 20 for all of time. I don't want pip 21 to be silently used when I check out this commit 1 year from now. (A related issue is that whenever a new version of pip, setuptools, wheel, or get-pip.py is published, random processes that don't pin dependencies can break due to incompatibilities in the new version.)

So, I have a feature request for `get-pip.py`: deterministic and verifiable mode.

In this mode, executing `get-pip.py` would install a deterministic version of all packages and would specify SHA-256 hashes for all those packages. In this mode, `get-pip.py` would be resistant to MitM attacks against the package repository it downloads pip, setuptools, wheel, etc. from. It would also (hopefully) guarantee reproducible execution.

The way I see this working is `get-pip.py` gains a new CLI flag. Say `--reproducible`. In this mode, the invocation of `pip` behind the scenes specifies a requirements/constraints file with pinned SHA-256 hashes and `--require-hashes` mode is enabled.

Establishing this feature would require updating a pip requirements file/manifest at release time or whenever a dependency version is bumped. So it is a bit of extra work for the `get-pip.py` maintainers.

I think it is worthwhile to implement this feature in `get-pip.py` itself because bootstrapping packaging tools from a Python distribution that doesn't have them (e.g. the Windows embeddable zip file distributions) in a deterministic and reproducible manner is really difficult. You have to install setuptools, pip, wheel, etc. from source, taking care to download and verify deterministic versions of each. I would prefer for Python's packaging tools to offer high levels of security and guarantees of determinism by default. `pip` itself can already achieve this with `--require-hashes` mode. But `get-pip.py` does not, and that undermines the security and integrity of the whole packaging chain.
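
To make the request more concrete, here is a rough sketch of the two pieces such a mode would need: checking a downloaded file against a pinned SHA-256 digest, and rendering a requirements file that pip's existing `--require-hashes` mode can enforce. This is not how `get-pip.py` works today and is not a proposed patch. The helper names (`sha256_of`, `verify_sha256`, `render_requirements`, `pip_install_command`) are made up for illustration, and any versions or digests passed in would have to be real pinned values maintained at release time.

```python
import hashlib
import sys
from pathlib import Path
from typing import List, Mapping, Sequence, Tuple


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned value."""
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError(f"{path}: expected sha256 {expected}, got {actual}")


def render_requirements(pins: Mapping[str, Tuple[str, Sequence[str]]]) -> str:
    """Render lines of the form 'name==version --hash=sha256:<digest> ...'.

    `pins` maps a package name to (exact version, list of acceptable digests).
    Several digests per package are allowed because a release may ship both
    a wheel and an sdist.
    """
    lines = []
    for name, (version, hashes) in sorted(pins.items()):
        hash_args = " ".join(f"--hash=sha256:{h}" for h in hashes)
        lines.append(f"{name}=={version} {hash_args}")
    return "\n".join(lines) + "\n"


def pip_install_command(requirements: Path) -> List[str]:
    """Build (but do not run) a pip invocation that enforces hash checking."""
    return [
        sys.executable, "-m", "pip", "install",
        "--require-hashes", "-r", str(requirements),
    ]
```

A caller would first run `verify_sha256` on the downloaded `get-pip.py` against a digest obtained out of band, then write the output of `render_requirements` to a file and pass that file to the command built by `pip_install_command`. With `--require-hashes`, pip refuses to install any requirement that is not pinned with `==` or whose downloaded archive does not match one of the listed digests, which is what gives the MitM resistance and reproducibility described above.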