Replace pinned builds with reproducible pinned builds #1686

Merged
spoonincode merged 6 commits into main from repro on Oct 4, 2023

spoonincode
Member

This is a 2-in-1 PR that migrates existing pinned builds to be reproducible and also shifts the binaries produced from the pinned build to be usable on any modern Linux distro (we no longer need to provide a separate Ubuntu 20 binary, Ubuntu 22 binary, etc etc). Additionally, as a bonus, the pinned builds are upgraded to latest Clang 17.

Reproducible Builds

There are a number of ways and tools to accomplish a reproducible build. This approach opts to use Docker as a means of standardizing the build environment because:

  • The team is familiar with this over any other options; Docker has been used for EOSIO/Leap for many years as part of its CI
  • Even for those unfamiliar with Docker, for the most part a Dockerfile is very clear in the actions it is taking; it is like reading a script -- it's very transparent
  • It integrates well into existing CI platform caching strategies

The most problematic part of using Docker -- the part I expect to be a little contentious -- is that building the pinned build now requires Docker: it's not just running a script. It's an additional tool that must be installed (and the daemon started) before a user can do the build. And while, for the most part, a user does not need to be familiar with Docker to perform the build (it's just a single command as seen in the README), there can be some nuances that sneak up over time such as Docker's build cache. My hope has been we could de-emphasize pinned builds prior to this change but that remains uncertain; still, I have removed the encouragement in the README for now.

That said, using Docker also has a number of quality-of-life improvements over the existing scripts. For example, we don't have to worry about what previous invocations have left the state of the environment in (was the boost patched or not? etc)

It's worth mentioning that the build has been modified to build Clang from source instead of using a downloaded binary. This substantially increases the time to perform a pinned build (though, another Docker quality-of-life shows up here via its caching!), but this modification was done because of some uncertainty around the pedigree of Clang's binary builds: recent versions are not signed.

This approach may shift the scope of what users must trust for a pinned build in other ways too though. Building this reproducible pinned binary now requires trust in Debian's packages that form the root of the build environment.

See inline comments for discussion on some specific compromises/hacks that had to be done.

At the current time, only the .deb is reproducible. The .tar.gz is not, due to CPack feeding files into the tarball ordered based on their inode number. I'd like to solve this somehow (possibly with a post-processing step, if it comes to it; but it'd be awesome to submit something back to CMake here) since it really helps out users in running the builds on platforms other than Debian/Ubuntu. Worst case, fixing this can potentially wait until the future.
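
As an illustration of what a post-processing step for the tarball could look like, here is a minimal sketch. It is not what this PR does; it assumes GNU tar 1.28 or newer (for `--sort=name`), and the function name `make_repro_tarball` is hypothetical. It re-packs a directory with a fixed entry order, fixed timestamps, and fixed ownership so that the file system's inode order no longer matters.

```shell
# Sketch only: re-pack a directory into a byte-for-byte repeatable .tar.gz.
# Assumes GNU tar >= 1.28 and gzip; make_repro_tarball is a hypothetical name.
make_repro_tarball() {
    src_dir=$1    # directory whose contents go into the archive
    out_file=$2   # path of the .tar.gz to write
    epoch=$3      # timestamp (seconds since 1970) stamped on every entry

    # --sort=name      : order entries by name instead of directory/inode order
    # --mtime          : give every entry the same modification time
    # --owner/--group  : do not leak the building user's uid/gid
    # gzip -n          : omit the original name and timestamp from the gzip header
    tar --sort=name --mtime="@${epoch}" \
        --owner=0 --group=0 --numeric-owner \
        -C "$src_dir" -cf - . | gzip -n > "$out_file"
}
```

Calling it as `make_repro_tarball extracted/ leap.tar.gz "$SOURCE_DATE_EPOCH"` on two machines with identical file contents and permissions should then give identical archives.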

Before 5.0.0 the intention is to establish a code signing regime for binaries the team posts as release assets.

Binary Compatibility

Historically we've had to generate binaries for each version of Ubuntu we supported. This was required because we had runtime dependencies (i.e. shared library linkage) to libraries that have different ABI on each version of Ubuntu. Such differences also made it difficult to provide binaries for other distros: if we were to provide RHEL8 binaries do we also need RHEL9? Or would they just work? Sometimes the release cadence of Ubuntu may not match up well to Leap and there could be a period of time where while Leap compiles perfectly, existing binaries don't work due to this ABI breakage.

It sure would be nice to just have a single build to worry about; a build that is forward compatible.

One solution to this problem is to static link the entire executable. That's not really viable imo -- static linking glibc can have weird side effects with NIS stuff, and while static linking to musl might be more supported, that would be quite the shift for our pinned builds. Plus, full static linking isn't really a supported configuration on macOS or Windows afaik so probably best to not go down that rabbit hole.

The good news is we've made progress with our build over time and we're now at the point we no longer have dependencies on shared libraries that don't maintain stable ABIs. So as long as we don't depend on symbols from our shared library dependencies past a chosen target, the binaries ought to operate well into the future from that chosen target. So we need to build against the version of libraries at some chosen target point in time. There tend to be two ways to accomplish that: build your own complete sysroot, or pick a distro whose package versions become the implicit target point in time. While a custom sysroot allows dialing in target versions precisely, it is significantly more complex, so this PR opts to choose Debian 10 as its baseline target (but see some additional inline commentary).
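
One way to check the "no symbols past the chosen target" property is to look at the versioned glibc symbols a binary imports. The sketch below is an assumption about how one might do that, not part of this PR: the helper name `max_glibc_symbol` is hypothetical, and it expects the output of `objdump -T` on stdin and relies on GNU `sort -V`.

```shell
# Sketch only: print the highest GLIBC_x.y symbol version found on stdin.
# Intended input is the output of `objdump -T <binary>`; max_glibc_symbol is
# a hypothetical helper name.
max_glibc_symbol() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)*' |  # pull out every GLIBC_<version> tag
        sed 's/^GLIBC_//' |               # keep just the version number
        sort -V |                         # version-aware sort (GNU coreutils)
        tail -n 1                         # highest version wins
}
```

For example, `objdump -T nodeos | max_glibc_symbol` printing a value no higher than 2.28 would be consistent with a Debian 10 baseline.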

Notably, this means glibc 2.28+ is required to run the produced binaries. This includes Ubuntu 20.04+ and RHEL8+, but unfortunately technically not Ubuntu 18.04 which is at 2.27 (fwiw, the binaries do seem to run on Ubuntu 18.04 as maybe there is either no material difference in the API between 2.27/2.28, or Leap is not making use of such differences). Other dependencies like libcurl, libz, and GMP are similarly compatible. And, of course, the pinned builds bake in their own static linked libc++, so we do not use the old libstdc++ in Debian 10.
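
For anyone wanting to check a host against the glibc 2.28 floor before installing, a minimal sketch follows. The helper name `version_ge` is hypothetical and it relies on GNU `sort -V`; it is an illustration, not something shipped in this PR.

```shell
# Sketch only: succeed when version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; relies on GNU `sort -V`.
version_ge() {
    # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On a glibc system, `getconf GNU_LIBC_VERSION` prints something like `glibc 2.31`, so a check could be written as `version_ge "$(getconf GNU_LIBC_VERSION | awk '{print $2}')" 2.28`.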

Todo

  • Integrate in CI (this causes quite the paradigm shift, as we'll have builds which are tested on a platform other than the one they're built on)
  • Performance test relative to older Clang 10 pinned build

##deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/$(date -d @${SOURCE_DATE_EPOCH} +%Y%m%dT%H%M%SZ)/ buster main
##deb [check-valid-until=no] http://snapshot.debian.org/archive/debian-security/$(date -d @${SOURCE_DATE_EPOCH} +%Y%m%dT%H%M%SZ)/ buster/updates main
##EOS
##EOF
Member Author

Debian's package snapshot repo is a great tool in establishing a very reproducible base environment at a point in time. One can come along a year from now and easily get the precise same packages ergo precise same environment.

Unfortunately it comes at a cost. This package repo is quite slow: it can take nearly an hour to make it through the update/upgrade/install below. I have commented out this feature for now but I am unsure how much it will affect the reproducibility mid/long term. For example, is it possible a small difference in libz shows up one day that slightly alters the resulting leap build compared to before the libz upgrade?

And if we're not going to make use of the snapshot repo, why pick Debian 10 over just, say, Ubuntu 18.04?
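
To make the commented-out snapshot lines concrete, here is a sketch of how the timestamped URLs are derived from `SOURCE_DATE_EPOCH`. The function names are hypothetical, GNU `date` is assumed, and `-u` is added here (it is not in the commented-out lines) so the result does not depend on the builder's time zone.

```shell
# Sketch only: derive snapshot.debian.org sources.list lines from an epoch.
# Function names are hypothetical; assumes GNU date (for `-d @<epoch>`).
snapshot_stamp() {
    # -u pins the conversion to UTC so every builder gets the same stamp.
    date -u -d "@$1" +%Y%m%dT%H%M%SZ
}

snapshot_sources() {
    stamp=$(snapshot_stamp "$1")
    printf 'deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/%s/ buster main\n' "$stamp"
    printf 'deb [check-valid-until=no] http://snapshot.debian.org/archive/debian-security/%s/ buster/updates main\n' "$stamp"
}
```

Running `snapshot_sources "$SOURCE_DATE_EPOCH"` prints the two lines that would go into the apt sources if the snapshot repo were enabled.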

# libc6 (>= 2.27), libc6 (>> 2.28), libc6 (<< 2.29), libcurl4 (>= 7.16.2), libgcc1 (>= 1:3.3), libgmp10, zlib1g (>= 1:1.2.0)
# and the included sed rule within this script will reduce it to
# libc6 (>= 2.27), libcurl4 (>= 7.16.2), libgcc1 (>= 1:3.3), libgmp10, zlib1g (>= 1:1.2.0)
# This may need to be tweaked in the future further; clearly not ideal.
Member Author

Very disappointed with the workarounds this script has to perform. Hopefully the comments make it clear what is going on. It'd be interesting to look more into Installed-Size and possibly send a tweak upstream to CMake (maybe just use a constant size for size of directories). I'm not immediately sure how to improve the libc6 version issue, besides disabling automatic dependency generation.
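
For readers who want to see the effect of the libc6 rewrite in isolation, here is a sketch of a sed expression that performs the reduction described in the script comment. This is an assumption about one way to write it, not necessarily the rule `tweak-deb.sh` actually uses, and the function name is hypothetical.

```shell
# Sketch only: drop the strict `libc6 (>> x)` / `libc6 (<< y)` bounds from a
# Depends line read on stdin, leaving the `libc6 (>= x)` floor in place.
# strip_libc6_bounds is a hypothetical name, not the rule in tweak-deb.sh.
strip_libc6_bounds() {
    sed -E 's/, libc6 \((>>|<<) [^)]*\)//g'
}
```

Fed the first dependency list from the script comment, it yields the second one.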

src/tools/tweak-deb.sh build/leap_*.deb

FROM scratch AS exporter
COPY --from=build /build/*.deb /build/*.tar.* /
Member Author

A deficiency here is that users cannot (easily) run unit tests themselves. Possibly a wiki page or other such documentation should be added with the additional complex steps to run the tests. (you would need to build an image with the build target, and then run a container with it, and then run ctest inside it)
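
The three steps in the parenthetical could be sketched roughly as below. Several things here are assumptions: the stage name `build` and the `/build` directory are inferred from the `COPY --from=build /build/...` line, the image tag `leap-build-stage` and the function names are hypothetical, and the Dockerfile path is left as a parameter because it is not shown here. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Sketch only: build the `build` stage as an image, then run ctest inside it.
# Stage name and /build are inferred from the exporter stage; the image tag
# and function names are hypothetical.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"   # show the command without executing it
    else
        "$@"
    fi
}

run_pinned_tests() {
    dockerfile=$1                   # path to the reproducible-build Dockerfile
    image=${2:-leap-build-stage}    # tag for the intermediate image

    # Stop at the `build` stage instead of continuing to the exporter stage.
    run docker build --target build -f "$dockerfile" -t "$image" . &&
    # Start a throwaway container in the build directory and run the tests.
    run docker run --rm -w /build "$image" ctest --output-on-failure
}
```

Whether `ctest` is on the PATH and `/build` is the CMake build directory inside that image would need to be confirmed against the actual Dockerfile.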

@spoonincode spoonincode marked this pull request as draft September 27, 2023 15:48
@spoonincode spoonincode linked an issue Sep 27, 2023 that may be closed by this pull request
@greg7mdp greg7mdp self-requested a review September 28, 2023 17:07
Base automatically changed from boringssl to main September 28, 2023 21:18
@greg7mdp
Contributor

I wonder how many developers among us use libc++ for their daily builds? I wonder if we should have a policy of using the same libc++ that we recommend for the pinned builds.

@spoonincode
Member Author

I wonder how many developers among us use libc++ for their daily builds? I wonder if we should have a policy of using the same libc++ that we recommend for the pinned builds.

I wanted the pinned compiler to be easily extractable so dev team could easily use it (for troubleshooting errors/warnings/bugs/whatever). I thought this was going to be a simple 2 line addition,

FROM scratch as export-toolchain
COPY --from=builder /pinnedtoolchain/ /

but there are some quirks that I need to slightly tweak first. I think we can have it fairly easily though.

@spoonincode
Member Author

I wonder how many developers among us use libc++ for their daily builds? I wonder if we should have a policy of using the same libc++ that we recommend for the pinned builds.

I wanted the pinned compiler to be easily extractable so dev team could easily use it (for troubleshooting errors/warnings/bugs/whatever). I thought this was going to be a simple 2 line addition,

FROM scratch as export-toolchain
COPY --from=builder /pinnedtoolchain/ /

but there are some quirks that I need to slightly tweak first. I think we can have it fairly easily though.

I attempted this in,
repro...repro_extractable_wip
unfortunately while it worked great on Arch, on Ubuntu setting a CMAKE_TOOLCHAIN_FILE seems to be preventing cmake from finding libraries like GMP or libz even. I'll add #1708 to track this.

@spoonincode
Member Author

I'm going to move this to reviewable and treat the CI changes as a separate PR on top of this.

@spoonincode spoonincode marked this pull request as ready for review October 3, 2023 19:59
#### Pinned Build
Make sure you are in the root of the `leap` repo, then run the `install_depts.sh` script to install dependencies:
#### Pinned Reproducible Build
The pinned reproducible build requires Docker. Make sure you are in the root of the `leap` repo and then run
Member

Somewhere do you want to mention any docker on X86 architecture would work for production builds (JIT and OC)?

Member Author

Actually.. that's curious.. what happens if you try to run this on ARM or such 🤔 There is actually nothing x86 specific with the possible exception that the debian root,

FROM debian@sha256:d774a984460a74973e6ce4d1f87ab90f2818e41fcdd4802bcbdc4e0b67f9dadf AS builder

being called out via hash might limit this to x86. That might be a compelling reason to use a tag name instead such as debian:buster-20230919, though using a tag name is more at risk of a malicious switch-a-roo if debian's docker hub is compromised in some way in the future.

But, existing pinned builds did not support anything but x86, so I'm not sure we need to change any documentation around this limitation strictly due to this PR (maybe some other misc README clean up..)

@@ -93,32 +93,23 @@ git submodule update --init --recursive
Select build instructions below for a [pinned build](#pinned-build) (preferred) or an [unpinned build](#unpinned-build).

> ℹ️ **Pinned vs. Unpinned Build** ℹ️
We have two types of builds for Leap: "pinned" and "unpinned." The only difference is that pinned builds use specific versions for some dependencies hand-picked by the Leap engineers - they are "pinned" to those versions. In contrast, unpinned builds use the default dependency versions available on the build system at the time. We recommend performing a "pinned" build to ensure the compiler remains the same between builds of different Leap versions. Leap requires these versions to remain the same, otherwise its state might need to be recovered from a portable snapshot or the chain needs to be replayed.
We have two types of builds for Leap: "pinned" and "unpinned." A pinned build is a reproducible build with the build environment and dependency versions fixed by the development team. In contrast, unpinned builds use the dependency versions provided by the build platform. Unpinned builds tend to be quicker because the pinned build environment must be built from scratch. Pinned builds, in addition to being reproducible, ensure the compiler remains the same between builds of different Leap major versions. Leap requires the compiler version to remain the same, otherwise its state might need to be recovered from a portable snapshot or the chain needs to be replayed.
Member

Could you explain reproducible in more details? Was the build by old pinned build script not reproducible?

Member Author

Right, the previous build was not reproducible due to, for example, different paths used during the build (some of these paths get baked into the final executable), and different timestamps used inside of the .deb package. Fixing these doesn't strictly require building inside of a container (for example, maybe we could use -ffile-prefix-map to fix up the paths in different installations). But doing it in a container makes it easier.

(there are other issues too, like the fixup tweak-deb.sh does on Installed-Size. and maybe some other nuances I'm not recalling quickly off hand)
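
As a rough illustration of the non-container alternative mentioned above, the sketch below builds the compiler flag that rewrites an absolute checkout path to `.` in debug info and `__FILE__` strings. `-ffile-prefix-map=OLD=NEW` is a real GCC and Clang option; the function name is hypothetical and this is not something the PR implements.

```shell
# Sketch only: emit a flag that maps the absolute source root to "." so the
# checkout location is not baked into the binary.
# repro_compile_flags is a hypothetical helper name.
repro_compile_flags() {
    src_root=$1
    printf -- '-ffile-prefix-map=%s=.\n' "$src_root"
}
```

A build script might then do `export CXXFLAGS="$(repro_compile_flags "$PWD")"` and pin timestamps with `export SOURCE_DATE_EPOCH="$(git log -1 --format=%ct)"`, though other sources of variation (such as Installed-Size) would still remain.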

Member

Thanks for your explanation.

Member

@linh2931 linh2931 left a comment

I checked out the branch and did a test of the build. It worked. Did a quick run of nodeos. It was good.

The only thing I'd like to mention is it took 69 minutes to finish the build, on Ubuntu 20.04, 64 GB, Intel I9 16 cores.

@spoonincode
Member Author

The only thing I'd like to mention is it took 69 minutes to finish the build, on Ubuntu 20.04, 64 GB, Intel I9 16 cores.

Yeah, since we're building the compiler now that first build will take a lot of time. As the PR description mentioned,

this modification was done because of some uncertainty around the pedigree of Clang's binary builds: recent versions are not signed

The good thing is that this is cached both on your box via Docker, and on CI. So next time you do a build -- even in a completely different checkout -- the entire builder stage is restored via cache.

@spoonincode spoonincode merged commit 7ba7d14 into main Oct 4, 2023
21 checks passed
@spoonincode spoonincode deleted the repro branch October 4, 2023 18:40
Development

Successfully merging this pull request may close these issues.

reproducible binary builds
3 participants