[MNG-8258] activate Reproducible Builds by default #1726
@hboutemy I think you need to update the following ones instead:
Wow, we have so many root POMs? I'm lost in all these copies.
The whole model builder using the v3 API is kept so as not to break some plugins too much, but it's not the one used by default. We should clearly mark it as deprecated...
Alternatively, if the build time is considered a problem, why not just exclude it completely? It is not part of the JAR file specification as far as I can see (I don't see it in the list of attribute names). If we fix a value in a
Do I understand correctly that Maven then defines a default timestamp for the build?!? This looks quite odd, to be honest. Shouldn't the default be something like... well
I think one should give an example of how to do that, then?
I agree that it would be better not to include that information if it's not provided. Is there an easy way to do that?
I agree too, if we could have another solution.
So this will then only be an explicit opt-out?
There is a hidden config property
If anybody knows how to create a zip that does not contain any timestamp, I'm all ears.
No, you have 2 options:
If you want, we can use this: the impact is that to rebuild the exact same JAR, we'll need to download the reference binary, extract the value used by the release manager, then inject it into the rebuild recipe. If you prefer, we can put 1/1/1970, or any other conventional value that you prefer and that looks "more common".
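To make the "fixed conventional value" idea concrete, here is a minimal Java sketch (plain java.util.zip, not Maven's archiver) of why stamping every entry with one constant timestamp makes two builds byte-identical: the output then depends only on entry names, contents and that constant. The class FixedTimestampZip is hypothetical; it assumes Java 9+ for ZipEntry.setTimeLocal and sorts entry names so that filesystem iteration order cannot leak into the archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: a ZIP whose bytes depend only on names, contents and one fixed timestamp. */
final class FixedTimestampZip {
    private FixedTimestampZip() {}

    static byte[] write(Map<String, byte[]> entries, LocalDateTime fixedTime) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // Sort by entry name so the input map's iteration order cannot change the output.
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                // Same timestamp for every entry, instead of "now" or the file's mtime.
                entry.setTimeLocal(fixedTime);
                zip.putNextEntry(entry);
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // The ZipOutputStream is closed (central directory written) before we read the bytes.
        return bytes.toByteArray();
    }
}
```

Calling FixedTimestampZip.write twice with the same map and the same LocalDateTime should return equal arrays, while changing only the timestamp changes the bytes; that timestamp is exactly what a rebuild recipe has to reproduce. Compressed output is assumed stable for a given JDK; it may differ across JDK or zlib versions.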
The
That's not the result I get:
or if we don't trust
I thought that we were talking about the content of the
The JAR file created by the
Nope, we are talking about "reproducible builds". In short, if you build a (let's assume Git tag), then if I rebuild the same tag (on the same OS/Java, though this has some leeway), I should end up with the same (binary-wise) JAR output as you. In other words, if you do
Just tested; I thought that Maven was automatically adding the … For the timestamp of the ZIP file itself, maybe it could be set to the timestamp of the most recent entry?
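A minimal sketch of the "most recent entry" idea, assuming the per-entry modification times are already known; the map of entry names to Instant values and the NewestEntryTime class are hypothetical. Git does not store file modification times, so this only yields a reproducible value if the build environment preserves or normalizes file times.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: derive an archive timestamp from its newest entry instead of the wall clock. */
final class NewestEntryTime {
    private NewestEntryTime() {}

    /** Returns the latest entry time, or empty for an archive with no entries. */
    static Optional<Instant> of(Map<String, Instant> entryTimes) {
        return entryTimes.values().stream().max(Instant::compareTo);
    }
}
```

Rebuilding unchanged sources with preserved file times gives the same value; touching any single file moves it forward.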
I'm neither talking about the content of
Ah, okay. I guess the question of whether they could be set to the timestamp of the source files or of the Git commit has already been discussed, then.
But is this then not more the … Another option would be, as you described, to download the real JAR first, extract the timestamp value used from there, and then inject it into the reproducible build.
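The "download the reference JAR and extract the timestamp" step could look roughly like the following sketch. It assumes the reference archive was produced with one fixed timestamp on every entry, so the first entry is representative; ReferenceTimestamp is a hypothetical name, and ZipEntry.getTimeLocal needs Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: recover the fixed timestamp used by a reference (released) archive. */
final class ReferenceTimestamp {
    private ReferenceTimestamp() {}

    /** Reads the first entry's time; assumes all entries share the same fixed value. */
    static Optional<LocalDateTime> fromFirstEntry(byte[] referenceJar) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(referenceJar))) {
            ZipEntry first = in.getNextEntry();
            return first == null ? Optional.empty() : Optional.of(first.getTimeLocal());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The extracted value would then have to be injected into the rebuild configuration, which is the extra manual step that a fixed default tries to avoid.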
quite works, but it is complex and does not give one simple workflow: as a developer, I want to build my source code twice and get the same output (which will also help the build cache). I don't see how we can do anything less basic than a fixed timestamp by default in Maven core; perhaps a less strange default value could ease the bad feelings about it, something like
To be honest, I never wanted that in the last 10+ years :-D Also, if it is really about ZIP timestamps, then I think it is really something that should be handled in the jar-plugin (or even the archiver component). For me, a more sensible default would be to use the last modification time of the oldest file (there are even options to sync Git time with local time), and maybe to give a warning that it is not 100% portable (which is already not the case across Linux/Windows or different JVMs anyway).
Yes, I think this would be more obvious, but why not
yeah,
all that is an independent improvement that we can work out in a separate stream
lgtm
Correct me if I'm wrong, but it seems to me that one of the main goals of reproducible builds is security: allowing developers to verify that the released JAR files do not contain altered bytecode (e.g. malicious code injected by a compromised compiler). For this goal, the timestamps of ZIP entries do not matter; only the content of the ZIP entries matters. In my understanding, a verification focused on what matters is called a "semantically reproducible build" or "semantic equivalency". Microsoft seems to propose a tool for semantic equivalency, at least for NPM packages. Are we pushing for bit-for-bit reproducible builds because we have no easy tool for semantic equivalency? If yes, what about instead developing a new Maven plugin or modifying … This proposal would allow the following workflow during a release: the release manager deploys the JAR files to a staging repository and gives the URL to the other developers. The other developers would use that URL with the above-cited new plugin, which would automatically build the project and compare it semantically with the JARs on the staging repository.
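A rough sketch of the "semantic equivalence" check proposed here: compare two archives by the SHA-256 of each entry's content, keyed by entry name, ignoring timestamps and other ZIP metadata. This is not the Microsoft or Tycho tooling, only an illustration; SemanticZipCompare is a hypothetical name, and HexFormat requires Java 17+.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: compare ZIP/JAR files by entry content only. */
final class SemanticZipCompare {
    private SemanticZipCompare() {}

    /** Maps each file entry name to the SHA-256 of its content; timestamps are never read. */
    static Map<String, String> contentDigests(byte[] zipBytes) {
        Map<String, String> digests = new TreeMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                MessageDigest sha = MessageDigest.getInstance("SHA-256");
                sha.update(in.readAllBytes()); // reads up to the end of the current entry
                digests.put(entry.getName(), HexFormat.of().formatHex(sha.digest()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        return digests;
    }

    /** True if both archives contain the same entry names with the same contents. */
    static boolean semanticallyEqual(byte[] a, byte[] b) {
        return contentDigests(a).equals(contentDigests(b));
    }
}
```

Two builds of the same sources made at different times would differ byte-wise but still compare as equal here.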
Same for me. What I want is a security check. Actually, I would rather not have bit-for-bit reproducibility, as I would find it more useful to keep the (non-standard)
Tycho has exactly this kind of "semantic equivalency": it is not used for "reproducibility"; instead it is used to check whether an artifact differs only in version, in which case the artifact is not deployed. Additionally, if it differs but the version has not changed, one gets an error/warning that the version needs to be incremented (this is similar to the use case here: if I build the same version, the JAR should be "semantically equivalent", but bit-for-bit equivalence is not important). It can currently even detect whether a file differs only in line endings (e.g.
I disagree. Having stable binaries allows some optimisation downstream in the build, and I'd really like to keep that. It allows the compiler to skip compilation when the inputs and dependencies have not changed (the same goes for resources), which cascades to the entire build. For example, if the generated JAR for a dependency changes (with a different timestamp in the ZIP), the compiler needs to recompile.
+1
One can always use the file modification time. As far as I know, Maven already tries in some places not to overwrite a file if it has the same bytes; now the same only needs to be applied to the JAR (which can actually be seen as a FileSystem where individual items might or might not be updated / deleted / added).
But now, with a fixed timestamp by default, how will one know that the dependency has "changed"? Especially in this case a "semantic equivalence" would pay off: e.g. compilation need not be performed if only a resource or a property file changed in a JAR, only when class files change. One can even go a step further and say that recompilation is only needed if a
I agree with this goal, but I don't think that we need reproducible builds for that. By default, … Relying on reproducible builds to avoid unnecessary recomputation is useful only if the previous step has already done unnecessary recomputation anyway, since it rewrote an identical JAR file. So the goal has been half-missed, and would be more efficiently achieved by the approach proposed in the previous paragraph.
Yes, that's what we do: we don't overwrite if nothing has changed. But if you change the timestamps of the ZIP entries, the binary ZIP file will differ, and Maven will overwrite it, which would break the whole thing.
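The "don't overwrite if nothing has changed" behaviour can be sketched as below; WriteIfChanged is a hypothetical helper, not Maven's code. It also shows the catch raised here: the comparison is on raw bytes, so a JAR whose entries got new timestamps counts as changed and is rewritten, which bumps its modification time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical sketch: only touch the file when its bytes would actually change. */
final class WriteIfChanged {
    private WriteIfChanged() {}

    /** Returns true if the file was (re)written, false if it already held exactly these bytes. */
    static boolean write(Path target, byte[] content) {
        try {
            if (Files.isRegularFile(target)
                    && Files.size(target) == content.length
                    && Arrays.equals(Files.readAllBytes(target), content)) {
                // Left untouched: the mtime is unchanged, so downstream up-to-date checks still pass.
                return false;
            }
            Files.write(target, content);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```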
This is not the timestamp of the files, AFAIK. When you copy a file, Maven does not set the timestamp to the value we're talking about here, so this is irrelevant.
I don't think so. Try it on Maven. Just run … In all cases, even with smarter plugins, if the input data has changed somehow, you will have to run again. The only way to avoid that is to not change the input, and dependencies are part of the input. So even with a smarter API, we'll need stable artifacts during a build, or else we'll lose any possibility of optimisation.
My idea was to ensure that "stable artifacts" means "not rewritten" rather than "rewritten identically", in which case the situation (for the purpose of optimization) becomes equivalent to a reproducible build.
In that case, we are already close to the above-mentioned equivalence, aren't we? If the optimization doesn't work yet on the compiler plugin side, in my understanding it seems to be a bug (the change detection algorithm in "incremental compilation" compares relative paths against absolute paths, and thus always thinks that there are changes). Therefore, reproducible builds would be a workaround for such bugs rather than something fundamentally needed for optimization. But the same workaround would work with semantic equivalence if we compute the SHA-1 on the ZIP entries instead of the ZIP file (admittedly maybe not using standard Unix tools). Note: I'm all in favour of the goals behind reproducible builds. I just think that throwing away metadata like timestamps goes a bit too far, and that semantic equivalence, as Microsoft and Eclipse Tycho do it, would give us the best of both worlds.
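Computing "the SHA-1 on the ZIP entries instead of the ZIP file" could look like this sketch: entries are sorted by name, and each name and content is length-prefixed before hashing, so neither entry order nor timestamps affect the result, while ("ab", "c") and ("a", "bc") still hash differently. EntryFingerprint is a hypothetical name; HexFormat requires Java 17+.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: one SHA-1 over entry names and contents, ignoring ZIP metadata. */
final class EntryFingerprint {
    private EntryFingerprint() {}

    static String sha1OfEntries(byte[] zipBytes) {
        Map<String, byte[]> contents = new TreeMap<>(); // sorted, so archive entry order is ignored
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    contents.put(entry.getName(), in.readAllBytes());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
        for (Map.Entry<String, byte[]> e : contents.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            // Length-prefix both fields to keep the encoding unambiguous.
            sha1.update(ByteBuffer.allocate(4).putInt(name.length).array());
            sha1.update(name);
            sha1.update(ByteBuffer.allocate(4).putInt(e.getValue().length).array());
            sha1.update(e.getValue());
        }
        return HexFormat.of().formatHex(sha1.digest());
    }
}
```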
My personal preference would be to use the value … But if the value is the beginning of the 21st century, I'm also fine with it.
I like your idea of a minimum ZIP date...
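For the "minimum ZIP date" idea: the date/time fields in ZIP headers use the MS-DOS format, which counts years from 1980 in 7 bits and stores seconds with 2-second resolution, so 1/1/1970 cannot be stored there directly. A minimal sketch of clamping a requested timestamp into that range follows; ZipDateRange is hypothetical, and real tools may instead keep out-of-range times in an extra field.

```java
import java.time.LocalDateTime;

/** Hypothetical sketch: keep a requested timestamp inside the MS-DOS date/time range used by ZIP. */
final class ZipDateRange {
    private ZipDateRange() {}

    /** Earliest value the DOS date/time field can hold. */
    static final LocalDateTime DOS_MIN = LocalDateTime.of(1980, 1, 1, 0, 0);
    /** Latest value: year offset 127 from 1980, seconds stored halved (max 29 * 2 = 58). */
    static final LocalDateTime DOS_MAX = LocalDateTime.of(2107, 12, 31, 23, 59, 58);

    static LocalDateTime clamp(LocalDateTime requested) {
        if (requested.isBefore(DOS_MIN)) {
            return DOS_MIN;
        }
        if (requested.isAfter(DOS_MAX)) {
            return DOS_MAX;
        }
        return requested;
    }
}
```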
Yes, that would be better. Again, if you run
The compiler plugin checks the timestamps of the dependencies, and if anything changed, it recompiles all the classes. This is the only way to ensure a valid output. The plugin has no knowledge of any kind of "equivalence". While this is doable, I don't see the real benefit if we can already have the same without having to unzip/SHA-1 the whole class path (for the compiler). That said, we still have the following PR, which we could use to get a smarter computation and allow plugins to more easily skip parts of their work (and not only all or none): #1118
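A simplified model of the all-or-nothing check described here, not the maven-compiler-plugin's actual implementation: snapshot each dependency's last-modified time and recompile everything if any entry differs. DependencyStaleness and its methods are hypothetical names. It illustrates why a dependency JAR that is rewritten, even with identical bytes, gets a new file time and therefore still triggers a full recompile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of timestamp-based, all-or-nothing dependency change detection. */
final class DependencyStaleness {
    private DependencyStaleness() {}

    /** Records the last-modified time of each dependency file (e.g. JARs on the class path). */
    static Map<String, FileTime> snapshot(Collection<Path> dependencies) {
        Map<String, FileTime> times = new TreeMap<>();
        for (Path dep : dependencies) {
            try {
                times.put(dep.toString(), Files.getLastModifiedTime(dep));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return times;
    }

    /** Any added, removed or re-timestamped dependency means recompiling everything. */
    static boolean needsFullRecompile(Map<String, FileTime> previous, Map<String, FileTime> current) {
        return !previous.equals(current);
    }
}
```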
Indeed, I just tried and couldn't reproduce MCOMPILER-209 anymore, even with version 3.1 of the compiler plugin, the version against which the bug was reported. Maybe that issue should be closed, but I have no idea where the fix is.
Yes, I know. But what I was trying to say is that if the JAR has not been rewritten, then we don't need a reproducible build for optimization.
Yes, there is a link with incremental builds when the rebuild is done just after the initial build and the local builder can optimize by reusing some artifacts instead of rebuilding them.
Reproducible Builds additionally covers the "third-party rebuild" case, where the rebuild is done much later, by someone else, in a different environment: both incremental builds for local use and RB for third parties need to be available and consistent. This is one reason why the Git commit timestamp is not efficient on large multi-module builds.
This "third-party rebuild" case is also where RB has a great effect: it lets us know when artifacts contain environment-specific content in their output, like a username, machine name, local path, or any personal local file. Nothing really security/malware oriented (even if local data can be PII or considered leakage), but it's what we regularly find in practice when checking RB during our Maven component release votes.
A real-world example found just now: https://lists.apache.org/thread/bx218cd5sc8pwm4frlfop8nxy0p8n7zq
Projects can opt out if they want, or override with their preferred timestamp value, but by default, having Reproducible Builds is a nice improvement.
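For the override, Maven's reproducible-builds setup uses the project.build.outputTimestamp property; as far as I know it accepts either an ISO-8601 date-time or a number of seconds since the epoch. The following parser is a simplified sketch under that assumption, not maven-archiver's code: OutputTimestamp is a hypothetical name, and treating a blank value as "no fixed timestamp" is this sketch's own choice.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.util.Optional;

/** Hypothetical sketch: interpret an output-timestamp override value. */
final class OutputTimestamp {
    private OutputTimestamp() {}

    static Optional<Instant> parse(String value) {
        if (value == null || value.isBlank()) {
            return Optional.empty(); // assumption: blank means "do not fix a timestamp"
        }
        String v = value.trim();
        if (v.chars().allMatch(Character::isDigit)) {
            // Seconds since the epoch, in the style of SOURCE_DATE_EPOCH.
            return Optional.of(Instant.ofEpochSecond(Long.parseLong(v)));
        }
        // ISO-8601 with an offset, e.g. 2000-01-01T00:00:00Z or 2000-01-01T01:00:00+01:00.
        return Optional.of(OffsetDateTime.parse(v).toInstant());
    }
}
```

Either form resolves to one Instant, so a project could pin its own value (for instance its release date) instead of relying on the default.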
JIRA issue: MNG-8258
IT PR: apache/maven-integration-testing#369
Follow this checklist to help us incorporate your contribution quickly and easily:
A JIRA issue should be filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
Use the form [MNG-XXX] SUMMARY, where you replace MNG-XXX and SUMMARY with the appropriate JIRA issue. Best practice is to use the JIRA issue title in both the pull request title and in the first line of the commit message.
Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
If your pull request is about ~20 lines of code, you don't need to sign an Individual Contributor License Agreement; if you are unsure, please ask on the developers list.
To make clear that you license your contribution under the Apache License Version 2.0, January 2004, you have to acknowledge this by using the following check-box.
I hereby declare this contribution to be licenced under the Apache License Version 2.0, January 2004
In any other case, please file an Apache Individual Contributor License Agreement.