Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Add plugin to generate license and notice files #11977

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

bryanck
Copy link
Contributor

@bryanck bryanck commented Jan 15, 2025

This PR adds a Gradle plugin for generating the LICENSE and NOTICE files for runtimes and bundles that include third party libraries. The PR includes a custom renderer for generating the files, so we can adjust that as needed.

The following Gradle tasks were added:

  • generateLicenseReport - generates new LICENSE and NOTICE files in the build directory
  • compareLicenseReport - compares the generated files with the existing ones and fails if they don't match
  • updateLicenseReport - replaces the existing files with the newly generated ones

The compareLicenseReport will run automatically as part of the build, when the check task runs. If the files don't match and the build fails, then the files will need to be updated by running the updateLicenseReport task manually.

I enabled the plugin for the following projects: aws-bundle, azure-bundle, gcp-bundle, and kafka-connect-runtime. These projects all used a variation of the plugin in the past to generate the files, so are the safest to include. I also regenerated the files for these projects using this plugin so the build will pass.

This addresses #11559

}

dependencies {
implementation 'com.github.jk1:gradle-license-report:2.9'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dependency needed to be declared here so that the classes were available outside of build.gradle. We could also move the other buildscript dependencies in here, but I left them as-is.

@bryanck bryanck force-pushed the license-report branch 2 times, most recently from c5cb209 to c289050 Compare January 15, 2025 20:42
@bryanck bryanck requested a review from ajantha-bhat January 15, 2025 20:50
@RussellSpitzer
Copy link
Member

I liked the idea that Ryan suggested this morning, where our plugin would always generate and check the generated files against a set of committed existing files and report an error if there was a change from that state. Basically a RevAPI for our license files

@bryanck
Copy link
Contributor Author

bryanck commented Jan 16, 2025

I liked the idea that Ryan suggested this morning, where our plugin would always generate and check the generated files against a set of committed existing files and report an error if there was a change from that state. Basically a RevAPI for our license files

I made this change.

@Fokko
Copy link
Contributor

Fokko commented Jan 16, 2025

This is looking great @bryanck, thanks for working on this 👍

How about documenting the process of checking if the NOTICE files are up to date as part of the how-to-release?

What Ryan also mentioned is that not everything in the NOTICE is required. The official ASF docs state: Do not add anything to NOTICE which is not legally required. For example, if we copy the NOTICE from Parquet-Java, the general NOTICE contains a section that's only relevant for parquet-protobuf that we don't use in the project.

I liked the idea that Ryan suggested this morning, where our plugin would always generate and check the generated files against a set of committed existing files and report an error if there was a change from that state. Basically a RevAPI for our license files

I like that as well, we could run the compareLicenseReport task as part of the CI.

@Fokko Fokko requested a review from jbonofre January 16, 2025 20:42
@bryanck
Copy link
Contributor Author

bryanck commented Jan 16, 2025

This is looking great @bryanck, thanks for working on this 👍

How about documenting the process of checking if the NOTICE files are up to date as part of the how-to-release?

Sure will do.

What Ryan also mentioned is that not everything in the NOTICE is required. The official ASF docs state: Do not add anything to NOTICE which is not legally required. For example, if we copy the NOTICE from Parquet-Java, the general NOTICE contains a section that's only relevant for parquet-protobuf that we don't use in the project.

I had some discussion w/ Dan around this, I'll investigate this some more, there was some conflicting information.

I like that as well, we could run the compareLicenseReport task as part of the CI.

Yes, I added this, the check task depends on compareLicenseReport.

@@ -203,320 +203,468 @@

--------------------------------------------------------------------------------

This binary artifact contains code from the following projects:
This product includes a gradle wrapper.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of licenses here which seem like they shouldn't be included? Why does the AWS bundle have a gradle wrapper, avro, parquet ... etc Looks like it got a lot of license notices from something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm appending to the root LICENSE and NOTICE files, I think some new items were added to those recently. I could remove that if we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't they be in the other LICENSE Bundles then?

For example I see in this file

This product includes code from Apache Spark.

* dev/check-license script
* vectorized reading of definition levels in BaseVectorizedParquetValuesReader.java
* portions of the extensions parser
* casting logic in AssignmentAlignmentSupport
* implementation of SetAccumulator.
* Connector expressions.

Copyright: 2011-2018 The Apache Software Foundation
Home page: https://spark.apache.org/
License: https://www.apache.org/licenses/LICENSE-2.0

But I don't see that in azure-bundle or gcp-bundle

I do see it in kafka-connect but that one makes sense to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm let me double check what's going on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah got it, I must have not expanded my diffs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryanck But are they transitive dependencies of this particular bundle? That's I think where I am a bit lost

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They may not be, so I'm thinking I should remove the appending to the root files. The root files used to be pretty minimal but some new things were added recently.

Copy link
Contributor Author

@bryanck bryanck Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go ahead and make that change, I don't think it makes sense to do this anymore (appending to the root files).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this so it strips off the extra deps after the initial Apache license in the root files, so those extra deps don't get added.

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bryanck for the PR, it's great to see the work that has been originally added to Nessie and then to Apache Polaris is getting reused in other projects as well.

But instead of copying the work, I think it would be much better if all the efforts around the NOTICE and LICENSE files is bundled and shared so that not every Apache project has to redo all that work, but is instead reusable for all projects. It can easily become a Gradle plugin on its own.

I'm requesting changes to this PR, because as it is based on code of Apache Polaris (I think @jbonofre mentioned it on the last Iceberg community sync) it requires a mention - and I suspect with this also projectnessie as the original source of the work.

@bryanck
Copy link
Contributor Author

bryanck commented Jan 16, 2025

I'm requesting changes to this PR, because as it is based on code of Apache Polaris (I think @jbonofre mentioned it on the last Iceberg community sync) it requires a mention - and I suspect with this also projectnessie as the original source of the work.

This was not copied from another project, we have used this in Iceberg before Polaris existed, but simply didn't contribute the code. #8261 (comment)

@snazy
Copy link
Member

snazy commented Jan 17, 2025

This was not copied from another project, we have used this in Iceberg before Polaris existed, but simply didn't contribute the code. #8261 (comment)

Then I retract my comment.

@RussellSpitzer
Copy link
Member

I would like to be clear that it is extremely important that we assume all other engineers are contributing in good faith and the interaction above is failing to meet that standard. It is not appropriate to make public accusations of copying code in pull request comments.

In situations where a community member believes there has been some sort of infringement, I would recommend that the community member talk directly with the code contributor and try to resolve the issue privately first. In this community it is appropriate to help others if you believe they have inadvertently misattributed code or have doubts about provenance, but this must not be done in an aggressive or insulting manner.

If talking one on one fails to resolve the issue, immediately notify the PMC. Knowingly copying code into the project in disregard to copyright or without attribution is a serious offense and damages the entire Apache community. Because of how serious these accusations are, it's crucial that we do not make them lightly.

I'd like to again thank @bryanck for his contributions to the project and reiterate that we should strive to have positive interactions with each other and we should apologize when we fall short.

Copy link
Member

@ajantha-bhat ajantha-bhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on it.

I hope we do a follow up PR for other runtime jars like Spark, Flink, REST-testFixture etc.

license.gradle Outdated
File licenseFile = new File("${projectDir}/build/reports/dependency-license/LICENSE")
File currentLicenseFile = new File("${projectDir}/LICENSE")
if (licenseFile.text != currentLicenseFile.text) {
throw new RuntimeException("License file has changed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention about what to do next, like Run updateLicenseReport with a complete gradle command.

Similar to how RevAPI or spotless prompts when fails about the next step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added this.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

It isn't curated as we like, but I do think this is more complete and will also notify us when new dependencies are added.

WDYT @danielcweeks @rdblue @jbonofre

@@ -5,111 +5,21 @@ Copyright 2017-2024 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

--------------------------------------------------------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we need to keep this one. This is not the result of a dependency, but because we copied code from the Kite project. This comes from the parent NOTICE: https://github.com/apache/iceberg/blob/main/NOTICE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it should not be there imho.

@jbonofre
Copy link
Member

@bryanck thanks a lot for working on that ! I will take a look as I'm doing a complete pass on artifacts LICENSE/NOTICE.

Generally speaking, by experience on Apache projects, it's pretty hard to have automate tools to generate LICENSE/NOTICE (because the inner shading is hard to check, some "deps" LICENSE and NOTICE may have other names like COPYING, etc). However, I'm completely in favor on a tool to "assist" the contributors.

So, big thanks again to @bryanck and let me take a look on this PR 😄

Copy link
Member

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a couple of "not fully correct" content in LICENSE (about slf4j-api).

I gonna take a new deeper look but I like this as "assistance tool".

I also found another issue in our distributed jar files (unrelated to this change), I will create another PR.

Thanks again @bryanck !

@@ -5,92 +5,22 @@ Copyright 2017-2024 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

--------------------------------------------------------------------------------

NOTICE for Group: commons-codec Name: commons-codec Version: 1.17.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary to duplicate the line "This product includes software developed at the Apache Software Foundation...", though the ASF copyright line and any other portions of NOTICE must be considered for propagation. I saw it's still included, which is fine, just wanted to provide some "background".

Group: org.reactivestreams Name: reactive-streams Version: 1.0.4
Project URL: http://www.reactive-streams.org/
License: MIT-0 - https://spdx.org/licenses/MIT-0.html
Group: org.slf4j Name: slf4j-api Version: 2.0.16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho, that's not correct here: slf4j-api is excluded (https://github.com/apache/iceberg/blob/main/azure-bundle/build.gradle#L43), and not in the distributed jar file, so it should not be in LICENSE.

See this PR: #12052


--------------------------------------------------------------------------------

Group: org.slf4j Name: slf4j-api Version: 2.0.16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, imho, it's not correct because slf4j-api is excluded (https://github.com/apache/iceberg/blob/main/gcp-bundle/build.gradle#L42) and it's not part of the distributed jar file.

See this PR: #12052

@@ -5,111 +5,21 @@ Copyright 2017-2024 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

--------------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it should not be there imho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants