Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for Rust multi-license declaration, plus new Rust license tests #858

Merged
merged 2 commits into from
Jun 24, 2021

Conversation

johnbatty
Copy link
Contributor

@johnbatty johnbatty commented Jun 22, 2021

Added some Rust multi-licensing tests to help diagnose #856

  • A couple of tests of the clearlydefined scanner, which show that it is correctly declaring the Rust licenses
  • Another test that runs some real raw data through the multiple scanners and aggregrator. This demonstrates that Rust multi-licenses are declared incorrectly, and is useful to run under a debugger with breakpoints to view/understand the operation.

The latter test should possibly be moved into a separate "functional" test, as it is testing real data through multiple components. I plan to clean it up to run through multiple example data files to confirm that they are all declared as expected.

I'm not sure whether the raw data files should be in an evidence directory, or a fixtures directory, or somewhere else - I used evidence as there was a similar example that used this structure.

@johnbatty
Copy link
Contributor Author

johnbatty commented Jun 22, 2021

@nellshamrell I have made a proposed fix for the Rust crate licensing, following our discussion.

I have modified the non-clearlydefined summarizers to avoid declaring the license for Rust crates because:

  • their current logic gets it wrong (ANDing together the license types from the license files)
  • they don't read Cargo.toml to find out what the declared license is, which has got to be the best way to get a crate license
  • the clearlydefined summarizer does read Cargo.toml and declares the license from that

The tests that I have added show that this works for selection of example cases I have added.

There is a quirk of the ClearlyDefined SPDX parser adding unnecessary parentheses when there are 3 licenses. e.g.

  • MPL-2.0 OR MIT OR Apache-2.0 declared as MPL-2.0 OR (MIT OR Apache-2.0)

I see you've previously raised this as an issue with the SPDX library: clearlydefined/spdx#21.

It would be good to fix this, as it seems that Cargo license expression parsing does not support parentheses:

Although the license declared by ClearlyDefined won't be used directly by Cargo, it seems a little wrong to declare a license string that cannot by parsed by Cargo. I think I'll take a look at whether we can just validate the SPDX expression providing in Cargo.toml, and if valid then use the string as-is, rather than using the "normalized" string.

As we discussed, there is the (hopefully rare) possibility that the crate declares the license incorrectly, and there are files with other licenses in the repo. However, I think that trying to deal with this case now would add significant complexity. I'd rather fix up the existing problem first to get the licensing declared correctly for the vast majority of crates with license choices.

@johnbatty johnbatty changed the title Add Rust license tests Fix for Rust multi-license declaration, plus new Rust license tests Jun 23, 2021
@johnbatty johnbatty marked this pull request as ready for review June 23, 2021 06:03
@nellshamrell
Copy link
Contributor

@johnbatty this looks fantastic! Thank you so much!

And I agree with you, based on previous work, the evidence directory seems like the best place to put the raw data files.

Approving this now, will likely deploy this tomorrow.

@nellshamrell nellshamrell merged commit dee0a24 into clearlydefined:master Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants