BUG: SBOM import does not trigger scan of packages #121

Open
ghsa-retrieval opened this issue May 15, 2024 · 16 comments
Labels
bug Something isn't working design needed Design details needed to complete the issue enhancement New feature or request

Comments

@ghsa-retrieval

ghsa-retrieval commented May 15, 2024

Describe the bug
On a self-hosted instance of DejaCode, it appears that the current main branch of DejaCode does not scan individual packages after loading the SBOM. This feature seems to work on the public demo instance.

Tested with:

To Reproduce
Configure dataspace:

  1. In "Application Process Settings" activate "Enable package scanning"
  2. In "Application Process Settings" activate "Update packages automatically from scan"

Steps to reproduce the behavior:

  1. Create a product
  2. Open the product
  3. Click on the "Scan" dropdown and select "Load Packages from SBOMs"
  4. Select an SBOM of your choice (e.g. sbom-1-4.cdx.json)
  5. Enable "Update existing packages with discovered packages data"
  6. Enable "Scan all packages of this product post-import"

Additional information which may or may not be relevant:

  • I renamed and edited the nexB dataspace for this (which also locks me out of creating new dataspace, not sure if that is expected?)
  • "Enable PurlDB access" is deactivated
  • "Enable VulnerableCodeDB access" is deactivated
  • The PurlDB URL is still in the configuration

Expected behavior
After loading the packages through the load_sbom pipeline in ScanCode.io, each individual package should be analyzed with a scan_single_package pipeline and the results added to the respective packages in DejaCode.

Screenshots
No screenshots, as the error is that the expected actions do not happen.

Context (OS, Browser, Device, etc.):
Firefox

@ghsa-retrieval ghsa-retrieval added bug Something isn't working design needed Design details needed to complete the issue enhancement New feature or request labels May 15, 2024
@tdruez
Contributor

tdruez commented May 16, 2024

@ghsa-retrieval Could you confirm that the ScanCode.io integration is properly configured on your DejaCode instance?
Click on your username in the top right corner to display the dropdown menu and select "Integration Status" or directly use this URL /integrations_status/
From this view, we can make sure that ScanCode.io is "Configured" and "Available".


I renamed and edited the nexB dataspace for this (which also locks me out of creating new dataspace, not sure if that is expected?)

You need to update the REFERENCE_DATASPACE setting (https://dejacode.readthedocs.io/en/latest/application-settings.html#reference-dataspace) to match the new name, so that your Dataspace and its users keep those permissions.

@ghsa-retrieval
Author

ghsa-retrieval commented May 16, 2024

@tdruez Yes, it shows both "Configured" and "Available" with a green checkmark. The load_sbom pipeline works (with limitations) and packages are being added to the product, but they are not scanned individually to get detailed license and copyright information. Scanning for those details does work if I add a single package with "Add Package" and a URL to the package's archive. So some parts of the integration are definitely working.

You need to update the REFERENCE_DATASPACE setting (https://dejacode.readthedocs.io/en/latest/application-settings.html#reference-dataspace) to match the new name, so that your Dataspace and its users keep those permissions.

Makes sense, that was just a bit unexpected when configuring it through the UI.

@ghsa-retrieval
Author

ghsa-retrieval commented May 16, 2024

The same issue seems to happen when using "Scan" > "Scan All Packages". The UI reports that the job has been submitted successfully, but the scans never appear in the scan list, nor does ScanCode.io list any new projects. Hence, this might not be related to the SBOM import itself.

2024-05-16-dejacode-scan-all-packages

@tdruez
Contributor

tdruez commented May 16, 2024

@ghsa-retrieval Thanks for the details. My hunch is that the problem may be located in the async task that is responsible for submitting the scan requests.
Could you check the worker logs for anything that looks like an error, using: docker compose logs worker

@ghsa-retrieval
Author

ghsa-retrieval commented May 17, 2024

@tdruez Unfortunately no errors are being reported. It looks like DejaCode thinks it has successfully submitted a job, but the ScanCode.io log does not indicate that it is receiving anything nor that it runs into errors.

Do you have any other ideas where I should look?

2024-05-17-dejacode-log-censored
2024-05-17-scancode-log-censored

@tdruez
Contributor

tdruez commented May 17, 2024

@ghsa-retrieval Thanks for the log, that's helpful. We can see that the task dje.tasks.scancodeio_submit_scan is properly called and executed, but no URIs are provided:

INFO Entering scancodeio submit scan task with uris=[] ...

My guess is that none of your packages have a download_url defined.
At the moment, a download URL is required to fetch and scan a package from DejaCode.

Some download URLs could be generated from the Package URL using the purl2url library, but only a few package types are supported.

As a side note, the UI should be improved to warn you about the missing download URL instead of displaying a success message.
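
The behavior described above (an empty `uris=[]` list combined with a success message) can be illustrated with a minimal sketch. This is not DejaCode's code: the `Package` class, `split_by_download_url`, and `submission_message` are hypothetical names used only to show how packages without a `download_url` could be separated out and reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    # Hypothetical stand-in for a package record; not the DejaCode model.
    name: str
    purl: str
    download_url: Optional[str] = None


def split_by_download_url(packages):
    """Return (scannable, skipped) lists based on whether download_url is set."""
    scannable = [p for p in packages if p.download_url]
    skipped = [p for p in packages if not p.download_url]
    return scannable, skipped


def submission_message(packages):
    """Build a message that does not claim success when nothing was submitted."""
    scannable, skipped = split_by_download_url(packages)
    if not scannable:
        return f"No scans submitted: {len(skipped)} package(s) have no download URL."
    if skipped:
        return (
            f"{len(scannable)} scan(s) submitted; "
            f"{len(skipped)} package(s) skipped (no download URL)."
        )
    return f"{len(scannable)} scan(s) submitted."
```

With only PURL-based packages in the product, `submission_message` reports that nothing was submitted, which is the warning the success message currently hides.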

@ghsa-retrieval
Author

ghsa-retrieval commented May 17, 2024

It seems that you're right: the packages imported from the SBOM only have "Package URL" and "Inferred URL" populated, but not "Download URL". The uploaded SBOM has a purl and, under properties, a ResolvedUrl. It's the same SBOM as in aboutcode-org/scancode.io#1230

[...]
"components": [
        {
            "group": "",
            "name": "bootstrap",
            "version": "5.3.3",
            "hashes": [
                {
                    "alg": "SHA-512",
                    "content": "f072c2756832a0c82e48ef68f9a1fe8ae67e6a1b7e9b35b4bb71c833356eed2aeba6fec4041c539eb165482b24c1d635f843854129bbb8c2613501e474f7268e"
                }
            ],
            "purl": "pkg:npm/bootstrap@5.3.3",
            "type": "library",
            "bom-ref": "pkg:npm/bootstrap@5.3.3",
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 1,
                    "methods": [
                        {
                            "technique": "manifest-analysis",
                            "confidence": 1,
                            "value": "/builds/beta/dso/tests-and-demos/dejacode-transitive-test/package-lock.json"
                        }
                    ]
                }
            },
            "properties": [
                {
                    "name": "SrcFile",
                    "value": "/builds/beta/dso/tests-and-demos/dejacode-transitive-test/package-lock.json"
                },
                {
                    "name": "ResolvedUrl",
                    "value": "https://registry.npmjs.org/bootstrap/-/bootstrap-5.3.3.tgz"
                },
                {
                    "name": "LocalNodeModulesPath",
                    "value": "node_modules/bootstrap"
                }
            ]
        },
[...]

Shouldn't that be working though? Where does DejaCode expect the URL to come from?
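
For reference, reading the `ResolvedUrl` value out of a component like the one above takes only a few lines. The sketch below uses a trimmed copy of that component and the standard library only; `component_properties` is a hypothetical helper, not a ScanCode.io or DejaCode function.

```python
import json

# Trimmed copy of the component shown above (hashes and evidence omitted).
COMPONENT_JSON = """
{
    "name": "bootstrap",
    "version": "5.3.3",
    "purl": "pkg:npm/bootstrap@5.3.3",
    "type": "library",
    "properties": [
        {"name": "SrcFile", "value": "package-lock.json"},
        {"name": "ResolvedUrl",
         "value": "https://registry.npmjs.org/bootstrap/-/bootstrap-5.3.3.tgz"},
        {"name": "LocalNodeModulesPath", "value": "node_modules/bootstrap"}
    ]
}
"""


def component_properties(component):
    """Flatten the CycloneDX name/value property list into a dict.

    Property names may repeat in CycloneDX; this sketch keeps the last one.
    """
    return {
        prop["name"]: prop["value"]
        for prop in component.get("properties", [])
        if "name" in prop and "value" in prop
    }


component = json.loads(COMPONENT_JSON)
resolved_url = component_properties(component).get("ResolvedUrl")
```

Here `resolved_url` holds the npm tarball URL, which is the value a download URL field would need.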

@tdruez
Contributor

tdruez commented May 17, 2024

@ghsa-retrieval Unfortunately, CycloneDX does not include a clear field to store the download URL for SBOM "components".

In ScanCode.io/DejaCode the download_url field is exported in the CycloneDX SBOM as aboutcode:download_url using custom properties defined at https://github.com/nexB/aboutcode-cyclonedx-taxonomy, see also https://github.com/CycloneDX/cyclonedx-property-taxonomy

cdxgen seems to be using the same properties approach with the ResolvedUrl property. I couldn't find much documentation about it on their repo though.

It would be interesting to have the list of properties generated by cdxgen, in order to implement a mapping for importing those values during the CycloneDX resolution in ScanCode.io.
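
One possible shape for such a mapping is sketched below. It only uses the two property names mentioned in this thread (`aboutcode:download_url` and `ResolvedUrl`); the precedence order, the `DOWNLOAD_URL_PROPERTIES` tuple, and the function name are assumptions for illustration, not the ScanCode.io implementation.

```python
# Property names checked in order; the first one present wins.
# Both the list and its ordering are assumptions for this sketch.
DOWNLOAD_URL_PROPERTIES = ("aboutcode:download_url", "ResolvedUrl")


def download_url_from_properties(properties):
    """Return a download URL found in CycloneDX component properties, or None."""
    by_name = {}
    for prop in properties:
        name, value = prop.get("name"), prop.get("value")
        if name and value:
            # Keep the first value if a property name is repeated.
            by_name.setdefault(name, value)

    for name in DOWNLOAD_URL_PROPERTIES:
        if name in by_name:
            return by_name[name]
    return None
```

Because only values already present in the SBOM are read, a mapping like this does not generate any data that the input did not contain.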

@ghsa-retrieval
Author

@tdruez There does not appear to be any documentation as far as I'm aware. The properties can be found in https://github.com/CycloneDX/cdxgen/blob/4a27933ee55914afecbd465ba4ca9a1da62a9cc1/utils.js#L818 being added through pkg.properties and apkg.properties.

Wouldn't it make more sense to derive the URL from the PURL though? I thought that was already uniquely identifying assuming that the PURL is for a package manager such as maven, npm, pypi and so on. That would be a general solution rather than trying to parse the custom properties of a particular SBOM generation tool.

Any solution is very much appreciated though!
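
To make the PURL-based idea concrete, here is a minimal sketch for the npm case only. It does not use the purl2url library mentioned earlier; it assumes the default public registry and its usual tarball layout, which will not hold for private registries or mirrors. `npm_download_url` is a hypothetical helper.

```python
from urllib.parse import unquote

# Assumption: packages come from the default public npm registry.
NPM_REGISTRY = "https://registry.npmjs.org"


def npm_download_url(purl):
    """Derive a tarball URL from an npm PURL, or return None if not applicable."""
    prefix = "pkg:npm/"
    if not purl.startswith(prefix):
        return None

    # Drop the optional subpath (#...) and qualifiers (?...).
    remainder = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]

    # The version follows the last "@"; a scope's own "@" is encoded as %40.
    path, sep, version = remainder.rpartition("@")
    if not sep or not path or not version:
        return None

    path = unquote(path)        # e.g. "%40scope/name" becomes "@scope/name"
    version = unquote(version)
    name = path.rsplit("/", 1)[-1]
    return f"{NPM_REGISTRY}/{path}/-/{name}-{version}.tgz"
```

For `pkg:npm/bootstrap@5.3.3` this yields the same URL that cdxgen stored in `ResolvedUrl`. Other package types would each need their own rule, which is why only some types can be covered this way.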

@tdruez
Contributor

tdruez commented May 17, 2024

Wouldn't it make more sense to derive the URL from the PURL though?

Maybe, but in the context of loading an SBOM, generating data that is not present in the SBOM may not always be wanted.
Users will likely expect the imported data to match the input, as a matter of data integrity.
This will require more discussion though.

Any solution is very much appreciated though!

I think in the very short term, we can add support for the ResolvedUrl property.

@ghsa-retrieval
Author

Maybe, but in the context of loading an SBOM, generating data that is not present in the SBOM may not always be wanted.
Users will likely expect the imported data to match the input, as a matter of data integrity.
This will require more discussion though.

That is a valid point. The suggested approach would ensure that only information already present in the SBOM would be used.

I think in the very short term, we can add support for the ResolvedUrl property.

That would be great!

@tdruez
Contributor

tdruez commented May 17, 2024

@ghsa-retrieval Support for the ResolvedUrl property has been added on the ScanCode.io side in aboutcode-org/scancode.io#1241

You can update your ScanCode.io instance (no changes are needed on the DejaCode side) and try "Load Packages from SBOMs" with "Scan all packages of this product post-import" again.

Keep in mind that only the packages that end up with a value for the download_url field will be scanned.

@ghsa-retrieval
Author

@tdruez Works like a charm.

@pombredanne
Member

pombredanne commented May 19, 2024

@ghsa-retrieval re:

Wouldn't it make more sense to derive the URL from the PURL though? I thought that was already uniquely identifying assuming that the PURL is for a package manager such as maven, npm, pypi and so on. That would be a general solution rather than trying to parse the custom properties of a particular SBOM generation tool.

There is code:

So there are many ways, and what we likely need here is an explicit action to call the PurlDB to "enrich" an SBOM with these URLs... or to do this in ScanCode.io... a little design needed. #45

@ghsa-retrieval
Author

@pombredanne That is what I suspected. From an outside perspective, it would make sense to me for this feature to live in ScanCode.io, given that we already analyze the SBOM there and try to do the same for the underlying packages.

@DennisClark
Member

Note progress on deriving a download URL from a PURL when adding a package: #131


4 participants