implement /api/v1/analysis/root-component, add get_purl pg func, loadgrap and rwlocked hashmap<petgraph>
JimFuller-RedHat committed Sep 10, 2024
1 parent 48605ee commit c584c01
Showing 30 changed files with 2,990 additions and 23 deletions.
49 changes: 49 additions & 0 deletions Cargo.lock


4 changes: 4 additions & 0 deletions Cargo.toml
@@ -13,6 +13,7 @@ members = [
"modules/ingestor",
"modules/storage",
"modules/graphql",
"modules/analysis",
"server",
"trustd",
"test-context",
@@ -88,6 +89,7 @@ packageurl = "0.3.0"
parking_lot = "0.12"
peak_alloc = "0.2.0"
pem = "3"
petgraph = { version = "0.6.5", features = ["serde-1"] }
prometheus = "0.13.3"
quick-xml = "0.36.1"
rand = "0.8.5"
@@ -152,6 +154,7 @@ trustify-module-ingestor = { path = "modules/ingestor" }
trustify-module-storage = { path = "modules/storage" }
trustify-module-graphql = { path = "modules/graphql" }
trustify-test-context = { path = "test-context" }
trustify-module-analysis = { path = "modules/analysis" }

# These dependencies are active during both the build time and the run time. So they are normal dependencies
# as well as build-dependencies. However, we can't control feature flags for build dependencies the way we do
@@ -180,3 +183,4 @@ postgresql_commands = { version = "0.16.3", default-features = false, features =
#cpe = { git = "https://github.com/ctron/cpe-rs", rev = "c3c05e637f6eff7dd4933c2f56d070ee2ddfb44b" }
# required due to https://github.com/voteblake/csaf-rs/pull/29
csaf = { git = "https://github.com/chirino/csaf-rs", rev = "414896904bc5e5287fd88b1daef5c27f70503d01" }

153 changes: 132 additions & 21 deletions docs/adrs/00001-graph-analytics.md
@@ -4,7 +4,7 @@ Date: 2024-08-26

## Status

Draft
Accepted

## Context

@@ -44,12 +44,15 @@ This roughly translates to:

## Process flow

Graphs will be loaded 'lazily' eg. when they are requested.
Graphs will be loaded 'lazily', i.e. when they are requested:

1) retrieve list of _latest_ unique sbom ids searched by like or exact name.
2) Using unique sbom ids, filter query package_relates_to_package (resolving left and right ID purl strings)
3) load into write locked hashmap<petgraph>, where the key is the sbom_id

We can easily 'prime' the loading of graphs either programmatically, by invoking the service's load_graphs() (say, when importing and ingesting an SBOM), or
indirectly, by running a series of HTTP requests.
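
The lazy-loading steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in this commit: `SbomGraph` is a hypothetical stand-in for a petgraph graph, `GraphMap` and `is_loaded` are invented names, and `fetch_edges` stands in for the package_relates_to_package query. Only the name `load_graphs` comes from the text.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for a petgraph graph: edges between purl strings.
#[derive(Debug, Default, Clone)]
pub struct SbomGraph {
    /// (left purl, right purl) pairs, as resolved from package_relates_to_package.
    pub edges: Vec<(String, String)>,
}

/// Hypothetical cache holding one graph per SBOM id.
#[derive(Default)]
pub struct GraphMap {
    graphs: RwLock<HashMap<String, SbomGraph>>,
}

impl GraphMap {
    /// Load graphs for the given SBOM ids, skipping ids that are already cached.
    /// `fetch_edges` is an assumed callback standing in for the database query.
    /// Returns the number of graphs that were newly inserted.
    pub fn load_graphs<F>(&self, sbom_ids: &[&str], fetch_edges: F) -> usize
    where
        F: Fn(&str) -> Vec<(String, String)>,
    {
        // Find missing ids under the shared read lock.
        let graphs = self.graphs.read().unwrap();
        let missing: Vec<&str> = sbom_ids
            .iter()
            .copied()
            .filter(|id| !graphs.contains_key(*id))
            .collect();
        drop(graphs);

        // Build the new graphs without holding any lock.
        let built: Vec<(String, SbomGraph)> = missing
            .iter()
            .map(|&id| (id.to_string(), SbomGraph { edges: fetch_edges(id) }))
            .collect();

        // Take the write lock only for the insertions.
        let mut graphs = self.graphs.write().unwrap();
        let mut loaded = 0;
        for (id, graph) in built {
            if !graphs.contains_key(&id) {
                graphs.insert(id, graph);
                loaded += 1;
            }
        }
        loaded
    }

    /// Report whether a graph for `sbom_id` is cached.
    pub fn is_loaded(&self, sbom_id: &str) -> bool {
        self.graphs.read().unwrap().contains_key(sbom_id)
    }
}
```

Calling `load_graphs` a second time with an id that is already cached does no work for that id, which is what makes priming from an ingest path cheap in this sketch.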

Once a graph is loaded, a query does the following:
1) retrieve list of _latest_ unique sbom ids searched by like or exact name.
2) read only access on hashmap<petgraph>, looping through petgraphs performing ancestor node search
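
The query steps above amount to a reverse traversal run under a shared read lock. The following is a hedged sketch that uses a plain parent map instead of petgraph. `ParentMap`, `ancestors` and `root_components` are hypothetical names, and the sketch assumes every node has an entry in the map (roots map to an empty list).

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::sync::RwLock;

/// Hypothetical per-SBOM graph: maps each node (purl) to its direct parents.
/// Assumption: every node has an entry; root nodes map to an empty list.
pub type ParentMap = HashMap<String, Vec<String>>;

/// Collect every ancestor of `start` by walking parent edges breadth-first.
/// The `seen` set keeps the walk finite even if the data contains a cycle.
pub fn ancestors(parents: &ParentMap, start: &str) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    let mut out = Vec::new();
    seen.insert(start);
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        for parent in parents.get(node).into_iter().flatten() {
            if seen.insert(parent.as_str()) {
                out.push(parent.clone());
                queue.push_back(parent.as_str());
            }
        }
    }
    out
}

/// Search every loaded graph while holding only a read lock.
/// Returns (sbom_id, ancestors) for each graph that contains `purl`, sorted by sbom_id.
pub fn root_components(
    graphs: &RwLock<HashMap<String, ParentMap>>,
    purl: &str,
) -> Vec<(String, Vec<String>)> {
    let guard = graphs.read().unwrap();
    let mut hits: Vec<(String, Vec<String>)> = guard
        .iter()
        .filter(|(_, graph)| graph.contains_key(purl))
        .map(|(sbom_id, graph)| (sbom_id.clone(), ancestors(graph, purl)))
        .collect();
    hits.sort();
    hits
}
```

Because the search only takes a read lock, any number of queries can run concurrently. They block only while a load holds the write lock.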
@@ -78,32 +81,133 @@ HTTP GET api/v1/analysis/root-component/{component-purl}

all of the above should return paginated lists:

```
```json
{"total" : 2,
"items" : [
{
"items": [
{
"purl": "pkg://rpm/redhat/[email protected]?arch=x86_64",
"name": "libproxy-webkitgtk4",
"published": "2024-07-30 19:22:06+00",
"document_id": "https://access.redhat.com/security/data/sbom/spdx/MTA-6.2.Z",
"product_name": "MTA-6.2.Z",
"product_version": "6.2.z",
"ancestors":[ .... ]
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-3d373f74-e7e5-4c41-90d9-b29d5a7e2312",
"purl": "pkg://oci/sandboxed-containers-operator-bundle@sha256:ff2bb666c2696fed365df55de78141a02e372044647b8031e6d06e7583478af4?arch=x86_64&repository_url=registry.redhat.io/openshift/sandboxed-containers-operator-bundle&tag=1.2.0-24",
"name": "sandboxed-containers-operator-bundle",
"published": "2024-08-06 05:28:39+00",
"document_id": "https://access.redhat.com/security/data/sbom/spdx/OSE-OSC-1.2.0-RHEL-8",
"product_name": "OSE-OSC-1.2.0-RHEL-8",
"product_version": "1-2",
"ancestors": [
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-440459d0-4240-4ac9-844f-2fd6b3d37429",
"purl": "pkg://oci/sandboxed-containers-operator-bundle@sha256:782313f2b91c115191593b8b63e10f5e30cf87f6c2d15618d6bfa359f51de947?repository_url=registry.redhat.io/openshift/sandboxed-containers-operator-bundle&tag=1.2.0-24",
"name": "sandboxed-containers-operator-bundle"
}
]
},
{
"purl":"...",
"name":"...",
"published":"...",
"product_name":"...",
"product_version":"...",
"ancestors":[...]
} ...
]
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-440459d0-4240-4ac9-844f-2fd6b3d37429",
"purl": "pkg://oci/sandboxed-containers-operator-bundle@sha256:782313f2b91c115191593b8b63e10f5e30cf87f6c2d15618d6bfa359f51de947?repository_url=registry.redhat.io/openshift/sandboxed-containers-operator-bundle&tag=1.2.0-24",
"name": "sandboxed-containers-operator-bundle",
"published": "2024-08-06 05:28:39+00",
"document_id": "https://access.redhat.com/security/data/sbom/spdx/OSE-OSC-1.2.0-RHEL-8",
"product_name": "OSE-OSC-1.2.0-RHEL-8",
"product_version": "1-2",
"ancestors": []
},
....
]
}
```

where ancestors contain purl, name, published and document_id which answers our questions.

**Retrieve a component(s) dependencies**
HTTP GET api/v1/analysis/dep?q={}
HTTP GET api/v1/analysis/dep/{component-name}
HTTP GET api/v1/analysis/dep/{component-purl}

```json
{
"items": [
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-3d373f74-e7e5-4c41-90d9-b29d5a7e2312",
"purl": "pkg://oci/sandboxed-containers-operator-bundle@sha256:ff2bb666c2696fed365df55de78141a02e372044647b8031e6d06e7583478af4?arch=x86_64&repository_url=registry.redhat.io/openshift/sandboxed-containers-operator-bundle&tag=1.2.0-24",
"name": "sandboxed-containers-operator-bundle",
"published": "2024-08-06 05:28:39+00",
"document_id": "https://access.redhat.com/security/data/sbom/spdx/OSE-OSC-1.2.0-RHEL-8",
"product_name": "OSE-OSC-1.2.0-RHEL-8",
"product_version": "1-2",
"deps": []
},
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-440459d0-4240-4ac9-844f-2fd6b3d37429",
"purl": "pkg://oci/sandboxed-containers-operator-bundle@sha256:782313f2b91c115191593b8b63e10f5e30cf87f6c2d15618d6bfa359f51de947?repository_url=registry.redhat.io/openshift/sandboxed-containers-operator-bundle&tag=1.2.0-24",
"name": "sandboxed-containers-operator-bundle",
"published": "2024-08-06 05:28:39+00",
"document_id": "https://access.redhat.com/security/data/sbom/spdx/OSE-OSC-1.2.0-RHEL-8",
"product_name": "OSE-OSC-1.2.0-RHEL-8",
"product_version": "1-2",
"deps": [
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-3d373f74-e7e5-4c41-90d9-b29d5a7e2312",
"purl": "pkg://oci/sandboxed-containers-operator-bundle@sha256:ff2bb666c2696fed365df55de78141a02e372044647b8031e6d06e7583478af4?arch=x86_64&repository_url=registry.redhat.io/openshift/sandboxed-containers-operator-bundle&tag=1.2.0-24",
"name": "sandboxed-containers-operator-bundle",
"deps": []
}
]
},
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-6447af1f-4faa-481a-a1e6-90ecbb6ab631",
"purl": "pkg://github/openshift/sandboxed-containers-operator@9b5eef5d49a967ba3240f01a2cb4476c44f1f66e",
"name": "sandboxed-containers-operator",
"published": "2024-08-06 05:28:39+00",
"document_id": "https://access.redhat.com/security/data/sbom/spdx/OSE-OSC-1.2.0-RHEL-8",
"product_name": "OSE-OSC-1.2.0-RHEL-8",
"product_version": "1-2",
"deps": [
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-0b155bf3-15cc-4353-9e47-58b722ed067e",
"purl": "pkg://oci/osc-must-gather-container@sha256:97c02ff2227bb56c2edeb37f674db11ebd0a5ab63897b64e852d7db11163e1ba?repository_url=registry.redhat.io/openshift-sandboxed-containers-operator-must-gather&tag=1.2.0-11.1655140658",
"name": "osc-must-gather-container",
"deps": [
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-ff914a27-712b-4e23-8074-f530f0fa2eca",
"purl": "pkg://golang/github.com/pmezard/[email protected]",
"name": "go-difflib",
"deps": []
},
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-ff198394-86b5-4cf5-a58f-da0ef9b516e3",
"purl": "pkg://golang/k8s.io/api/admissionregistration/[email protected]",
"name": "v1",
"deps": []
},
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-ff086cd5-146b-442f-85a7-2a1c84767ab9",
"purl": "pkg://rpm/redhat/[email protected]?arch=noarch",
"name": "crypto-policies",
"deps": []
},
{
"sbom_id": "0191d750-49a2-72b2-bf1a-f7c6ae792518",
"node_id": "SPDXRef-fe93b41a-ebab-48b5-813b-60307b4a711d",
"purl": "pkg://rpm/redhat/[email protected]?arch=x86_64",
"name": "openldap",
"deps": []
},
...
]
}
```
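
The nested `deps` arrays in the response above suggest a recursive walk over child edges. Below is a minimal sketch of one way such a tree could be assembled. `DepNode` and `dep_tree` are hypothetical names, the child map is a stand-in for the petgraph structure, and only the `purl` and `deps` fields of the response are modelled.

```rust
use std::collections::HashMap;

/// Hypothetical response node mirroring the `purl` and `deps` fields of the JSON above.
#[derive(Debug, PartialEq)]
pub struct DepNode {
    pub purl: String,
    pub deps: Vec<DepNode>,
}

/// Build the nested dependency tree for `purl` from a child adjacency map.
/// `path` holds the purls on the current branch, so a cyclic edge is cut
/// instead of recursing forever.
pub fn dep_tree(
    children: &HashMap<String, Vec<String>>,
    purl: &str,
    path: &mut Vec<String>,
) -> DepNode {
    path.push(purl.to_string());
    let mut deps = Vec::new();
    for child in children.get(purl).into_iter().flatten() {
        if !path.contains(child) {
            deps.push(dep_tree(children, child, path));
        }
    }
    path.pop();
    DepNode { purl: purl.to_string(), deps }
}
```

A node reachable along two different branches appears twice in the output, much as the same purl can appear both as a top-level item and inside another item's `deps` in the example response.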

## Alternative approaches

We could query existing package_relates_to_package to resolve relationships though previous attempts with a pure SQL based
@@ -134,9 +238,16 @@ We are mostly interested in answering these questions in the current context whi
relationships as defined in latest version of SBOMs. Answering this question in a historical context is out of scope (possible
but much more complicated).

Performance is limited by the fact we bespoke build a graph for each query ... we should optimise this approach by having
a graph always available (loaded with latest version SBOM relationships) either as a single graph or a hashmap<graph> (containing
a graph per sbom).

Loading and interrogating an 'in memory' graph has resource implications - it might be that this analytics process, at scale, will
need processing to be isolated (for example, as separate pod(s) in openshift). We might also have to consider connection specific
postgres configuration (and/or connect to a dedicated read only postgres replica).

Performance is limited by the fact we bespoke build a graph for each query ... we could optimise this approach by having
a graph always available (loaded with latest version SBOM relationships).
The default performance profile of graphmap should be biased toward multiple concurrent read access - loading graph into a hashmap will lock it.

It should be possible to parallelise the loading and querying of multiple graphs in hashmap.
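
As a rough illustration of querying several graphs in parallel, the sketch below fans out one scoped thread per graph while the map is held under a read lock. It uses `std::thread::scope` from the standard library. `count_matches_parallel` is a hypothetical function, and the edge-list graph is a stand-in for petgraph. A real service would more likely use a thread pool or async tasks than one thread per graph.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

/// Hypothetical stand-in graph: a list of (left purl, right purl) edges.
pub type EdgeList = Vec<(String, String)>;

/// Count, per SBOM id, how many edges mention `purl`.
/// Each graph is scanned on its own scoped thread; the threads only borrow
/// the graphs, so the read guard stays on the calling thread.
pub fn count_matches_parallel(
    graphs: &RwLock<HashMap<String, EdgeList>>,
    purl: &str,
) -> HashMap<String, usize> {
    let guard = graphs.read().unwrap();
    thread::scope(|s| {
        let handles: Vec<_> = guard
            .iter()
            .map(|(sbom_id, edges)| {
                s.spawn(move || {
                    let n = edges
                        .iter()
                        .filter(|e| e.0 == purl || e.1 == purl)
                        .count();
                    (sbom_id.clone(), n)
                })
            })
            .collect();
        // Join every worker and gather the per-graph counts.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```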

It should be possible to use https://crates.io/crates/lru to limit number of graphs in hashmap.
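
To show what bounding the number of cached graphs might look like, here is a small least-recently-used cache written with the standard library only. It is a sketch of the idea, not the `lru` crate's API. `BoundedGraphs`, `put` and `get` are hypothetical names, and the `VecDeque` bookkeeping is chosen for brevity rather than speed.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bounded cache: keeps at most `capacity` graphs and evicts
/// the least recently used one when a new graph would exceed the limit.
pub struct BoundedGraphs<G> {
    capacity: usize,
    graphs: HashMap<String, G>,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl<G> BoundedGraphs<G> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, graphs: HashMap::new(), order: VecDeque::new() }
    }

    /// Mark `sbom_id` as the most recently used entry.
    fn touch(&mut self, sbom_id: &str) {
        self.order.retain(|id| id != sbom_id);
        self.order.push_back(sbom_id.to_string());
    }

    /// Insert or replace a graph, evicting the oldest entry if the cache is full.
    pub fn put(&mut self, sbom_id: &str, graph: G) {
        if !self.graphs.contains_key(sbom_id) && self.graphs.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.graphs.remove(&oldest);
            }
        }
        self.graphs.insert(sbom_id.to_string(), graph);
        self.touch(sbom_id);
    }

    /// Look up a graph and record the access.
    pub fn get(&mut self, sbom_id: &str) -> Option<&G> {
        if self.graphs.contains_key(sbom_id) {
            self.touch(sbom_id);
        }
        self.graphs.get(sbom_id)
    }
}
```

Note that `get` needs `&mut self` here because it updates the recency order. Behind an `RwLock` that would force a write lock on every lookup, which works against the read-biased profile described above, so the choice of lookup method matters if an LRU is adopted.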
9 changes: 9 additions & 0 deletions entity/src/relationship.rs
@@ -1,4 +1,5 @@
use sea_orm::{DeriveActiveEnum, EnumIter};
use std::fmt;

#[derive(
Debug,
@@ -45,4 +46,12 @@ pub enum Relationship {
    DevToolOf,
    #[sea_orm(num_value = 13)]
    DescribedBy,
    #[sea_orm(num_value = 14)]
    PackageOf,
}

impl fmt::Display for Relationship {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}
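
The `Display` implementation above delegates to the derived `Debug` output, so a variant formats as its own name. The following self-contained sketch shows the effect on a reduced copy of the enum. The sea_orm derives are omitted, and `edge_label` is a hypothetical helper added only for illustration.

```rust
use std::fmt;

/// Reduced copy of the enum for illustration only (no sea_orm derives).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Relationship {
    DescribedBy,
    PackageOf,
}

impl fmt::Display for Relationship {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Reuse the derived Debug output, which is the variant name.
        write!(f, "{:?}", self)
    }
}

/// Hypothetical helper: render an edge as text, e.g. for logging.
pub fn edge_label(left: &str, rel: Relationship, right: &str) -> String {
    format!("{} -[{}]-> {}", left, rel, right)
}
```

Implementing `Display` also provides `to_string()` through the standard blanket `ToString` impl, which is convenient when a relationship has to be stored as a plain string on a graph edge.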