-
Notifications
You must be signed in to change notification settings - Fork 575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add data-driven approach to v6 distro search #2265
Conversation
To offer opinions:
I was under the impression that there is no vuln data for old unstable versions -- that the unstable feed is only for the current unstable branch, is this correct? Do we know which codename (probably mixing up terms here) is currently in unstable at any given point? If so, I think considering something with an old codename + sid status as a prerelease for that codename would be the right thing to do, if we have enough info to do so.
IIRC there was a difficulty mapping the unstable alpine feed without this behavior. Maybe just that the data for the distro isn't published in the normal feed until the release becomes stable? But if the versions align without doing some mapping to "edge", I would agree it sounds better to just use the version name directly.
Probably, yes. Agree it doesn't have to be done here -- only optimize things that need it. I worked on a configuration-heavy app that did something like this: always keep configuration loaded in memory and it worked great. If nothing else, it kept a lot of the programming model easier. |
It looks like there are a couple of answers here:
|
Signed-off-by: Alex Goodman <[email protected]>
ed7834a
to
8d38337
Compare
return query | ||
// search for aliases for the given distro; we intentionally map some OSs to other OSs in terms of | ||
// vulnerability (e.g. `centos` is an alias for `rhel`). If an alias is found always use that alias in | ||
// searches (there will never be anything in the DB for aliased distros). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happens here when we do add affected entries for a distro that was previous an alias (like when we bring in data for almalinux
)? I think what would happen is that the alias would no longer exist in the grype db that is published that now contains the almalinux data, but just want to validate that assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And all prior clients would adjust correctly to it becoming a comprehensive vulnerability dataset rather than an alias to rhel
data
Today we have a few hard coded elements that affect which namespace is selected for given distro information:
sid
in the pretty-name then explicitly set the version tounstable
which is acted upon downstream to map tosid
vulnerability dataedge
vulnerability stream manuallymajor
vsmajor.minor
versions: we search multiple combinations of version segment combinations to find the most accurate vulnerability streamThe upside of this logic is that it makes matching more accurate, the downside is that this kind of code requires a
grype
release to update (thus users need to updategrype
itself to get these updates). This is not ideal since a distro-related version change or vulnerability data change can happen at any time.Ideally this should be fundamentally data driven and flow in from the grype database, which is updated nightly -- this PR nudges us heavily in that direction by migrating this kind of logic to a DB lookup table, which is populated with grype-db, and paired with more agnostic processing of distro data (for example, we still have version specificity searching, but it is unaware of specific ).
Open questions:
debian@sid
:today we look for "sid" in the PRETTY_NAME for os-release, which is something that is fixed to the artifact, but as I understand it sid is a moving target... so wouldn't it be invalid to match non-trixie sid vuln data (foxy is the next codename) with trixie artifacts? Doesn't that mean we should exclusively look at the codename and ignore sid status altogether?We're going to need to appropriate pkg source lists into syft / grype / grype-db / vunnel. That is outside of the scope of this PRalpine@edge
:it looks like this would be a single version with multiple pre-releases, but the vunnel data is specifically hard coding this to "edge", is this fictitious? (should we just use versions as we find them in the vuln data)this is correct, we should be using the alpha versions to correlate with edge, no matter the major/minor versionshould this logic be lifted from the DB into memory for faster continual access of OS specific questions? That is, much of this is querying the OS table for each distro search, but that table is relatively small and could be replaced with a single datastructure that is read into memory upon the first query.deferred; I don't want to optimize too much in advance though, so elected to not do this first since it could always be done at a later date since that could be implemented in a non-breaking way.