From 1f24ad221bd54eee5a715a47ecc811e5eb55dcc4 Mon Sep 17 00:00:00 2001 From: Yan Wittmann Date: Mon, 10 Jul 2023 14:08:41 +0200 Subject: [PATCH 1/3] AEAA-343: Added documentation for detail levels Signed-off-by: ywittmann --- .../enrichment/vad-detail-levels.md | 113 ++++++++++++++++++ .../inventory-enrichment-overview.md | 1 + 2 files changed, 114 insertions(+) create mode 100644 doc/inventory-enrichment/enrichment/vad-detail-levels.md diff --git a/doc/inventory-enrichment/enrichment/vad-detail-levels.md b/doc/inventory-enrichment/enrichment/vad-detail-levels.md new file mode 100644 index 0000000..06b8366 --- /dev/null +++ b/doc/inventory-enrichment/enrichment/vad-detail-levels.md @@ -0,0 +1,113 @@ +> [Vulnerability Monitoring](../inventory-enrichment-overview.md) > VAD Detail Levels + +# Vulnerability Assessment Dashboard Detail Levels + +In some cases, displaying all the information available in a dashboard can lead to the dashboard growing rapidly in +size, depending on the amount of vulnerabilities the dashboard is generated for. + +To at least partially mitigate this, the dashboard can be configured to only display certain information, depending on +several factors. + +## Matchers + +Matchers are used to determine whether a certain detail level should be used for a vulnerability. They have the +following properties: + +- `status`: a comma-separated list of vulnerability statuses that the matcher applies to. Matches if the vulnerability + has one of the specified statuses. +- `allCpe`: a comma-separated list of CPEs that the matcher applies to. Matches if the vulnerability has all of the + specified CPEs. +- `anyCpe`: a comma-separated list of CPEs that the matcher applies to. Matches if the vulnerability has at least one of + the specified CPEs. +- `vulnerabilityName`: a comma-separated list of vulnerability names that the matcher applies to. Matches if the + vulnerability has at least one of the specified names. 
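The property semantics listed above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the `Vulnerability` class and the `matches` function are hypothetical names. It assumes that a vulnerability has a single name, that properties missing from the matcher are ignored, and that a matcher is satisfied only when every property it specifies matches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def split_list(value: str) -> set[str]:
    """Split a comma-separated property value into a set of trimmed entries."""
    return {part.strip() for part in value.split(",") if part.strip()}


@dataclass
class Vulnerability:
    # Hypothetical stand-in for a vulnerability entry, not the project's class.
    name: str
    status: str
    cpes: set[str] = field(default_factory=set)


def matches(matcher: dict[str, str], vuln: Vulnerability) -> bool:
    """Return True only if every property specified in the matcher matches."""
    # status: the vulnerability must have one of the listed statuses
    if "status" in matcher and vuln.status not in split_list(matcher["status"]):
        return False
    # allCpe: every listed CPE must be present on the vulnerability
    if "allCpe" in matcher and not split_list(matcher["allCpe"]) <= vuln.cpes:
        return False
    # anyCpe: at least one listed CPE must be present
    if "anyCpe" in matcher and not split_list(matcher["anyCpe"]) & vuln.cpes:
        return False
    # vulnerabilityName: the name must be one of the listed names
    if "vulnerabilityName" in matcher and vuln.name not in split_list(matcher["vulnerabilityName"]):
        return False
    return True
```

Under these assumptions, an empty matcher applies to every vulnerability, because no property can fail.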
+ +The matcher will only match if all of the specified properties match. + +## Detail Levels + +The following detail levels are available: + +- `timeline`: whether the timeline should be displayed. This attribute is the main reason the detail levels were + introduced, as the timeline can not only take very long to generate, but also take up a lot of space in the dashboard. +- `references`: whether the vulnerability references should be displayed. +- `advisoriesGlobal`: whether advisory information should be displayed at all. +- `advisoriesReferences`: whether references in the advisory information should be displayed. +- `advisoryByTypes`: what type of advisories should be shown, whilst all others are hidden. + - `any` (default): any advisory type. + - comma-separated list of advisory types (notice, ...) +- `advisoryByProviders`: what provider of advisories should be shown, whilst all others are hidden. + - `any` (default): any advisory provider. + - comma-separated list of advisory providers (CERT-FR, GHSA, ...) +- `eolDate`: whether the EOL date information should be displayed. + +## Default Detail Level + +By default, all properties are set to the most detailed level, to ensure that all information is displayed. + +## Creating Detail Levels + +### Correlation YAML + +```yaml +- affects: + Id: linux-kernel + append: + VAD Detail Level Configurations: |- + matcher: + status = "in review, insignificant"; + allCpe = "cpe:/a:linux:linux_kernel, cpe:/o:linux:linux_kernel"; + anyCpe = "cpe:/a:linux:linux_kernel, cpe:/a:linux_test:linux_kernel"; + vulnerabilityName = "CVE-2023-35829, CVE-2023-35828, CVE-1999-0431"; + detail: + timeline = "false"; + advisoriesGlobal = "false"; + matcher: status = "in review, insignificant" + detail: timeline = "false" +``` + +The detail level information is appended to artifacts in the `VAD Detail Level Configurations` field. This field +contains a custom format: + +- `matcher`: the matcher information, as described above. 
+- `detail`: the detail level information, as described above. + +Each of these two sections contains information in the following format: `key = "value";` (a key-value pair), where the +key is the name of the property, and the value is the value of the property. The value is a string, and must be enclosed +in double quotes. + +The newlines are optional, but make the configuration easier to read. The last semicolon is optional as well. + +If multiple detail levels are specified, the individual detail levels must be separated by a newline. + +### VAD Configuration + +```xml + + + + + in review, insignificant + cpe:/a:linux:linux_kernel, cpe:/o:linux:linux_kernel + cpe:/a:linux:linux_kernel, cpe:/a:linux_test:linux_kernel + CVE-2023-35829, CVE-2023-35828, CVE-1999-0431 + + + false + true + true + true + + any + + + any + + true + + +``` + +The Vulnerability Assessment Dashboard configuration contains a `detailLevels` section, which is a map of detail levels. +Just like the correlation YAML, each detail level contains a `matcher` section, but the `detail` section is replaced by +the individual properties. 
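To make the custom `matcher:` / `detail:` format from the Correlation YAML section more concrete, here is a rough parsing sketch in Python. It is not the parser used by the project. The function name `parse_detail_levels` is hypothetical, and the sketch assumes that every new `matcher:` keyword starts a new detail level and that values never contain double quotes.

```python
import re

# Either a section header ("matcher:" / "detail:") or a key = "value" pair.
# The trailing semicolon is not part of the pattern, so it is optional.
TOKEN = re.compile(r'\b(matcher|detail)\s*:|(\w+)\s*=\s*"([^"]*)"')


def parse_detail_levels(text):
    """Parse the field content into a list of {"matcher": {...}, "detail": {...}}."""
    levels = []
    section = None
    for m in TOKEN.finditer(text):
        if m.group(1):
            # Assumption: each "matcher:" keyword opens a new detail level.
            if m.group(1) == "matcher" or not levels:
                levels.append({"matcher": {}, "detail": {}})
            section = levels[-1][m.group(1)]
        elif section is not None:
            # key = "value" pair inside the current section
            section[m.group(2)] = m.group(3)
    return levels
```

Because a quoted value is consumed as a whole, colons inside CPE strings such as `cpe:/a:linux:linux_kernel` are not mistaken for section headers.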
diff --git a/doc/inventory-enrichment/inventory-enrichment-overview.md b/doc/inventory-enrichment/inventory-enrichment-overview.md index bf6fe54..b452b10 100644 --- a/doc/inventory-enrichment/inventory-enrichment-overview.md +++ b/doc/inventory-enrichment/inventory-enrichment-overview.md @@ -36,6 +36,7 @@ View the overview page of the [**Inventory Enrichment**](enrichment/inventory-en - [Vulnerability Status files](enrichment/vulnerability-status.md) - [Vulnerability Keywords files](enrichment/vulnerability-keywords.md) - [Vulnerability Filter format](enrichment/vulnerability-filter-format.md) +- [Dashboard Detail Levels format](enrichment/vad-detail-levels.md) - [Custom Vulnerability Data Sources](enrichment/custom-vulnerabilities.md) ### Data Sources Explained From 7c4467a22d8fb0a59a5cb17c6ff3f628622a59ae Mon Sep 17 00:00:00 2001 From: Yan Wittmann Date: Thu, 13 Jul 2023 09:27:56 +0200 Subject: [PATCH 2/3] Added partial new documentation for MSRC Signed-off-by: ywittmann --- .../msrc/understanding-data.md | 59 ++++++++++++++++--- 1 file changed, 50 insertions(+), 9 deletions(-) diff --git a/doc/inventory-enrichment/msrc/understanding-data.md b/doc/inventory-enrichment/msrc/understanding-data.md index a8ff53c..c04a780 100644 --- a/doc/inventory-enrichment/msrc/understanding-data.md +++ b/doc/inventory-enrichment/msrc/understanding-data.md @@ -75,9 +75,11 @@ Table of contents: ### Sources - [Security Update Guide - Microsoft Security Response Center](https://msrc.microsoft.com/update-guide) - - can **manually** be downloaded as multiple csv files, one per year. Time frame has to be set manually. - - contains `CVE/ADV ↔︎ KB` relations that are not present in the other sources. - - does not contain `KB ↔︎ KB` relations. + - ~~can **manually** be downloaded as multiple csv files, one per year. 
Time frame has to be set manually,~~ + - can **automatically** be mirrored using the underlying API used by the CSV generator on the website and using our + process (see below), + - contains `CVE/ADV ↔︎ KB` relations that are not present in the other sources, + - does not contain `KB ↔︎ KB` relations, - A short how-to can be found [here](performing-csv-download.md). - [Microsoft Security Updates API | MSRC](https://api.msrc.microsoft.com/cvrf/v2.0/swagger/index) - can **automatically** be mirrored using our process (see below), @@ -91,8 +93,7 @@ Table of contents: ### Problems with the data sources -- Only one of the three data sources can be retrieved automatically. The one available for mirroring is the most - important one, yet it still misses a lot of information. +- One of the three data sources cannot be mirrored automatically or manually, meaning some data is missing. - The data in between the different sources is inconsistent. Every data source either contains `CVE/ADV → KB` or `KB → KB`relations that are missing in the other sources. The Update Catalog knows about this information, but is inconsistent in itself. @@ -105,10 +106,7 @@ Table of contents: - **In the API**: the KB replaces (`4487259` and `4487081`) and is replaced by `4507423` ![Example](kb-chain-example-1.png) - This means, our data will never be complete unless considering all three data sources, which is impossible to - automate. it is reasonable to manually download the csv files from the MSRC, but scraping the Update Catalog is not - feasible, manually or automated. -- The MSRC csv has to be downloaded in time segments of one year, which have to be set manually, which takes quite some - time + automate. Scraping the Update Catalog is not feasible, manually or automated. ## Data mirror @@ -260,6 +258,49 @@ Table of contents: } ``` +... 
an entry from the underlying MSRC Security Update Guide CSV API + +```json +{ + "productFamily": "Windows", + "productFamilyId": 100000010, + "severity": "Important", + "temporalScore": "6.5", + "product": "Windows 10 Version 1809 for 32-bit Systems", + "productId": 11568, + "releaseDate": "2023-01-10T08:00:00Z", + "impactId": 100000001, + "impact": "Denial of Service", + "issuingCna": "Microsoft", + "platformId": 0, + "baseScore": "7.5", + "kbArticles": [ + { + "rebootRequired": "Yes", + "articleName": "5022286", + "knownIssuesName": "5022286", + "affectedBinaries": [], + "knownIssuesUrl": "https://support.microsoft.com/help/5022286", + "downloadUrl": "https://catalog.update.microsoft.com/v7/site/Search.aspx?q=KB5022286", + "downloadName": "Security Update", + "articleUrl": "https://support.microsoft.com/help/5022286", + "fixedBuildNumber": "10.0.17763.3887", + "supercedence": "5021237", + "ordinal": 0 + } + ], + "initialReleaseDate": "2023-01-10T08:00:00Z", + "cveNumber": "CVE-2023-21527", + "isMariner": false, + "productVersion": "10.0.0", + "architectureId": 0, + "id": "00000000-0000-0000-302d-00006e4d7c04", + "releaseNumber": "2023-Jan", + "severityId": 100000001, + "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H/E:U/RL:O/RC:C" +} +``` + ### Data inconsistencies From 9c732e4a536b4f6a178d4f904d86d184d30fb4ed Mon Sep 17 00:00:00 2001 From: Yan Wittmann Date: Fri, 21 Jul 2023 09:28:30 +0200 Subject: [PATCH 3/3] Added documentation for the new version comparator Signed-off-by: ywittmann --- .../inventory-enrichment-overview.md | 1 + .../other/version-comparator.md | 231 ++++++++++++++++++ 2 files changed, 232 insertions(+) create mode 100644 doc/inventory-enrichment/other/version-comparator.md diff --git a/doc/inventory-enrichment/inventory-enrichment-overview.md b/doc/inventory-enrichment/inventory-enrichment-overview.md index b452b10..56e9cb7 100644 --- a/doc/inventory-enrichment/inventory-enrichment-overview.md +++ 
b/doc/inventory-enrichment/inventory-enrichment-overview.md @@ -51,6 +51,7 @@ View the overview page of the [**Inventory Enrichment**](enrichment/inventory-en - [Architecture: Superclasses](enrichment/java-super-classes.md) (useful background information for understanding the process) - [Inventory CPE data and effective CPE](enrichment/parsing-effective-cpe.md) +- [Version comparator](other/version-comparator.md) ## Overview graphs diff --git a/doc/inventory-enrichment/other/version-comparator.md b/doc/inventory-enrichment/other/version-comparator.md new file mode 100644 index 0000000..7bbda7b --- /dev/null +++ b/doc/inventory-enrichment/other/version-comparator.md @@ -0,0 +1,231 @@ +> [Vulnerability Monitoring](../inventory-enrichment-overview.md) > Version Comparator + +# Version Comparator + + + +* [Version Comparator](#version-comparator) + * [Introduction](#introduction) + * [Concept](#concept) + * [Tokenizer](#tokenizer) + * [Categorization](#categorization) + * [Comparing](#comparing) + * [Curation format](#curation-format) + + + +## Introduction + +The world of version numbers is far from uniform. It's a landscape filled with many different formats, many of which +stray from the simplicity of semantic versions or easily readable structures. This complexity can sometimes pose a +challenge even for humans. Try, for instance, the task of ordering the following versions: + +``` +2.616.rev.20171103 +dh_nvr5464_eng_p_v2.616.0000.0.r.20171102 +2.615 +2.616.0000.0.r.20171101 +``` + +
+Solution + +``` +2.615 +2.616.0000.0.r.20171101 +dh_nvr5464_eng_p_v2.616.0000.0.r.20171102 +2.616.rev.20171103 +``` + +
+ +The ability to compare version numbers is a fundamental requirement in vulnerability correlation. Without the capability +to tell whether a version is higher or lower than another, it becomes impossible to accurately determine if a +specific vulnerability applies to a given version of a software component. + +In our previous version comparator, we used a straightforward approach: extract the first semantic version found in a +version string and use that for comparison. This method effectively handled most of the cases. However, it had its +limitations, particularly when it came to version strings that included modifiers such as `update` or `beta` or firmware +versions. These elements were largely ignored, leading to inaccuracies in some instances. Recognizing this, we needed to +develop a more robust solution. + +## Concept + +The new version comparator performs several steps to extract the relevant parts from a version. I will use the following +versions as examples throughout this process: + +``` +2.0.0ubuntu0 +1.0.0-beta.11 +8.2.2_t1a +7.0U2a-17867351 +1.0.3-0.20200308084313-2adbaa4891b9 +``` + +### Tokenizer + +The only way to detect versions in any format is to split the version string into its individual components, +producing multiple tokens that each represent some part of the version. + +Before doing that, some normalization steps are applied, like removing git hashes from the end of the string: + +``` +1.0.3-0.20200308084313-2adbaa4891b9 +1.0.3-0.20200308084313 +``` + +Then, a basic string tokenization is applied that segments the string at separators (`-`) or character type changes +(letter --> number, ...). However, numbers joined by dots (`.`) are merged back into a single token (semantic +versions). 
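As a rough illustration of this segmentation step (not the actual tokenizer), the following Python sketch splits a version string at separators and at changes of character type, while keeping dot-joined numbers together. The function name `tokenize` and the regular expression are assumptions made for this example. The normalization steps mentioned above, such as removing git hashes, are not covered.

```python
import re

# Three alternatives, tried in order:
#   1. a number, optionally continued by dot-joined numbers (kept as one token)
#   2. a run of letters
#   3. any other single character, which is treated as a separator token
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)*|[A-Za-z]+|[^A-Za-z0-9]")


def tokenize(version):
    """Split an already normalized version string into a list of string tokens."""
    return TOKEN_PATTERN.findall(version)
```

Applied to the normalized example versions, this sketch yields the token lists shown next.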
+ +``` +[2.0.0, ubuntu, 0] +[1.0.0, -, beta, ., 11] +[8.2.2, _, t, 1, a] +[7.0, U, 2, a, -, 17867351] +[1.0.3, -, 0.20200308084313] +``` + +Some more terminology that is used in the following sections: + +- `version modifiers` are the parts of a version that indicate where a version is currently located in its release + cycle, so: `beta`, `update`, `revision`, ... + These parts can have string/number comparable tokens behind them, which should be joined to the modifier (`update 2`). +- `string/number comparable tokens` are tokens that are either semantic versions, numbers or string sequences with the + length 1 or 2. These are usually the tokens that interest us. +- `separators` are simple strings like `.`, `_`, `-`. These are filtered out later. + +Then, some more rules are applied: + +- remove leading string tokens +- detect version modifiers with their extending tokens behind them +- filter out all separators +- join subsequent version modifiers into a single version modifier +- ... (some more) + +This yields a much more readable format: + +``` +[2.0.0, ubuntu, 0] +[1.0.0, beta 11] +[8.2.2, trial 1 a] +[7.0, update 2 a, 17867351] +[1.0.3, 0.20200308084313] +``` + +This is all the tokenizer does; the rest depends on the implementation of the version class. There is only one of +these implementations currently active, so this process will be described next. + +### Categorization + +The `CuratedCategoriesVersionImpl` then takes these token lists and attempts to sort the tokens into six categories. +The order of these categories is significant as it determines the order of comparison during the version comparison +process later. + +- The `specVersion` is only used if there are too many parts in the version and is compared first. +- The `semVersion` category is used for semantic versions, which follow the format MAJOR.MINOR.PATCH (e.g., 2.0.0). + This part will most likely always be populated. 
+- `buildVersion` +- `otherVersionPart` +- The `versionModifier` category is used for the first version modifier occurring. +- The `afterAllPart` category is used for storing parts after the version modifier to prevent them matching before it. + +1. Initial Filtering: The algorithm starts by filtering out tokens that are unlikely to be relevant for version + comparison. For example, single-character strings at the beginning of the token list are ignored. +2. Version Modifier Detection: The algorithm then looks for version modifier tokens. If a version modifier token is + found and the `versionModifier` category is currently null, the token is assigned to the `versionModifier` category. +3. Token Categorization: The algorithm then iterates over the remaining tokens. If a token is comparable by string + (i.e., it's a semantic version, a number, or a short string sequence), it's assigned to the first null category among + `specVersion`, `semVersion`, `buildVersion`, and `otherVersionPart`, in that order. If the `versionModifier` category + is not null and the `afterAllPart` category is null, the token is assigned to the `afterAllPart` category. +4. Version Part Rearrangement: After all tokens have been assigned, the algorithm checks if any rearrangement of the + version parts is necessary. For example, if the `specVersion` category is null and the `semVersion` category contains + a number or a semantic version, the version parts are shifted back one category. This ensures that the most + significant version information is prioritized. +5. Special Cases: The algorithm also handles several special cases to improve the accuracy of the categorization + process. For example, if the `buildVersion` category contains a long string (8 characters or more) without a dot, + it's moved to the `otherVersionPart` category. 
If the `semVersion` category contains a 4-part dot-separated version + and the `buildVersion` category is null, the last part of the `semVersion` is moved to the `buildVersion` category. +6. Finalization: Finally, if the `versionModifier` category is still null after all tokens have been processed, it's + assigned a neutral token with a matching value of 0. The categorized version parts are then stored for later use in + version comparison. + +After step 2: + +``` +spec:2.0.0 sem:0 build:null other:null mod:null after-all:null +spec:1.0.0 sem:null build:null other:null mod:beta 11 after-all:null +spec:8.2.2 sem:null build:null other:null mod:trial 1 a after-all:null +spec:7.0 sem:null build:null other:null mod:update 2 a after-all:17867351 +spec:1.0.3 sem:0.20200308084313 build:null other:null mod:null after-all:null +``` + +After step 6: + +``` +spec:null sem:2.0.0 build:0 other:null mod:neutral_token after-all:null +spec:null sem:1.0.0 build:null other:null mod:beta 11 after-all:null +spec:null sem:8.2.2 build:null other:null mod:trial 1 a after-all:null +spec:null sem:7.0 build:null other:null mod:update 2 a after-all:17867351 +spec:null sem:1.0.3 build:0.20200308084313 other:null mod:neutral_token after-all:null +``` + +Using this, we can now compare the versions. + +### Comparing + +The actual comparison is then fairly simple. The algorithm iterates over the categories in order and compares the +version parts in each category. If the version parts are equal, the algorithm moves on to the next category. If the +version parts are not equal, the algorithm returns the result of the comparison. If the algorithm reaches the end of the +categories without finding any differences, the versions are considered equal. + +Version modifiers are compared by their position in the version modifier hierarchy. For example, `alpha` is considered +less than `beta`, which is considered less than `rc`, and so on. 
To account for the difference between modifiers +that come before and after the neutral version (without modifier), the algorithm assigns a negative value to modifiers +that come before the neutral version and a positive value to modifiers that come after the neutral version. + +## Curation format + +But, as you might expect, not every version is automatically parsable. For example, the version +`dh_nvr5464_eng_p_v2.616.0000.0.r.20171102` from the introduction above contains an unrelated number `5464` ahead of the +relevant parts. There is no way to know if this number is relevant or not, so it's not possible to parse this +version correctly _automatically_. This is where the curation format comes in. + +This format allows specifying rules that are applied to a version before parsing or that even replace the +parsing process completely. The format is a simple YAML file that contains a list of rules. Here are two examples: + +```yaml +# version: dh_nvr5464_eng_p_v3.616.0000.0.r.20171000 +# preprocessor: v3.616.0000.0.r.20171000 +- pattern: .*?dh_nvr__NUMBER___eng_p_(.+) + pattern-flags: i + type: preprocessor + segments: + preprocessor: $1 + +# version: 2018w20.3-162732p +# sem: 2018.20.3 +# build: 162732 +- pattern: .*?(__YEAR__)w(__SEMVER__)-(__NUMBER__).* + pattern-flags: i + segments: + sem: $1.$2 + build: $3 +``` + +Each rule must have a `pattern` and `segments`. The `pattern` is a regular expression that is matched against the +version string. Depending on the `type` of the rule, the `segments` are evaluated differently, but all of them are +regular expression replacement strings. The `pattern-flags` are optional and can be used to specify regular expression +flags. + +Types: + +- `preprocessor`: The `segments` only have one key, `preprocessor`, which is a string that replaces the version string + before parsing. 
This is useful when only a small part of the version string is actually relevant for comparison, but + the version still comes in different formats, so you just help the algorithm a bit and then leave it to do the rest. +- `full`: This is the default type. The `segments` are evaluated as a regular expression replacement string. The + replacement string can contain references to capture groups in the `pattern` using `$1`, `$2`, etc. You can set all + version fields using `spec`, `sem`, `build`, `other`, `modifier`, and `after-all`. + +These files can be specified in the `curatedVersionFiles` file list property of the `enrich-inventory` goal. There is +also an integrated list of curated versions that is used by default.
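To illustrate how such a rule could be evaluated, here is a minimal Python sketch. It is not the project's implementation. The function name `apply_rule` is hypothetical, and the placeholder expansions for `__NUMBER__`, `__YEAR__` and `__SEMVER__` are guesses, since this document does not define them. The sketch also assumes that the pattern must match the entire version string.

```python
import re

# Assumed expansions; the real placeholder definitions are not given here.
PLACEHOLDERS = {
    "__NUMBER__": r"\d+",
    "__YEAR__": r"\d{4}",
    "__SEMVER__": r"\d+(?:\.\d+)+",
}


def apply_rule(rule, version):
    """Return the evaluated segments of a rule, or None if the pattern does not match."""
    pattern = rule["pattern"]
    for name, fragment in PLACEHOLDERS.items():
        pattern = pattern.replace(name, fragment)
    # Only the "i" flag is handled in this sketch.
    flags = re.IGNORECASE if "i" in rule.get("pattern-flags", "") else 0
    match = re.fullmatch(pattern, version, flags)
    if match is None:
        return None
    # Replace "$1"-style references with the corresponding capture groups.
    return {
        key: re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), template)
        for key, template in rule["segments"].items()
    }
```

For a `preprocessor` rule, the resulting `preprocessor` value would then be handed to the regular parsing process, whereas the segments of a `full` rule would populate the version fields directly.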