From 1f24ad221bd54eee5a715a47ecc811e5eb55dcc4 Mon Sep 17 00:00:00 2001
From: Yan Wittmann <order@yanwittmann.de>
Date: Mon, 10 Jul 2023 14:08:41 +0200
Subject: [PATCH 1/3] AEAA-343: Added documentation for detail levels

Signed-off-by: ywittmann <yan.wittmann@metaeffekt.com>
---
 .../enrichment/vad-detail-levels.md           | 113 ++++++++++++++++++
 .../inventory-enrichment-overview.md          |   1 +
 2 files changed, 114 insertions(+)
 create mode 100644 doc/inventory-enrichment/enrichment/vad-detail-levels.md

diff --git a/doc/inventory-enrichment/enrichment/vad-detail-levels.md b/doc/inventory-enrichment/enrichment/vad-detail-levels.md
new file mode 100644
index 0000000..06b8366
--- /dev/null
+++ b/doc/inventory-enrichment/enrichment/vad-detail-levels.md
@@ -0,0 +1,113 @@
+> [Vulnerability Monitoring](../inventory-enrichment-overview.md) > VAD Detail Levels
+
+# Vulnerability Assessment Dashboard Detail Levels
+
+In some cases, displaying all the information available in a dashboard can lead to the dashboard growing rapidly in
+size, depending on the number of vulnerabilities the dashboard is generated for.
+
+To at least partially mitigate this, the dashboard can be configured to only display certain information, depending on
+several factors.
+
+## Matchers
+
+Matchers are used to determine whether a certain detail level should be used for a vulnerability. They have the
+following properties:
+
+- `status`: a comma-separated list of vulnerability statuses that the matcher applies to. Matches if the vulnerability
+  has one of the specified statuses.
+- `allCpe`: a comma-separated list of CPEs that the matcher applies to. Matches if the vulnerability has all of the
+  specified CPEs.
+- `anyCpe`: a comma-separated list of CPEs that the matcher applies to. Matches if the vulnerability has at least one of
+  the specified CPEs.
+- `vulnerabilityName`: a comma-separated list of vulnerability names that the matcher applies to. Matches if the
+  vulnerability has at least one of the specified names.
+
+The matcher will only match if all of the specified properties match.
+
+## Detail Levels
+
+The following detail levels are available:
+
+- `timeline`: whether the timeline should be displayed. This attribute is the main reason the detail levels were
+  introduced, as the timeline can not only take very long to generate, but also take up a lot of space in the dashboard.
+- `references`: whether the vulnerability references should be displayed.
+- `advisoriesGlobal`: whether advisory information should be displayed at all.
+- `advisoriesReferences`: whether references in the advisory information should be displayed.
+- `advisoryByTypes`: which types of advisories should be shown; all others are hidden.
+    - `any` (default): any advisory type.
+    - comma-separated list of advisory types (notice, ...)
+- `advisoryByProviders`: which providers of advisories should be shown; all others are hidden.
+    - `any` (default): any advisory provider.
+    - comma-separated list of advisory providers (CERT-FR, GHSA, ...)
+- `eolDate`: whether the EOL date information should be displayed.
+
+## Default Detail Level
+
+By default, all properties are set to the most detailed level, to ensure that all information is displayed.
+
+## Creating Detail Levels
+
+### Correlation YAML
+
+```yaml
+- affects:
+    Id: linux-kernel
+  append:
+    VAD Detail Level Configurations: |-
+      matcher:
+        status = "in review, insignificant";
+        allCpe = "cpe:/a:linux:linux_kernel, cpe:/o:linux:linux_kernel";
+        anyCpe = "cpe:/a:linux:linux_kernel, cpe:/a:linux_test:linux_kernel";
+        vulnerabilityName = "CVE-2023-35829, CVE-2023-35828, CVE-1999-0431";
+      detail:
+        timeline = "false";
+        advisoriesGlobal = "false";
+      matcher: status = "in review, insignificant"
+      detail: timeline = "false"
+```
+
+The detail level information is appended to artifacts in the `VAD Detail Level Configurations` field. This field
+contains a custom format:
+
+- `matcher`: the matcher information, as described above.
+- `detail`: the detail level information, as described above.
+
+Each of these two sections contains information in the following format: `key = "value";` (a key-value pair), where the
+key is the name of the property, and the value is the value of the property. The value is a string, and must be enclosed
+in double quotes.
+
+The newlines are optional, but make the configuration easier to read. The last semicolon is optional as well.
+
+If multiple detail levels are specified, the individual detail levels must be separated by a newline.
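+
+As a further illustration, here is a minimal sketch of a second configuration that limits the displayed advisories to
+two providers for a single vulnerability. The artifact id `example-artifact` is hypothetical, and passing the provider
+list in the same quoted, comma-separated form as the matcher values is an assumption based on the property
+descriptions above:
+
+```yaml
+# Hypothetical artifact id; the property names are taken from the lists above.
+- affects:
+    Id: example-artifact
+  append:
+    VAD Detail Level Configurations: |-
+      matcher: vulnerabilityName = "CVE-2023-35829"
+      detail: advisoryByProviders = "CERT-FR, GHSA"; references = "false"
+```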
+
+### VAD Configuration
+
+```xml
+
+<detailLevels>
+    <map>
+        <matcher>
+            <status>in review, insignificant</status>
+            <allCpe>cpe:/a:linux:linux_kernel, cpe:/o:linux:linux_kernel</allCpe>
+            <anyCpe>cpe:/a:linux:linux_kernel, cpe:/a:linux_test:linux_kernel</anyCpe>
+            <vulnerabilityName>CVE-2023-35829, CVE-2023-35828, CVE-1999-0431
+            </vulnerabilityName>
+        </matcher>
+        <timeline>false</timeline>
+        <references>true</references>
+        <advisoriesGlobal>true</advisoriesGlobal>
+        <advisoriesReferences>true</advisoriesReferences>
+        <advisoryByTypes>
+            <entry>any</entry>
+        </advisoryByTypes>
+        <advisoryByProviders>
+            <entry>any</entry>
+        </advisoryByProviders>
+        <eolDate>true</eolDate>
+    </map>
+</detailLevels>
+```
+
+The Vulnerability Assessment Dashboard configuration contains a `detailLevels` section, which is a map of detail levels.
+Just like the correlation YAML, each detail level contains a `matcher` section, but the `detail` section is replaced by
+the individual properties.
diff --git a/doc/inventory-enrichment/inventory-enrichment-overview.md b/doc/inventory-enrichment/inventory-enrichment-overview.md
index bf6fe54..b452b10 100644
--- a/doc/inventory-enrichment/inventory-enrichment-overview.md
+++ b/doc/inventory-enrichment/inventory-enrichment-overview.md
@@ -36,6 +36,7 @@ View the overview page of the [**Inventory Enrichment**](enrichment/inventory-en
 - [Vulnerability Status files](enrichment/vulnerability-status.md)
 - [Vulnerability Keywords files](enrichment/vulnerability-keywords.md)
 - [Vulnerability Filter format](enrichment/vulnerability-filter-format.md)
+- [Dashboard Detail Levels format](enrichment/vad-detail-levels.md)
 - [Custom Vulnerability Data Sources](enrichment/custom-vulnerabilities.md)
 
 ### Data Sources Explained

From 7c4467a22d8fb0a59a5cb17c6ff3f628622a59ae Mon Sep 17 00:00:00 2001
From: Yan Wittmann <order@yanwittmann.de>
Date: Thu, 13 Jul 2023 09:27:56 +0200
Subject: [PATCH 2/3] Added partial new documentation for MSRC

Signed-off-by: ywittmann <yan.wittmann@metaeffekt.com>
---
 .../msrc/understanding-data.md                | 59 ++++++++++++++++---
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/doc/inventory-enrichment/msrc/understanding-data.md b/doc/inventory-enrichment/msrc/understanding-data.md
index a8ff53c..c04a780 100644
--- a/doc/inventory-enrichment/msrc/understanding-data.md
+++ b/doc/inventory-enrichment/msrc/understanding-data.md
@@ -75,9 +75,11 @@ Table of contents:
 ### Sources
 
 - [Security Update Guide - Microsoft Security Response Center](https://msrc.microsoft.com/update-guide)
-    - can **manually** be downloaded as multiple csv files, one per year. Time frame has to be set manually.
-    - contains `CVE/ADV ↔︎ KB` relations that are not present in the other sources.
-    - does not contain `KB ↔︎ KB` relations.
+    - ~~can **manually** be downloaded as multiple csv files, one per year. Time frame has to be set manually,~~
+    - can **automatically** be mirrored using the underlying API used by the CSV generator on the website and using our
+      process (see below),
+    - contains `CVE/ADV ↔︎ KB` relations that are not present in the other sources,
+    - does not contain `KB ↔︎ KB` relations,
     - A short how-to can be found [here](performing-csv-download.md).
 - [Microsoft Security Updates API | MSRC](https://api.msrc.microsoft.com/cvrf/v2.0/swagger/index)
     - can **automatically** be mirrored using our process (see below),
@@ -91,8 +93,7 @@ Table of contents:
 
 ### Problems with the data sources
 
-- Only one of the three data sources can be retrieved automatically. The one available for mirroring is the most
-  important one, yet it still misses a lot of information.
+- One of the three data sources cannot be mirrored automatically or manually, meaning some data is missing.
 - The data in between the different sources is inconsistent. Every data source either contains `CVE/ADV → KB`
   or `KB → KB`relations that are missing in the other sources. The Update Catalog knows about this information, but is
   inconsistent in itself.  
@@ -105,10 +106,7 @@ Table of contents:
     - **In the API**: the KB replaces (`4487259` and `4487081`) and is replaced by `4507423`
       ![Example](kb-chain-example-1.png)
 - This means, our data will never be complete unless considering all three data sources, which is impossible to
-  automate. it is reasonable to manually download the csv files from the MSRC, but scraping the Update Catalog is not
-  feasible, manually or automated.
-- The MSRC csv has to be downloaded in time segments of one year, which have to be set manually, which takes quite some
-  time
+  automate. Scraping the Update Catalog is not feasible, manually or automated.
 
 ## Data mirror
 
@@ -260,6 +258,49 @@ Table of contents:
 }
 ```
 
+... an entry from the underlying MSRC Security Update Guide CSV API
+
+```json
+{
+  "productFamily": "Windows",
+  "productFamilyId": 100000010,
+  "severity": "Important",
+  "temporalScore": "6.5",
+  "product": "Windows 10 Version 1809 for 32-bit Systems",
+  "productId": 11568,
+  "releaseDate": "2023-01-10T08:00:00Z",
+  "impactId": 100000001,
+  "impact": "Denial of Service",
+  "issuingCna": "Microsoft",
+  "platformId": 0,
+  "baseScore": "7.5",
+  "kbArticles": [
+    {
+      "rebootRequired": "Yes",
+      "articleName": "5022286",
+      "knownIssuesName": "5022286",
+      "affectedBinaries": [],
+      "knownIssuesUrl": "https://support.microsoft.com/help/5022286",
+      "downloadUrl": "https://catalog.update.microsoft.com/v7/site/Search.aspx?q=KB5022286",
+      "downloadName": "Security Update",
+      "articleUrl": "https://support.microsoft.com/help/5022286",
+      "fixedBuildNumber": "10.0.17763.3887",
+      "supercedence": "5021237",
+      "ordinal": 0
+    }
+  ],
+  "initialReleaseDate": "2023-01-10T08:00:00Z",
+  "cveNumber": "CVE-2023-21527",
+  "isMariner": false,
+  "productVersion": "10.0.0",
+  "architectureId": 0,
+  "id": "00000000-0000-0000-302d-00006e4d7c04",
+  "releaseNumber": "2023-Jan",
+  "severityId": 100000001,
+  "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H/E:U/RL:O/RC:C"
+}
+```
+
 </details>
 
 ### Data inconsistencies

From 9c732e4a536b4f6a178d4f904d86d184d30fb4ed Mon Sep 17 00:00:00 2001
From: Yan Wittmann <order@yanwittmann.de>
Date: Fri, 21 Jul 2023 09:28:30 +0200
Subject: [PATCH 3/3] Added documentation for the new version comparator

Signed-off-by: ywittmann <yan.wittmann@metaeffekt.com>
---
 .../inventory-enrichment-overview.md          |   1 +
 .../other/version-comparator.md               | 231 ++++++++++++++++++
 2 files changed, 232 insertions(+)
 create mode 100644 doc/inventory-enrichment/other/version-comparator.md

diff --git a/doc/inventory-enrichment/inventory-enrichment-overview.md b/doc/inventory-enrichment/inventory-enrichment-overview.md
index b452b10..56e9cb7 100644
--- a/doc/inventory-enrichment/inventory-enrichment-overview.md
+++ b/doc/inventory-enrichment/inventory-enrichment-overview.md
@@ -51,6 +51,7 @@ View the overview page of the [**Inventory Enrichment**](enrichment/inventory-en
 - [Architecture: Superclasses](enrichment/java-super-classes.md) (useful background information for understanding
   the process)
 - [Inventory CPE data and effective CPE](enrichment/parsing-effective-cpe.md)
+- [Version comparator](other/version-comparator.md)
 
 ## Overview graphs
 
diff --git a/doc/inventory-enrichment/other/version-comparator.md b/doc/inventory-enrichment/other/version-comparator.md
new file mode 100644
index 0000000..7bbda7b
--- /dev/null
+++ b/doc/inventory-enrichment/other/version-comparator.md
@@ -0,0 +1,231 @@
+> [Vulnerability Monitoring](../inventory-enrichment-overview.md) > Version Comparator
+
+# Version Comparator
+
+<!-- TOC -->
+
+* [Version Comparator](#version-comparator)
+    * [Introduction](#introduction)
+    * [Concept](#concept)
+        * [Tokenizer](#tokenizer)
+        * [Categorization](#categorization)
+        * [Comparing](#comparing)
+    * [Curation format](#curation-format)
+
+<!-- TOC -->
+
+## Introduction
+
+The world of version numbers is far from uniform. It's a landscape filled with many different formats, many of which
+stray from the simplicity of semantic versions or easily readable structures. This complexity can sometimes pose a
+challenge even for humans. Try, for instance, the task of ordering the following versions:
+
+```
+2.616.rev.20171103
+dh_nvr5464_eng_p_v2.616.0000.0.r.20171102
+2.615
+2.616.0000.0.r.20171101
+```
+
+<details>
+<summary>Solution</summary>
+
+```
+2.615
+2.616.0000.0.r.20171101
+dh_nvr5464_eng_p_v2.616.0000.0.r.20171102
+2.616.rev.20171103
+```
+
+</details>
+
+The ability to compare version numbers is a fundamental requirement in vulnerability correlation. Without the capability
+to tell whether one version is greater or less than another, it becomes impossible to accurately determine if a
+specific vulnerability applies to a given version of a software component.
+
+In our previous version comparator, we used a straightforward approach: extract the first semantic version found in a
+version string and use that for comparison. This method effectively handled most of the cases. However, it had its
+limitations, particularly when it came to version strings that included modifiers such as `update` or `beta` or firmware
+versions. These elements were largely ignored, leading to inaccuracies in some instances. Recognizing this, we needed to
+develop a more robust solution.
+
+## Concept
+
+The new version comparator performs several steps to extract the relevant parts from a version. The following versions
+serve as examples throughout this process:
+
+```
+2.0.0ubuntu0
+1.0.0-beta.11
+8.2.2_t1a
+7.0U2a-17867351
+1.0.3-0.20200308084313-2adbaa4891b9
+```
+
+### Tokenizer
+
+To detect versions in any format, the version string has to be separated into the individual components it is made up
+of, resulting in multiple tokens that each represent some part of the version.
+
+Before doing that, some normalization steps are applied, like removing git hashes from the end of the string:
+
+```
+1.0.3-0.20200308084313-2adbaa4891b9
+1.0.3-0.20200308084313
+```
+
+Then, a basic string tokenization is applied that segments the string at separators (`-`) or character type changes
+(letter --> number, ...). However, numbers connected by dots (`.`) are joined back together into a single token
+(semantic versions).
+
+```
+[2.0.0, ubuntu, 0]
+[1.0.0, -, beta, ., 11]
+[8.2.2, _, t, 1, a]
+[7.0, U, 2, a, -, 17867351]
+[1.0.3, -, 0.20200308084313]
+```
+
+Some more terminology that is used in the following chapters:
+
+- `version modifiers` are the parts of a version that indicate where a version is currently located in its release
+  cycle, so: `beta`, `update`, `revision`, ...  
+  These parts can have string/number comparable tokens behind them, which should be joined to the modifier (`update 2`).
+- `string/number comparable tokens` are tokens that are either semantic versions, numbers or string sequences with the
+  length 1 or 2. These are usually the tokens that interest us.
+- `separators` are simple strings like `.`, `_`, `-`. These are filtered out later.
+
+Then, some more rules are applied:
+
+- remove leading string tokens
+- detect version modifiers with their extending tokens behind them
+- filter out all separators
+- join subsequent version modifiers into a single version modifier
+- ... (some more)
+
+This results in a much more readable format:
+
+```
+[2.0.0, ubuntu, 0]
+[1.0.0, beta 11]
+[8.2.2, trial 1 a]
+[7.0, update 2 a, 17867351]
+[1.0.3, 0.20200308084313]
+```
+
+This is all the tokenizer does; the rest depends on the implementation of the version class. Only one such
+implementation is currently active, so its process is described next.
+
+### Categorization
+
+The `CuratedCategoriesVersionImpl` then takes these token lists and attempts to populate them into six categories.
+The order of these categories is significant as it determines the order of comparison during the version comparison
+process later.
+
+- The `specVersion` is only used if there are too many parts in the version and is compared first.
+- The `semVersion` category is used for semantic versions, which follow the format MAJOR.MINOR.PATCH (e.g., 2.0.0).
+  This part will most likely always be populated.
+- `buildVersion`
+- `otherVersionPart`
+- The `versionModifier` category is used for the first version modifier occurring.
+- The `afterAllPart` category is used for storing parts after the version modifier to prevent them matching before it.
+
+1. Initial Filtering: The algorithm starts by filtering out tokens that are unlikely to be relevant for version
+   comparison. For example, single-character strings at the beginning of the token list are ignored.
+2. Version Modifier Detection: The algorithm then looks for version modifier tokens. If a version modifier token is
+   found and the `versionModifier` category is currently null, the token is assigned to the `versionModifier` category.
+3. Token Categorization: The algorithm then iterates over the remaining tokens. If a token is comparable by string
+   (i.e., it's a semantic version, a number, or a short string sequence), it's assigned to the first null category among
+   `specVersion`, `semVersion`, `buildVersion`, and `otherVersionPart`, in that order. If the `versionModifier` category
+   is not null and the `afterAllPart` category is null, the token is assigned to the `afterAllPart` category.
+4. Version Part Rearrangement: After all tokens have been assigned, the algorithm checks if any rearrangement of the
+   version parts is necessary. For example, if the `specVersion` category is null and the `semVersion` category contains
+   a number or a semantic version, the version parts are shifted back one category. This ensures that the most
+   significant version information is prioritized.
+5. Special Cases: The algorithm also handles several special cases to improve the accuracy of the categorization
+   process. For example, if the `buildVersion` category contains a long string (8 characters or more) without a dot,
+   it's moved to the `otherVersionPart` category. If the `semVersion` category contains a 4-part dot-separated version
+   and the `buildVersion` category is null, the last part of the `semVersion` is moved to the `buildVersion` category.
+6. Finalization: Finally, if the `versionModifier` category is still null after all tokens have been processed, it's
+   assigned a neutral token with a matching value of 0. The categorized version parts are then stored for later use in
+   version comparison.
+
+After step 3:
+
+```
+spec:2.0.0  sem:0                 build:null  other:null  mod:null        after-all:null
+spec:1.0.0  sem:null              build:null  other:null  mod:beta 11     after-all:null
+spec:8.2.2  sem:null              build:null  other:null  mod:trial 1 a   after-all:null
+spec:7.0    sem:null              build:null  other:null  mod:update 2 a  after-all:17867351
+spec:1.0.3  sem:0.20200308084313  build:null  other:null  mod:null        after-all:null
+```
+
+After step 6:
+
+```
+spec:null  sem:2.0.0  build:0                 other:null  mod:neutral_token  after-all:null
+spec:null  sem:1.0.0  build:null              other:null  mod:beta 11        after-all:null
+spec:null  sem:8.2.2  build:null              other:null  mod:trial 1 a      after-all:null
+spec:null  sem:7.0    build:null              other:null  mod:update 2 a     after-all:17867351
+spec:null  sem:1.0.3  build:0.20200308084313  other:null  mod:neutral_token  after-all:null
+```
+
+Using this, we can now compare the versions.
+
+### Comparing
+
+The actual comparison is then fairly simple. The algorithm iterates over the categories in order and compares the
+version parts in each category. If the version parts are equal, the algorithm moves on to the next category. If the
+version parts are not equal, the algorithm returns the result of the comparison. If the algorithm reaches the end of the
+categories without finding any differences, the versions are considered equal.
+
+Version modifiers are compared by their position in the version modifier hierarchy. For example, `alpha` is considered
+less than `beta`, which is considered less than `rc`, and so on. To account for the difference between modifiers
+that come before and after the neutral version (without modifier), the algorithm assigns a negative value to modifiers
+that come before the neutral version and a positive value to modifiers that come after the neutral version.
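+
+As a worked sketch, consider `1.0.0-beta.11` and a plain `1.0.0`. This assumes that `beta` is one of the modifiers
+that come before the neutral version and that two empty categories compare as equal. The concrete numeric values
+assigned to modifiers are not specified here, so only their sign is shown:
+
+```
+                spec  sem    build  other  mod
+1.0.0-beta.11   null  1.0.0  null   null   beta 11        (negative)
+1.0.0           null  1.0.0  null   null   neutral_token  (0)
+```
+
+The `sem` parts are equal and the other categories before the modifier are empty on both sides, so the modifier
+decides the result: `1.0.0-beta.11` is ordered before `1.0.0`.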
+
+## Curation format
+
+As you might expect, not every version can be parsed automatically. For example, the version
+`dh_nvr5464_eng_p_v2.616.0000.0.r.20171102` from the introduction above contains an unrelated number `5464` before the
+relevant parts. There is no way to know if this number is relevant or not, so it's not possible to parse this
+version correctly _automatically_. This is where the curation format comes in.
+
+This format allows for specifying rules that can be applied to a version before parsing or even completely replacing the
+parsing process. The format is a simple YAML file that contains a list of rules. Here are two examples:
+
+```yaml
+# version: dh_nvr5464_eng_p_v3.616.0000.0.r.20171000
+#   preprocessor: v3.616.0000.0.r.20171000
+- pattern: .*?dh_nvr__NUMBER___eng_p_(.+)
+  pattern-flags: i
+  type: preprocessor
+  segments:
+    preprocessor: $1
+
+# version: 2018w20.3-162732p
+#   sem: 2018.20.3
+#   build: 162732
+- pattern: .*?(__YEAR__)w(__SEMVER__)-(__NUMBER__).*
+  pattern-flags: i
+  segments:
+    sem: $1.$2
+    build: $3
+```
+
+Each rule must have a `pattern` and `segments`. The `pattern` is a regular expression that is matched against the
+version string. Depending on the `type` of the rule, the `segments` are evaluated differently, but each of them is a
+regular expression replacement string. The `pattern-flags` are optional and can be used to specify regular expression
+flags.
+
+Types:
+
+- `preprocessor`: The `segments` only have one key, `preprocessor`, which is a string that replaces the version string
+  before parsing. This is useful when only a small part of the version string is actually relevant for comparison, but
+  the version still comes in different formats, so you just help the algorithm a bit and then leave it to do the rest.
+- `full`: This is the default type. The `segments` are evaluated as a regular expression replacement string. The
+  replacement string can contain references to capture groups in the `pattern` using `$1`, `$2`, etc. You can set all
+  version fields using `spec`, `sem`, `build`, `other`, `modifier`, and `after-all`.
+
+These files can be specified in the `curatedVersionFiles` file list property of the `enrich-inventory` goal. There is
+also an integrated list of curated versions that is used by default.
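+
+A further rule of type `preprocessor` could look like the following sketch. The `fw_release_` prefix and the sample
+version are hypothetical and only illustrate the format described above:
+
+```yaml
+# version: fw_release_1.4.2-beta.3 (hypothetical)
+#   preprocessor: 1.4.2-beta.3
+- pattern: .*?fw_release_(.+)
+  pattern-flags: i
+  type: preprocessor
+  segments:
+    preprocessor: $1
+```
+
+The remaining string `1.4.2-beta.3` would then be handed to the regular tokenizer and categorization steps.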