Vulnerability Analyzer

This document gives an overview of the Vulnerability Analyzer implementation: it breaks down the main components and describes the format of the vulnerability data.

Architecture

[Figure: architecture diagram]

The two main components are the Producer and the Consumer. The Producer does the heavy lifting: it gathers information from different sources, merges it into a preliminary Vulnerability Object, and later enriches that Object with callable-level detail using the PatchFinder. Each Vulnerability Object is published to a Kafka topic and picked up on the other end by the Consumer, which injects the data into the knowledge base and stores the vulnerability statement in the local file system.
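
The hand-off between the two components can be sketched without a broker. The example below is only an illustration of the decoupling: an in-memory queue stands in for the Kafka topic and a list stands in for the knowledge base. The class and method names (`PipelineSketch`, `produce`, `consumeAll`) are hypothetical and do not come from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-memory model of the Producer -> topic -> Consumer flow.
class PipelineSketch {
    // Stand-in for the Kafka topic: a queue of serialized Vulnerability Objects.
    private final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    // Stand-in for the knowledge base.
    private final List<String> knowledgeBase = new ArrayList<>();

    // Producer side: publish one serialized Vulnerability Object.
    void produce(String vulnerabilityJson) {
        topic.add(vulnerabilityJson);
    }

    // Consumer side: take everything currently on the topic and store it.
    // Returns the number of objects consumed.
    int consumeAll() {
        List<String> batch = new ArrayList<>();
        topic.drainTo(batch);
        knowledgeBase.addAll(batch);
        return batch.size();
    }

    // Read-only snapshot of what has been stored so far.
    List<String> stored() {
        return List.copyOf(knowledgeBase);
    }
}
```

The Producer never calls the Consumer directly; either side can be restarted or replaced as long as both agree on the format of the published object.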

Vulnerability Producer

The tool is designed to run as a standalone process, gathering, enriching and ultimately publishing the information on a Kafka Topic. The code sits in this repository.

Parsers

The ParserManager class holds all implemented parsers and handles their data inputs. Every Parser (NVDParser, GHParser, ExtraParser and OVALParser) pulls information from one or more sources and can also retrieve updates from the same sources. Each Parser class implements the following interface:

import java.util.HashMap;

public interface VulnerabilityParser {
    // Retrieves the vulnerabilities that already exist in the source, keyed by vulnerability id
    HashMap<String, Vulnerability> getVulnerabilities();

    // Retrieves vulnerabilities that are new or were updated in the source, keyed by vulnerability id
    HashMap<String, Vulnerability> getUpdates();
}

The ParserManager first calls getVulnerabilities on each Parser and then aggregates all the information into a common format that is passed down the pipeline for further enrichment. The other method implemented by each Parser, getUpdates, is called daily to pick up new information from each source. Because every Parser exposes the same two methods, adding a Parser for a new source of information is straightforward.
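
The aggregation step can be sketched as follows. This is a minimal illustration rather than the project's code: the simplified `Vulnerability` class, `StubParser`, `ParserManagerSketch`, and the merge rule (union of the references of entries that share an id) are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified stand-in for the project's Vulnerability class.
final class Vulnerability {
    final String id;
    final Set<String> references;

    Vulnerability(String id, Set<String> references) {
        this.id = id;
        this.references = references;
    }

    // Assumed merge rule: keep the id, take the union of the references.
    Vulnerability mergeWith(Vulnerability other) {
        Set<String> merged = new LinkedHashSet<>(references);
        merged.addAll(other.references);
        return new Vulnerability(id, merged);
    }
}

// Same shape as the interface shown above.
interface VulnerabilityParser {
    HashMap<String, Vulnerability> getVulnerabilities();
    HashMap<String, Vulnerability> getUpdates();
}

// Hypothetical parser that serves a fixed set of entries.
class StubParser implements VulnerabilityParser {
    private final HashMap<String, Vulnerability> data = new HashMap<>();

    StubParser(Vulnerability... entries) {
        for (Vulnerability v : entries) {
            data.put(v.id, v);
        }
    }

    @Override
    public HashMap<String, Vulnerability> getVulnerabilities() {
        return new HashMap<>(data);
    }

    @Override
    public HashMap<String, Vulnerability> getUpdates() {
        return new HashMap<>(); // the stub never reports updates
    }
}

// Sketch of a manager that aggregates the output of many parsers.
class ParserManagerSketch {
    private final List<VulnerabilityParser> parsers;

    ParserManagerSketch(List<VulnerabilityParser> parsers) {
        this.parsers = parsers;
    }

    // Collects entries from every parser, merging those that share an id.
    Map<String, Vulnerability> aggregate() {
        Map<String, Vulnerability> all = new HashMap<>();
        for (VulnerabilityParser parser : parsers) {
            parser.getVulnerabilities()
                  .forEach((id, v) -> all.merge(id, v, Vulnerability::mergeWith));
        }
        return all;
    }
}
```

Since the manager depends only on the interface, supporting a new source amounts to adding one more `VulnerabilityParser` implementation to the list.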

Sources of information

The ParserManager aggregates information from the following sources:

Source	License	Frequency of updates
NVD JSON Feed	Public Domain	Every 2 hours
GitHub Advisories	Public Domain	Daily
MSR 2019¹	Public Domain	n/a
MSR 2020²	Public Domain	n/a
Safety DB (by pyup.io)	CC BY-NC-SA 4.0	Monthly
cvedb (by fabric8-analytics)	n/a	Daily
victims-cve-db	CC BY-SA 4.0	n/a
Debian Security Tracker	Public Domain	Daily
SAP project-kb	Public Domain	n/a

Patches

To find where exactly a vulnerability lies in a package, patch links are used to retrieve what was changed in order to fix it. Combined with some heuristics, this makes it possible to drill down to the specific callables that were patched.

The PatchFarmer receives the list of references contained in each Vulnerability Object and handles each of them to determine whether patch diffs can be extracted. The following is a list of the sources of information handled by the class:

GitHub Commits
GitHub Pull Requests
GitHub Issues
GitLab Commits
GitLab Merge Requests
GitLab Issues
BitBucket Commits
BitBucket Pull Requests
BitBucket Issues
Bugzilla bugs
JIRA tickets
Git Trackers Commits
SVN Revisions
Mercurial Revisions
Apache Mailing List
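
One way to route a reference to a source-specific handler is to match its URL against a set of patterns. The sketch below covers only a few of the sources listed above. The enum, the class name, and the regular expressions are assumptions made for illustration; this document does not describe how the PatchFarmer actually recognizes each kind of link.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical subset of the reference kinds listed above.
enum ReferenceKind {
    GITHUB_COMMIT, GITHUB_PULL_REQUEST, GITHUB_ISSUE, GITLAB_COMMIT, BUGZILLA_BUG, UNKNOWN
}

class ReferenceClassifier {
    // Assumed URL shapes; patterns are tried in insertion order.
    private static final Map<ReferenceKind, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put(ReferenceKind.GITHUB_COMMIT,
                Pattern.compile("^https?://github\\.com/[^/]+/[^/]+/commit/[0-9a-f]{7,40}.*$"));
        PATTERNS.put(ReferenceKind.GITHUB_PULL_REQUEST,
                Pattern.compile("^https?://github\\.com/[^/]+/[^/]+/pull/\\d+.*$"));
        PATTERNS.put(ReferenceKind.GITHUB_ISSUE,
                Pattern.compile("^https?://github\\.com/[^/]+/[^/]+/issues/\\d+.*$"));
        PATTERNS.put(ReferenceKind.GITLAB_COMMIT,
                Pattern.compile("^https?://gitlab\\.com/.+/-/commit/[0-9a-f]{7,40}.*$"));
        PATTERNS.put(ReferenceKind.BUGZILLA_BUG,
                Pattern.compile("^https?://[^/]+/.*show_bug\\.cgi\\?id=\\d+.*$"));
    }

    // Returns the first kind whose pattern matches the whole URL, or UNKNOWN.
    static ReferenceKind classify(String url) {
        for (Map.Entry<ReferenceKind, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(url).matches()) {
                return entry.getKey();
            }
        }
        return ReferenceKind.UNKNOWN;
    }
}
```

References that fall through to `UNKNOWN` would simply yield no patch diff, which matches the idea that not every reference can be turned into one.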

Vulnerability Consumer

The Consumer reads the Vulnerability definitions that the Producer publishes to the Kafka topic and stores the information in the database.

Where are the vulnerabilities?

Each Vulnerability is stored in the metadata of the package_versions and callables tables, using the following format:

# metadata #
{
    "vulnerabilities": {
        "CVE-2020-0042": {...},
        "CVE-2020-0043": {...}
    }
}
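
Attaching a vulnerability to such a metadata object can be sketched with nested maps. The helper below is hypothetical (`MetadataSketch.attach` is not a project API), and it assumes that a later definition for the same id overwrites the earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetadataSketch {
    // Adds (or overwrites) one entry under metadata["vulnerabilities"][id],
    // creating the "vulnerabilities" object on first use.
    @SuppressWarnings("unchecked")
    static void attach(Map<String, Object> metadata, String id, Map<String, Object> vulnerability) {
        Map<String, Object> byId = (Map<String, Object>) metadata
                .computeIfAbsent("vulnerabilities", key -> new LinkedHashMap<String, Object>());
        byId.put(id, vulnerability);
    }
}
```

Keying the entries by vulnerability id means a package version or callable can be checked for a given id with a single lookup.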

The individual definition of each vulnerability will also be available through the REST API.

Vulnerability Object Definition

In order to merge all the different sources together, a common definition of a vulnerability has been introduced. Here is a JSON representation of an example, CVE-2019-11777 in the Eclipse Paho Java client library:

{
    "id": "CVE-2019-11777",
    "description" : "In the Eclipse Paho Java client library version 1.2.0, when connecting to an MQTT server using TLS and setting a host name verifier, the result of that verification is not checked. This could allow one MQTT server to impersonate another and provide the client library with incorrect information.",
    "severity": "MODERATE",
    "scoreCVSS2": 5.0,
    "scoreCVSS3": 7.5,
    "published_date": "2019-09-11",
    "last_modified_date": "2020-06-10",
    "vulnerable_purls": [
        "pkg:org.eclipse.paho/[email protected]",
        "pkg:org.eclipse.paho/[email protected]",
        "pkg:org.eclipse.paho/[email protected]",
        "pkg:org.eclipse.paho/[email protected]"
    ],
    "vulnerable_fasten_uris": [
        "/org.eclipse.paho.client.mqttv3.internal/SSLNetworkModule.start()%2Fjava.lang%2FVoidType",
        "/org.eclipse.paho.client.mqttv5.internal/SSLNetworkModule.start()%2Fjava.lang%2FVoidType"
    ],
    "patch_date": "2018-05-26",
    "references": [
        "https://nvd.nist.gov/vuln/detail/CVE-2019-11777",
        "..."
    ],
    "patches": [
        "https://bugs.eclipse.org/bugs/show_bug.cgi?id=549934",
        "..."
    ],
    "exploits": [
        "http://www.exploit-db.com/exploits/42",
        "..."
    ]
}

Description of fields:

id: Identifies the vulnerability (e.g. CVE-2014-0160, GHSA-3pc2-fm7p-q2vg, pyup.io-34978)

description: Textual description of the vulnerability

severity: One of the following: LOW, MEDIUM, MODERATE, HIGH, CRITICAL

scoreCVSS2: CVSS version 2 score of the vulnerability

scoreCVSS3: CVSS version 3 score of the vulnerability

published_date: Date when the vulnerability was published (yyyy-mm-dd)

last_modified_date: Date when the vulnerability was last modified (yyyy-mm-dd)

vulnerable_purls: Package coordinates of vulnerable packages. Follows purl-spec guidelines

vulnerable_fasten_uris: Vulnerable callables. Listed using FASTEN URI format.

patch_date: Date when the vulnerability was patched (yyyy-mm-dd)

references: List of links to pages and documentation

patches: List of links to patches that fixed the vulnerability

exploits: List of links to exploits. Most of them from exploit-db
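
For readers who think in types, the fields above map naturally onto a plain Java class. The sketch below is an assumption about how such an object could be modelled, not the project's actual class: the names `VulnerabilitySketch` and `Severity`, the choice of `LocalDate` and nullable `Double`, and the two parsing helpers are all hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// The severity values listed in the field description.
enum Severity { LOW, MEDIUM, MODERATE, HIGH, CRITICAL }

// Hypothetical in-memory model of the JSON object described above.
class VulnerabilitySketch {
    String id;
    String description;
    Severity severity;
    Double scoreCVSS2;          // null when no score is available
    Double scoreCVSS3;          // null when no score is available
    LocalDate publishedDate;
    LocalDate lastModifiedDate;
    LocalDate patchDate;
    List<String> vulnerablePurls = new ArrayList<>();
    List<String> vulnerableFastenUris = new ArrayList<>();
    List<String> references = new ArrayList<>();
    List<String> patches = new ArrayList<>();
    List<String> exploits = new ArrayList<>();

    // yyyy-mm-dd is the ISO local date format, which LocalDate.parse accepts by default.
    static LocalDate parseDate(String value) {
        return LocalDate.parse(value);
    }

    // Case-insensitive lookup; throws IllegalArgumentException for unknown values.
    static Severity parseSeverity(String value) {
        return Severity.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

Using typed dates and an enum makes malformed input fail at parse time instead of surfacing later in the pipeline.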

References

¹ Ponta, S. E., Plate, H., Sabetta, A., Bezzi, M., & Dangremont, C. (2019). A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). doi:10.1109/msr.2019.00064

² Jiahao Fan, Yi Li, Shaohua Wang and Tien N. Nguyen. 2020. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In MSR ’20: The 17th International Conference on Mining Software Repositories, May 25–26, 2020, MSR, Seoul, South Korea. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3379597.3387501
