Morpheus Experimental

Morpheus Experimental is a staging/collaboration/experimental area for development. This directory contains prototypes of cybersecurity workflows and pipelines which are still being developed and are in the alpha testing stage.

Prototype contributions from the community are welcome. A prototype should include at minimum a tutorial-style notebook, model file, sample data, training script, inference script, and documentation. More information can be found in the Contributing Guide.

Getting Started

We recommend building morpheus experimental from source in an environment with a modern GPU with at least 16GB of memory.

General Requirements

Pascal architecture 16GB GPU or better
Git LFS
Jupyter

Clone the Repository

MORPHEUS_EXPERIMENTAL_ROOT=$(pwd)/morpheus-experimental
git clone https://github.com/nv-morpheus/morpheus-experimental $MORPHEUS_EXPERIMENTAL_ROOT
cd $MORPHEUS_EXPERIMENTAL_ROOT

Build Morpheus Experimental Container

To assist in building a Morpheus Experimental container, several scripts have been provided in the ./docker directory. To build the "release" container, run the following:

./docker/build_container.sh

This will create an image named nvcr.io/nvidia/morpheus/mor_exp:${MORPHEUS_EXPERIMENTAL_VERSION}-runtime where $MORPHEUS_EXPERIMENTAL_VERSION is replaced by the output of git describe --tags --abbrev=0.

To run the built "release" container, use the following:

./docker/run_container.sh

You can specify different Docker images and tags by passing the script the DOCKER_IMAGE_TAG, and DOCKER_IMAGE_TAG variables respectively. For example, to run version v22.09.00a use the following:

DOCKER_IMAGE_TAG="v22.09.00a-runtime" ./docker/run_container.sh

Prototype Specific Requirements

To get started with a specific prototype additional requirements must be installed into your environment. Each prototype directory contains its own requirements.txt file.

cd ${MORPHEUS_EXPERIMENTAL_ROOT}/${PROTOTYPE}
pip install -r requirements.txt

To run the morpheus pipeline for the prototype follow the instructions for setting up your morpheus environment found in the main morpheus repo

Current Cybersecurity Workflow Prototypes

DGA Detection via AppShield

This model is a convolution neural network model trained to classify URL domains generated by Domain-Generation-Algorithms. Domain generation algorithms (DGA) are algorithms seen in various families of malware that are used to periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers. Input data comes from AppShield.

Phishing URL Detection via AppShield

This model is a binary classifier to label phishing URLs and non-phishing URLs obtained from host process data. Input data comes from AppShield.

String Resemblance Grouping

This technique syntactically groups system log messages and finds group representatives for data exploration and triage.

Detection of Anomalous authentication using Relational Graph Neural Network (RGCN)

This model shows an application of a graph neural network for anomalous authentication detection in Azure-AD signon heterogeneous graph. An Azure-AD signon dataset includes four types of nodes, authentication, user, device and service application nodes are used for modeling. A relational graph neural network (RGCN)is used to identify anomalous authentications from azure-ad signon input.

Asset Clustering using Windows Event Logs

This model is a clustering algorithm to assign each host present in the dataset to a cluster based on aggregated and derived features from Windows Event Logs of that particular host.

Log Sequence Anomaly Detector

This model is a sequence binary classifier trained with vector representation of log messages. The task is to identify abnormal log sequence of alerts from sequence of normally generated logs.

Industrial Control System (ICS) Cyber Attack Detection

This model is an XGBoost classifier that predicts each event on a power system based on dataset features.

Intrusion Detection System using LODA algorithm

The model is a Loda anomaly detector for detecting an intrusion attack in the form of bots in a network using a netflow dataset.

Cyber Foundation Models

This model is a GPT that generates realistic synthetic raw Azure AD logs.

Repo Structure

Each prototype has its own directory that contains everything belonging to the specific prototype. Directories can include the following subfolders and documentation:

models

Model files for public release (ONNX preferred to pytorch/tensorflow)

datasets

Samples of training data and inference dataset with model output to be used to test training and inference scripts. Links to publicly available datasets are also welcome.

training-tuning

A script and python notebook showing how to train or fine-tune the model. The tutorial-style notebook takes sample training data file as an input and creates a model file. It is reliable and repeatable (ie. set seed values). If variables are used in script (ie. epochs, learning_rate) set defaults to those used to achieve metrics reported in documentation. It includes a requirements.txt file with dependencies and versions used for training.

inference

A non-morpheus pipeline script that contains data loading, preprocessing, model loading, inference, postprocessing, and serialized output file. It uses desired morpheus pipeline variables as input variables to the script (ie. threshold=0.6). It produces a reliable and repeatable output file from the inference dataset. It includes requirements.txt file with dependencies and versions used for non-morpheus inference.

morpheus-pipeline (optional)

All the necessary files for a full Morpheus pipeline of the prototype similar to pipelines found in Morpheus Examples with it's own requirements.txt files.

model documentation

A README.md that contains the following information for each model:

Model/prototype name - Name of the model
Use case - Specific use case the model targets
Version - Version of the model (major.minor.patch)
Model overview - General description
Model architecture - General model architecture
Requirements - text file containing dependencies
Training - Training dataset and paradigm
Training data - description of training data
Training epochs - Number of epochs used during training
Training batch size - Batch size used during training
GPU model - Family of GPU used during training
Training script - How to run training script
Model accuracy - Accuracy of the model when tested (Precision, Recall, F1, etc)
Memory footprint - Memory required by the model
How to use this model - Circumstances where this model is useful
Input data - Typical data that is used as input to the model
Output - Type and format of model output
Out-of-scope use cases - Use cases not envisioned during development
Ethical considerations - Ethical analysis of risks and harms
References - Resources used in model development

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Morpheus Experimental

Getting Started

General Requirements

Clone the Repository

Build Morpheus Experimental Container

Prototype Specific Requirements

Current Cybersecurity Workflow Prototypes

DGA Detection via AppShield

Phishing URL Detection via AppShield

String Resemblance Grouping

Detection of Anomalous authentication using Relational Graph Neural Network (RGCN)

Asset Clustering using Windows Event Logs

Log Sequence Anomaly Detector

Industrial Control System (ICS) Cyber Attack Detection

Intrusion Detection System using LODA algorithm

Cyber Foundation Models

Repo Structure

models

datasets

training-tuning

inference

morpheus-pipeline (optional)

model documentation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Morpheus Experimental

Getting Started

General Requirements

Clone the Repository

Build Morpheus Experimental Container

Prototype Specific Requirements

Current Cybersecurity Workflow Prototypes

Repo Structure

models

datasets

training-tuning

inference

morpheus-pipeline (optional)

model documentation