
Model Overview

Description:

The model is a Lightweight On-line Detector of Anomalies (Loda) for intrusion detection use cases. It is trained to identify bot attacks in NetFlow data. We used the CICIDS2017 benchmark dataset to test the model's performance.

Reference(s):

Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018, January). Toward generating a new intrusion detection dataset and intrusion traffic characterization.
Pevny, T. (2016). Loda: Lightweight on-line detector of anomalies. Machine Learning.

Model Architecture:

Loda (Lightweight On-line Detector of Anomalies) is an ensemble of one-dimensional fixed histograms, each built on a random projection of the features. The model is an unsupervised anomaly detector that scores each data point by its negative log-likelihood.
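
The idea above can be illustrated with a minimal NumPy sketch. This is not the code shipped with this model: the function names fit_loda and score_loda, the parameters n_projections and n_bins, and the choice of roughly sqrt(d) non-zero weights per projection are assumptions made for illustration. The card's number_random_cuts may play a role similar to n_projections, but that mapping is also an assumption.

```python
import numpy as np


def fit_loda(X, n_projections=100, n_bins=20, seed=0):
    """Fit a Loda-style ensemble: sparse random projections plus 1-D histograms.

    All names and defaults here are hypothetical, chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Assumption: each projection keeps about sqrt(d) non-zero N(0, 1) weights.
    n_nonzero = max(1, int(round(np.sqrt(n_features))))
    W = np.zeros((n_projections, n_features))
    for i in range(n_projections):
        idx = rng.choice(n_features, size=n_nonzero, replace=False)
        W[i, idx] = rng.standard_normal(n_nonzero)
    Z = X @ W.T  # projected training data, shape (n_samples, n_projections)
    edges, densities = [], []
    for i in range(n_projections):
        # Fixed-width histogram used as a 1-D density estimate.
        dens, edg = np.histogram(Z[:, i], bins=n_bins, density=True)
        edges.append(edg)
        densities.append(dens)
    return W, np.array(edges), np.array(densities)


def score_loda(X, model, eps=1e-12):
    """Return the mean negative log-likelihood over all histograms (higher = more anomalous)."""
    W, edges, densities = model
    n_bins = densities.shape[1]
    Z = X @ W.T
    log_p = np.empty_like(Z)
    for i in range(W.shape[0]):
        z = Z[:, i]
        bins = np.searchsorted(edges[i], z, side="right") - 1
        bins = np.clip(bins, 0, n_bins - 1)
        inside = (z >= edges[i, 0]) & (z <= edges[i, -1])
        # Values outside the fitted range get zero density, floored at eps.
        p = np.where(inside, densities[i, bins], 0.0)
        log_p[:, i] = np.log(np.maximum(p, eps))
    return -log_p.mean(axis=1)


# Synthetic usage: five ordinary points and one obvious outlier.
rng = np.random.default_rng(42)
X_train = rng.standard_normal((2000, 8))
X_test = np.vstack([rng.standard_normal((5, 8)), np.full((1, 8), 10.0)])
model = fit_loda(X_train)
scores = score_loda(X_test, model)
print(scores)
```

In this synthetic example the last row lies far from the training data, so most projections place it outside their histogram range and its score is much larger than the others.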

Architecture Type:

LODA

Network Architecture:

Input

The input is NetFlow activity data in tabular form.

Input Parameters:

number_random_cuts = 1000
variance = 0.99

Input Format:

CSV format

Other Properties Related to Output:

None

Output

The unsupervised anomaly detector produces a negative log-likelihood as the anomaly score of each data point. A larger score indicates a more anomalous data point.
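
Because the output is a raw score and not a label, a downstream consumer has to choose a cut-off. One common option, sketched below, is to flag scores above a high quantile of the scores seen on reference (mostly benign) traffic. The function name flag_anomalies and the 0.99 quantile are hypothetical choices and are not part of this model.

```python
import numpy as np


def flag_anomalies(scores, reference_scores, quantile=0.99):
    """Flag scores above a quantile of reference scores (hypothetical helper)."""
    threshold = float(np.quantile(reference_scores, quantile))
    flags = np.asarray(scores) > threshold
    return flags, threshold


# Synthetic reference scores 0..99 and three new scores to check.
reference_scores = np.arange(100.0)
new_scores = np.array([10.0, 98.5, 200.0])
flags, threshold = flag_anomalies(new_scores, reference_scores)
print(threshold, flags)
```

The quantile trades false positives against missed detections, so it would need tuning for any particular deployment.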

Output Parameters:

None

Output Format:

Software Integration:

Runtime(s):

cupy

Supported Hardware Platform(s):

Ampere/Turing

Supported Operating System(s):

Linux

Model Version(s):

1.0

Training & Evaluation:

Training Dataset:

Link:

CICIDS2017

Properties (Quantity, Dataset Descriptions, Sensor(s)):

The dataset is from the Canadian Institute for Cybersecurity (CIC). CICIDS2017 contains benign traffic and the most up-to-date common attacks, resembling true real-world data (PCAPs). It also includes the results of network traffic analysis using CICFlowMeter, with flows labeled by timestamp, source and destination IPs, source and destination ports, protocol, and attack (CSV files). Definitions of the extracted features are also available.

Dataset License:

LICENSE

Evaluation Dataset:

Link:

CICIDS2017

Properties (Quantity, Dataset Descriptions, Sensor(s)):

Subset of CICIDS2017 with only botnet attacks.

Dataset License:

LICENSE

Inference:

Engine:

python/cupy

Test Hardware:

Other

Subcards

Model Card ++ Bias Subcard

What is the gender balance of the model validation data?

Not Applicable

What is the racial/ethnicity balance of the model validation data?

Not Applicable

What is the age balance of the model validation data?

Not Applicable

What is the language balance of the model validation data?

English (100%)

What is the geographic origin language balance of the model validation data?

Not Applicable

What is the educational background balance of the model validation data?

Not Applicable

What is the accent balance of the model validation data?

Not Applicable

Describe measures taken to mitigate against unwanted bias.

Not Applicable

Model Card ++ Explainability Subcard

Name example applications and use cases for this model.

The model is primarily designed for testing purposes and serves as a small pretrained model used to evaluate and validate IDS applications.

Fill in the blank for the model technique.

This model is intended for developers who want to build an IDS.

Name who is intended to benefit from this model.

The intended beneficiaries of this model are developers who aim to test the performance and functionality of the IDS pipeline using public NetFlow datasets. It may not be suitable for, or provide significant value in, real-world IDS deployments.

Describe the model output.

This model outputs an anomaly score for each NetFlow activity record; a large score indicates a suspected attack.

List the steps explaining how this model works.

Loda detects anomalies in a dataset by computing the likelihood of data points using an ensemble of one-dimensional histograms. These histograms serve as density estimators, approximating the joint probability of the data using sparse random projections.

Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:

Not Applicable

List the technical limitations of the model.

This model requires feature-engineered NetFlow activity data in the processed CICIDS dataset format.

What performance metrics were used to affirm the model's performance?

AUC & average precision score
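
For readers unfamiliar with these two metrics, the following NumPy sketch computes both from labels and anomaly scores. It is an illustration only, not the evaluation code used for this model; the function names are hypothetical, the pairwise AUC is practical only for small samples, and the average precision assumes no tied scores.

```python
import numpy as np


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive, ranking by descending score."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())


# Toy example: 1 marks an attack, higher score means more anomalous.
labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(auc, ap)
```

In this toy example five of the six attack/benign pairs are ranked correctly, so the AUC is 5/6; the two attacks sit at ranks 1 and 3, giving an average precision of (1 + 2/3) / 2, which is also 5/6.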

What are the potential known risks to users and stakeholders?

Not Applicable

What training is recommended for developers working with this model? If none, please state "none."

None

Link the relevant end user license agreement

Apache 2.0

Model Card ++ Safety & Security Subcard

Link the location of the training dataset's repository (if able to share).

CICIDS2017

Is the model used in an application with physical safety impact?

Describe physical safety impact (if present).

None

Was model and dataset assessed for vulnerability for potential form of attack?

Name applications for the model.

Typically used to test the identification of abnormalities in NetFlow activity.

Name use case restrictions for the model.

The model is trained on data in the CICIDS dataset schema, so it might not be suitable for other applications.

Has this been verified to have met prescribed quality standards?

Name target quality Key Performance Indicators (KPIs) for which this has been tested.

Not Applicable

Technical robustness and model security validated?

Not Applicable

Is the model and dataset compliant with National Classification Management Society (NCMS)?

Not Applicable

Are there explicit model and dataset restrictions?

Are there access restrictions to systems, model, and data?

Is there a digital signature?

Model Card ++ Privacy Subcard

Generatable or reverse engineerable personally-identifiable information (PII)?

Neither

Was consent obtained for any PII used?

Not applicable. The data was obtained from a simulated lab environment; for more information, refer to the source of the dataset at CICIDS2017.

Protected classes used to create this model? (The following were used in the model's training:)

Not applicable

How often is dataset reviewed?

Not applicable. The dataset is fully hosted and maintained by an external source; for more information, refer to the source of the dataset at CICIDS2017.

Is a mechanism in place to honor data

No (data is from an external source)

If PII collected for the development of this AI model, was it minimized to only what was required?

Not applicable

Is data in dataset traceable?

Scanned for malware?

Are we able to identify and trace source of dataset?

Yes, at CICIDS2017

Does data labeling (annotation, metadata) comply with privacy laws?

Not applicable

Is data compliant with data subject requests for data correction or removal, if such a request was made?

Not applicable
