watarungurunnn/robevalanodetect

 
 


Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems

This repository collects different unsupervised machine learning algorithms to detect anomalies.

Implemented models

We have implemented the following models. Our implementation of ALAD closely follows the original implementation already available on GitHub.

Dependencies

A complete dependency list is available in requirements.txt. We list here the most important ones:

Installation

These steps assume that the latest version of Anaconda is installed.

$ conda create --name [ENV_NAME] python=3.8
$ conda activate [ENV_NAME]
$ pip install -r requirements.txt

Replace [ENV_NAME] with the name of your environment.

Usage

Run the following from the root of the project:

$ python -m src.main \
    -m [model_name] \
    -d [/path/to/dataset/file.{npz,mat}] \
    --dataset [dataset_name] \
    --batch-size [batch_size]

The script accepts the following parameters:

  • -m: selected machine learning model (required)
  • -d: path to the dataset (required)
  • --batch-size: size of a training batch (required)
  • --dataset: name of the selected dataset. Choices are Arrhythmia, KDD10, IDS2018, NSLKDD, USBIDS, Thyroid (required).
  • -e: number of training epochs (default=200)
  • --n-runs: number of times the experiment is repeated (default=1)
  • --lr: learning rate used during optimization (default=1e-4)
  • --pct: percentage of the original data to keep (useful for large datasets, default=1.)
  • --results-path: path where the results are stored (default="../results")
  • --model-path: path where models will be stored (default="../models")
  • --test-mode: loads models from --model-path and tests them (default=False)
  • --hold_out: percentage of anomalous data to hold out for possible contamination of the training set (default=0)
  • --rho: contamination ratio of the training set, i.e. the anomaly ratio within the training set (default=0)
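
The parameter list above can be summarized as an argparse declaration. The following is only a minimal sketch based on the flags and defaults documented in this README, not the parser actually used in src.main: the destination names (model, dataset_path, epochs), the value types, and the build_parser helper are assumptions made for illustration.

```python
import argparse

# Dataset names as listed in this README.
DATASETS = ["Arrhythmia", "KDD10", "IDS2018", "NSLKDD", "USBIDS", "Thyroid"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="src.main")
    # Required parameters.
    parser.add_argument("-m", dest="model", required=True,
                        help="selected machine learning model")
    parser.add_argument("-d", dest="dataset_path", required=True,
                        help="path to the dataset (.npz or .mat)")
    parser.add_argument("--dataset", choices=DATASETS, required=True,
                        help="name of the selected dataset")
    parser.add_argument("--batch-size", type=int, required=True,
                        help="size of a training batch")
    # Optional parameters with the defaults stated in the README.
    parser.add_argument("-e", dest="epochs", type=int, default=200)
    parser.add_argument("--n-runs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--pct", type=float, default=1.0)
    parser.add_argument("--rho", type=float, default=0.0)
    parser.add_argument("--results-path", default="../results")
    parser.add_argument("--model-path", default="../models")
    parser.add_argument("--test-mode", action="store_true")
    parser.add_argument("--hold_out", type=float, default=0.0)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-m", "DAGMM", "-d", "data/kdd10.npz", "--dataset", "KDD10", "--batch-size", "1024"]
)
print(args.model, args.dataset, args.batch_size, args.epochs, args.lr)
```

Omitting any of the four required flags makes argparse exit with a usage error, which matches the "(required)" markers in the list.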

Please note that datasets must be stored in .npz or .mat files. Use the preprocessing scripts within data_process to generate these files.
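
To check a generated file before training, it can help to load it and look at the arrays it contains. The sketch below is an illustration only: load_arrays is a hypothetical helper, not part of this repository, and the array names X and y in the demo are assumptions, since the keys written by the preprocessing scripts are not documented here. The .mat branch relies on SciPy being installed.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_arrays(path):
    """Return a dict of arrays stored in a .npz or .mat file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".npz":
        # NpzFile is a context manager; copy the arrays out before it closes.
        with np.load(path) as archive:
            return {key: archive[key] for key in archive.files}
    if suffix == ".mat":
        from scipy.io import loadmat  # imported lazily; requires SciPy
        contents = loadmat(str(path))
        # loadmat adds metadata entries such as __header__; drop them.
        return {k: v for k, v in contents.items() if not k.startswith("__")}
    raise ValueError(f"Unsupported dataset format: {suffix!r}")


# Demo on a throwaway file with assumed array names X (features) and y (labels).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.npz"
    np.savez(demo_path, X=np.zeros((4, 3)), y=np.array([0, 0, 1, 0]))
    arrays = load_arrays(demo_path)

for name, value in arrays.items():
    print(name, value.shape, value.dtype)
```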

Example

To train a DAGMM on the KDD 10 percent dataset with the default parameters described in the original paper:

$ python -m src.main -m DAGMM -d [/path/to/dataset.npz] --dataset KDD10 --batch-size 1024 --results-path ./results/KDD10 --models-path ./models/KDD10

Replace [/path/to/dataset.npz] with the path to the dataset in a numpy-friendly format.
