cybersecurity

Overview

This project uses Python 3.9.1; the required libraries are listed in requirements.txt. The code trains and tests on the UGR16 dataset, specifically July week 5 through August week 4. To work with a different time range, change the corresponding parameters in the code.

High-Level Workflow

  1. Download the data from the UGR16 dataset.
  2. Use clean_csv.py to process the raw CSV file first, in case it contains corrupted rows.
  3. Split the data into training and test sets as you see fit.
  4. Use generate_tensor_64_64_64.py to generate either the general tensor or the port tensor for both the training and test sets.
  5. Use training.py to train the model defined in model.py. Only one model type is available for now.
  6. After training finishes, use one of the options described in the Anomaly Detection section to get detection results. The results are stored in a single folder.
  7. Open get_report.ipynb to visualize the results.
  8. Optionally, use anomaly_remove.py to remove all detected traffic from the raw traffic.
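Step 3 leaves the train/test split up to you. As a minimal sketch, one option for time-ordered traffic is a chronological split that keeps the earlier rows for training and the later rows for testing. The function name `chronological_split` and the 80/20 fraction are assumptions made for illustration; neither comes from this repository.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, test) without shuffling.

    Hypothetical helper: the name and the default fraction are
    illustrative assumptions, not part of this project.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Everything before the cut index is training data, the rest is test data.
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Placeholder rows standing in for cleaned flow records, oldest first.
rows = [f"flow-{i}" for i in range(10)]
train_rows, test_rows = chronological_split(rows)
```

Splitting by position rather than at random keeps every test row later in time than every training row, which may or may not suit your experiment.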

Anomaly Detection

  1. Use a general model with the general tensor: run anomaly_detection_general.py.

  2. Use a port model with the port tensor:

    Run anomaly_detection_p1.py first, then run anomaly_detection_p2.py.

Clean

clean_csv.py removes every row that contains a NaN value from the CSV file. This is needed because some rows in the UGR16 dataset are corrupted.
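The idea of dropping rows with missing values can be sketched with the standard library alone. This is not the code in clean_csv.py; the function name `drop_rows_with_nan`, the decision to treat empty fields like NaN, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def drop_rows_with_nan(src, dst):
    """Copy CSV rows from src to dst, skipping rows with a missing field.

    Hypothetical helper, not the repository's implementation.
    Returns the number of rows that were dropped.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    dropped = 0
    for row in reader:
        # Assumption: blank lines, empty fields, and the text "nan"
        # (any letter case) all mark a row as corrupted.
        if not row or any(field.strip().lower() in ("", "nan") for field in row):
            dropped += 1
            continue
        writer.writerow(row)
    return dropped


# Made-up rows for demonstration; they do not follow the real UGR16 layout.
raw = io.StringIO(
    "2016-07-27 13:43:21,0.0,10.0.0.1,10.0.0.2\n"
    "2016-07-27 13:43:22,nan,10.0.0.3,10.0.0.4\n"
    "2016-07-27 13:43:23,1.5,,10.0.0.6\n"
)
clean = io.StringIO()
dropped = drop_rows_with_nan(raw, clean)
```

Because the function reads and writes one row at a time, it can be pointed at open file objects for large CSV files without loading the whole file into memory.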

Tensor Generation

There are two types of tensors: a general tensor and a port tensor. Both are generated by running generate_tensor_64_64_64.py with different parameters.

Model Training

Models are defined in model.py, and training.py uses the model defined there.

Util

util.py defines shared computation functions and shared parameter settings. Most of the other files use it, so it is recommended to keep all the files in one directory.