This project uses Python 3.9.1, and the required libraries are listed in requirements.txt. The code trains and tests on the UGR16 dataset, specifically the period from July week 5 to August week 4. To use a different time period, change the corresponding parameters in the code.
- Download the data from the UGR16 dataset.
- Use clean_csv.py to process the raw CSV file first, in case it contains corrupted data.
- Feel free to split the data into a training set and a test set.
- Use generate_tensor_64_64_64.py to generate either the general tensor or the port tensor for both the training and test sets.
- Use training.py to train the model defined in model.py. There is only one model type for now.
- After training is done, use one of the options described in the Anomaly Detection section to get detection results. These results are stored in one folder.
- Finally, open get_report.ipynb to visualize the results.
- Use anomaly_remove.py to remove all detected traffic from the raw traffic, if desired.
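The order of the scripts in the steps above can be sketched as a small dry-run helper. This is only an illustration: it assumes the scripts take no command-line arguments (parameters are edited in the code), and `build_commands`, `COMMON_STEPS`, and `DETECTION_STEPS` are hypothetical names that are not part of this repository.

```python
import sys
from pathlib import Path

# Script order taken from the steps above. Assumption: each script is run
# without arguments, because parameters are changed inside the code itself.
# generate_tensor_64_64_64.py has to be run once per data set (training and
# test) after editing its parameters; it is listed only once here.
COMMON_STEPS = ["clean_csv.py", "generate_tensor_64_64_64.py", "training.py"]

# The two detection options; the port option runs two scripts in sequence.
DETECTION_STEPS = {
    "general": ["anomaly_detection_general.py"],
    "port": ["anomaly_detection_p1.py", "anomaly_detection_p2.py"],
}


def build_commands(mode="general", repo_dir="."):
    """Return the ordered list of commands for one detection option."""
    if mode not in DETECTION_STEPS:
        raise ValueError(f"unknown mode {mode!r}; expected 'general' or 'port'")
    scripts = COMMON_STEPS + DETECTION_STEPS[mode]
    return [[sys.executable, str(Path(repo_dir) / name)] for name in scripts]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in build_commands("general"):
        print(" ".join(command))
```

To actually execute a step, each command could be passed to `subprocess.run(command, check=True)`, which stops the sequence as soon as one script fails.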
- Using a general model with a general tensor: run anomaly_detection_general.py.
- Using a port model with a port tensor: this method requires running anomaly_detection_p1.py first and then anomaly_detection_p2.py.
clean_csv.py removes every row of the CSV file that contains a NaN value. This is needed because some rows in the UGR16 dataset are corrupted.
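The row filtering described above can be illustrated with the standard library alone. This is a minimal sketch, not the actual contents of clean_csv.py: it assumes a missing value appears in the raw file as an empty field or the literal text `nan`, and the function names and the three-row sample are made up for illustration.

```python
import csv
import io

# Assumption: a missing value shows up as an empty field or the text "nan".
MISSING_TOKENS = {"", "nan"}


def is_missing(value):
    """Return True if a CSV field should be treated as a missing value."""
    return value.strip().lower() in MISSING_TOKENS


def drop_rows_with_missing(reader, writer):
    """Copy rows from reader to writer, skipping rows with a missing field.

    Returns a (kept, dropped) tuple of row counts.
    """
    kept = dropped = 0
    for row in reader:
        # Blank lines come back as empty lists; treat them as corrupted too.
        if not row or any(is_missing(field) for field in row):
            dropped += 1
        else:
            writer.writerow(row)
            kept += 1
    return kept, dropped


# Toy input: the second row contains "nan", the third an empty field.
raw = "a,1,x\nb,nan,y\nc,3,\n"
out = io.StringIO()
counts = drop_rows_with_missing(
    csv.reader(io.StringIO(raw)),
    csv.writer(out, lineterminator="\n"),
)
print(counts)  # (1, 2)
```

Processing the file row by row keeps memory use constant, which matters for capture files that are too large to load at once.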
There are two types of tensors: a general tensor and a port tensor. Both can be generated by running generate_tensor_64_64_64.py with different parameters.
Models are defined in model.py, and training.py uses the model defined there.
util.py defines many shared computation functions and shared parameter settings. It is used by most of the other files, so it is recommended to keep all the files in one directory.
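Because most scripts depend on util.py, a quick check that everything sits in the same directory can save a failed run. The sketch below is only an illustration: the file list is collected from the names mentioned in this README, and `missing_files` is a hypothetical helper, not part of the repository.

```python
import tempfile
from pathlib import Path

# File names mentioned in this README; all are expected in one directory.
EXPECTED_FILES = [
    "util.py",
    "model.py",
    "clean_csv.py",
    "generate_tensor_64_64_64.py",
    "training.py",
    "anomaly_detection_general.py",
    "anomaly_detection_p1.py",
    "anomaly_detection_p2.py",
    "anomaly_remove.py",
    "get_report.ipynb",
]


def missing_files(directory):
    """Return the expected files that are not present in the directory."""
    directory = Path(directory)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


# Demo on a temporary directory that contains everything except util.py.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_FILES:
        if name != "util.py":
            (Path(tmp) / name).touch()
    demo_missing = missing_files(tmp)

print(demo_missing)  # ['util.py']
```

Calling `missing_files(".")` from the project directory should return an empty list when the layout matches the recommendation above.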