Insider Risk Detection in PySpark

Introduction

This repo explores anomaly detection for insider risk using Kernel Density Estimation (KDE), MinHash, and K-Means. The implementation is based on PySpark 3.1.1 and runs on Google Colab.

We use KDE for probability-based risk estimation on numerical features, and MinHash with K-Means to detect anomalous email content.
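To make the idea of probability-based risk concrete, here is a minimal standard-library sketch of a one-dimensional Gaussian KDE. It is not the code from KDE_risk.ipynb: the function names, the rule-of-thumb bandwidth, the negative-log-density score, and the sample values are all assumptions made for illustration.

```python
import math


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth: 1.06 * std * n^(-1/5).

    Assumes at least two samples with non-zero spread.
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return 1.06 * math.sqrt(variance) * n ** (-1 / 5)


def kde_density(samples, x, bandwidth):
    """Gaussian kernel density estimate at point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def risk_score(samples, x, bandwidth=None):
    """Hypothetical risk score: negative log density, so rarer values score higher."""
    if bandwidth is None:
        bandwidth = rule_of_thumb_bandwidth(samples)
    density = kde_density(samples, x, bandwidth)
    # Floor the density so the logarithm stays finite for extreme values.
    return -math.log(max(density, 1e-300))


# Illustrative (made-up) history of one numerical feature, e.g. daily logon counts.
daily_logons = [8, 9, 10, 9, 8, 10, 9, 11, 9, 10]
typical = risk_score(daily_logons, 9)
unusual = risk_score(daily_logons, 30)
print(f"typical={typical:.2f} unusual={unusual:.2f}")
```

A value far from the user's history receives a much lower density and therefore a higher score. On the full dataset the same computation would be distributed with PySpark rather than run in plain Python.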

Dataset

The Insider Threat Test Dataset, provided by the CERT Division, is a collection of synthetic insider threat test datasets that include both background and malicious-actor data. It covers 1000 users over 17 months.

For more background on this data, please see the paper, Bridging the Gap: A Pragmatic Approach to Generating Insider Threat Data.

Usage

Please download the dataset from CMU KiltHub and unzip it. Then put the CSV files into the folder ./data/.
For the KDE-based method, open KDE_risk.ipynb and follow the instructions inside.
For the MinHash and K-Means based method, open Kmeans_email.ipynb and follow the instructions inside.
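
For readers who want a feel for the MinHash step before opening the notebook, the following is a minimal standard-library sketch, not the notebook's PySpark code. The tokenization, the SHA-256 based hash family, the number of hashes, and the example emails are assumptions made for illustration. The K-Means step is not shown here.

```python
import hashlib


def tokens(text):
    """Lowercased set of words; a stand-in for real email preprocessing."""
    return set(text.lower().split())


def minhash_signature(token_set, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    if not token_set:
        raise ValueError("token_set must not be empty")
    signature = []
    for seed in range(num_hashes):
        signature.append(
            min(
                int.from_bytes(
                    hashlib.sha256(f"{seed}:{t}".encode("utf-8")).digest()[:8],
                    "big",
                )
                for t in token_set
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions approximates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Made-up example emails.
email_a = "please review the quarterly budget report before the meeting on friday"
email_b = "please review the quarterly budget report before the meeting on monday"
email_c = "download all customer records to my personal drive tonight"

sig_a, sig_b, sig_c = (minhash_signature(tokens(e)) for e in (email_a, email_b, email_c))
similar = estimate_jaccard(sig_a, sig_b)
different = estimate_jaccard(sig_a, sig_c)
print(f"a~b={similar:.2f} a~c={different:.2f}")
```

Emails that share most of their words produce signatures that agree in most positions, while unrelated emails agree in almost none. Signatures like these give a compact representation of each email for comparison at scale.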

Others

Because of Colab's limitations, we cannot call a customized Spark backend. As a result, the notebook email_IF.ipynb, which tries to apply the Isolation Forest algorithm, does not work yet.

If you have any ideas, please let me know in Issues. Thank you!

