This chapter revisits anomaly detection on login attempt data, using machine learning techniques, all while giving you a taste of how the workflow looks in practice.
We will work through five notebooks emulating a real-world scenario using the simulated data in the logs/ directory.
Here's a breakdown of the files included in this chapter:
logs/
: Directory containing all simulated log files for the analysis
user_data/
: Directory containing information on the user base used for the simulation (for the simulate.py script to use)
0-simulating_the_data.ipynb
: Jupyter notebook showing how the data was simulated
1-EDA_unlabeled_data.ipynb
: Jupyter notebook used to perform our EDA of the unlabeled data
2-unsupervised_anomaly_detection.ipynb
: Jupyter notebook used to test out some unsupervised anomaly detection algorithms
3-EDA_labeled_data.ipynb
: Jupyter notebook used to perform our EDA of the labeled data
4-supervised_anomaly_detection.ipynb
: Jupyter notebook used to build and evaluate supervised anomaly detection models
5-online_learning.ipynb
: Jupyter notebook used to implement an online learning classifier
merge_logs.py
: Python script for merging the logs of individually simulated months
run_simulations.sh
: Bash script for simulating and merging the log files (this is used to generate the data)
simulate.py
: Python script for simulating the data using the login_attempt_simulator package
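To give a sense of what merging the logs of individually simulated months can involve, here is a minimal sketch using only the standard library. It is not the contents of merge_logs.py: the function names merge_monthly_logs and write_log, the one-CSV-per-month layout, and the datetime column name are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def merge_monthly_logs(paths, sort_key="datetime"):
    """Concatenate CSV logs that share a header and sort the rows by `sort_key`.

    The `datetime` column name and the one-file-per-month layout are
    assumptions for illustration, not details taken from merge_logs.py.
    """
    rows = []
    header = None
    for path in paths:
        with Path(path).open(newline="") as f:
            reader = csv.DictReader(f)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                raise ValueError(f"{path} has a different header: {reader.fieldnames}")
            rows.extend(reader)
    # Timestamps written in one consistent year-first format sort
    # chronologically when compared as plain strings.
    rows.sort(key=lambda row: row[sort_key])
    return header, rows


def write_log(path, header, rows):
    """Write the merged rows back out as a single CSV file."""
    with Path(path).open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)
```

Checking that every file has the same header before combining them guards against silently mixing logs with different columns, which would otherwise surface later as missing values during the EDA.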
The end-of-chapter exercises will use the data in the logs/ directory to explore additional algorithms for machine learning anomaly detection; solutions to these exercises can be found in the repository's solutions/ch_11/ directory.