QCAD: Explainable Contextual Anomaly Detection using Quantile Regression Forests

The paper was accepted for publication in DAMI (Data Mining and Knowledge Discovery journal) in July 2023 and is available via this link.


Repo Structure

The QCAD repo includes two folders: Code and Data.

Code

Specifically, the Code folder contains the following sub-folders:

  • Implementation: which includes the implementations of the contextual anomaly detection algorithms and the traditional anomaly detection algorithms.
  • Utilities: which contains the following utility functions and scripts:
    • SynDataGen.py: generate synthetic datasets.
    • ContextualAnomalyInject.py: inject contextual anomalies.
    • FindMB.R: find Markov Blankets for the LoPAD algorithm.
  • Examples: which contains the following scripts used to generate the examples in our paper:
    • ExampleFootball.py: generate the football application example in the Experiment Results section.
    • ExampleQuantileHeight.py: generate the figures in the Introduction section.
    • ExampleBeanPlot.py: generate the beanplot in the Method section.
  • MultipleRunningAverage: which runs all involved detection algorithms 10 times independently.
    • AverageTest.py: execute all anomaly detection algorithms except CAD 10 times on each of the 20 real-world datasets.
    • AverageTestCAD.py: execute CAD 10 times on each of the 20 real-world datasets. CAD is run separately because it takes a long time.
    • SynAverageTest.py: execute all anomaly detection algorithms except CAD 10 times on each of the 10 synthetic datasets.
    • SynAverageTestCAD.py: execute CAD 10 times on each of the 10 synthetic datasets. CAD is run separately because it takes a long time.
  • AblationStuides: which investigates the impact of different components on detection performance.
    • AblationStudy.py: conduct two ablation studies.
  • RuntimeAnalysis: which inspects the computational cost of QCAD and CAD.
    • RuntimeAnalysis.py: inspect the running time while varying the number of behavioural features, contextual features, or samples, respectively.
  • SensitivityStudies: which investigates the impact of the parameter k.
    • SensitivityOfNeighbours.py: inspect the detection accuracy in terms of ROC AUC, PR AUC, and P@n while varying the number of neighbours.
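
For readers unfamiliar with the evaluation metrics mentioned above, the following is a minimal, self-contained sketch of how ROC AUC and P@n can be computed from anomaly scores and then averaged over independent runs. It is not the code used in this repo: the function names (`precision_at_n`, `roc_auc`), the toy labels, and the two score lists are hypothetical and only illustrate the idea. PR AUC is omitted for brevity.

```python
from statistics import mean


def precision_at_n(labels, scores, n=None):
    """P@n: fraction of true anomalies among the n highest-scored samples.

    If n is None, n defaults to the number of true anomalies.
    """
    if n is None:
        n = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n]) / n


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen anomaly gets a
    higher score than a randomly chosen normal sample (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 marks an anomaly, 0 a normal sample.
labels = [0, 0, 1, 0, 1, 0]
# Hypothetical anomaly scores from two independent runs of some detector.
run1 = [0.1, 0.4, 0.9, 0.2, 0.3, 0.8]
run2 = [0.1, 0.2, 0.9, 0.3, 0.8, 0.4]

# Average each metric over the runs.
mean_auc = mean(roc_auc(labels, run) for run in (run1, run2))
mean_pan = mean(precision_at_n(labels, run) for run in (run1, run2))
print(mean_auc, mean_pan)
```

The pairwise ROC AUC above is quadratic in the number of samples, which is fine for illustration; on larger datasets a rank-based or library implementation would be the more practical choice.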

Data

Specifically, the Data folder contains the following sub-folders:

  • RawData: 20 real-world datasets, assumed to contain no contextual anomalies
  • SynData: 10 synthetic datasets without contextual anomalies
  • GenData: 20 real-world datasets with injected contextual anomalies, 10 synthetic datasets with contextual anomalies, and the Markov Blankets of these 30 datasets in the subfolder ~/MB/
  • Examples: the football dataset with unknown real-world contextual anomalies
  • TempFiles: temporary or intermediate results