Making Parametric Anomaly Detection on Tabular Data Non-Parametric Again

Overview | Installation | Datasets | Examples | Citation

Overview

This repo contains the code to run the experiments in "Making Parametric Anomaly Detection on Tabular Data Non-Parametric Again".

Installation

Create the Python environment by executing

conda env create -f environment.yml

then activate it with conda activate followed by the environment name set in environment.yml. Make sure you have the latest version of conda.

Datasets

To download all datasets at once, with wget:

bash get_dataset_wget.sh

or with curl:

bash get_dataset_curl.sh

Examples

To run the experiments for a given dataset without retrieval, on CPU or a single GPU:

source ./scripts/cpu/no_retrieval/abalone.sh

where abalone can be replaced by any dataset in the paper.

For distributed training, set the number of nodes and GPUs in ./scripts/distributed/no_retrieval/abalone.sh by editing the following options, then run the script with source as in the example above:

--nnodes=$NUMBER_OF_NODE --nproc_per_node=$NUMBER_OF_GPUS_PER_NODE

--mp_nodes $NUMBER_OF_NODE        # number of computing nodes
--mp_gpus $TOTAL_NUMBER_OF_GPUS   # total number of GPUs
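
The four options listed above have to agree with each other. As a minimal sketch, assuming every node hosts the same number of GPUs, the total passed to --mp_gpus is the number of nodes times the GPUs per node. The helper names total_gpus and launcher_options are hypothetical and not part of this repository.

```python
def total_gpus(num_nodes: int, gpus_per_node: int) -> int:
    """Total GPU count, assuming every node has the same number of GPUs."""
    if num_nodes < 1 or gpus_per_node < 1:
        raise ValueError("both counts must be positive integers")
    return num_nodes * gpus_per_node


def launcher_options(num_nodes: int, gpus_per_node: int) -> dict:
    """Collect consistent values for the four options named in the script."""
    return {
        "--nnodes": num_nodes,
        "--nproc_per_node": gpus_per_node,
        "--mp_nodes": num_nodes,
        "--mp_gpus": total_gpus(num_nodes, gpus_per_node),
    }


if __name__ == "__main__":
    # Two nodes with four GPUs each.
    print(launcher_options(2, 4)["--mp_gpus"])  # prints 8
```

If the nodes do not all have the same number of GPUs, this product no longer holds and the total has to be counted by hand.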

Similarly, for retrieval-augmented methods, replace no_retrieval in the previous paths with the chosen retrieval method, one of 'knn', 'v-attention', 'attention_bsim', 'attention_bsim_bval'. For abalone with knn retrieval, run the following:

source ./scripts/cpu/knn/abalone.sh

or

source ./scripts/distributed/knn/abalone.sh
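
For readers unfamiliar with the term, knn retrieval generally means looking up the k training samples closest to a query sample. The sketch below illustrates that idea with plain Euclidean distance on numeric feature vectors. The function name knn_retrieve, the choice of distance, and the toy data are assumptions for illustration only; this is not the implementation in retrieval.py.

```python
import math
from typing import List, Sequence


def knn_retrieve(query: Sequence[float],
                 candidates: Sequence[Sequence[float]],
                 k: int) -> List[int]:
    """Return the indices of the k candidates closest to the query."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Pair each distance with its candidate index, then sort by distance.
    distances = [(math.dist(query, c), i) for i, c in enumerate(candidates)]
    distances.sort()
    return [i for _, i in distances[:k]]


if __name__ == "__main__":
    # Toy training samples with two numeric features each.
    train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
    print(knn_retrieve([1.0, 1.0], train, k=2))  # prints [1, 3]
```

If k exceeds the number of candidates, the function simply returns all indices ordered by distance.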

Citation

If you use this code for your work, please cite our paper as

@inproceedings{thimonier2024making,
author = {Thimonier, Hugo and Popineau, Fabrice and Rimmel, Arpad and Doan, Bich-Li\^{e}n},
title = {Retrieval Augmented Deep Anomaly Detection for Tabular Data},
year = {2024},
isbn = {9798400704369},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3627673.3679559},
doi = {10.1145/3627673.3679559},
booktitle = {Proceedings of the 33rd ACM International Conference on Information and Knowledge Management},
pages = {2250--2259},
numpages = {10},
keywords = {anomaly detection, deep learning, tabular data},
location = {Boise, ID, USA},
series = {CIKM '24}
}
