Awesome
Official code and data repository of ADBench: Anomaly Detection Benchmark (NeurIPS 2022). Please star, watch, and fork ADBench for the active updates!
Recent News:
- ADBench has received 600+⭐ in github and released an official Python package📦 for a better user experience! Thank you all for your attention.
Citing ADBench:
Our ADBench benchmark paper is now available on arxiv and NeurIPS Proceedings. If you find this work useful or use some our released datasets, we would appreciate citations to the following paper:
@inproceedings{han2022adbench,
title={ADBench: Anomaly Detection Benchmark},
author={Songqiao Han and Xiyang Hu and Hailiang Huang and Mingqi Jiang and Yue Zhao},
booktitle={Neural Information Processing Systems (NeurIPS)}
year={2022},
}
Who Are We? ✨
ADBench is a collaborative effort between researchers at Shanghai University of Finance and Economics (SUFE) and Carnegie Mellon University (CMU). The project is designed and conducted by Minqi Jiang (SUFE) and Yue Zhao (CMU), and Xiyang Hu (CMU) --the author(s) of important anomaly detection libraries, including anomaly detection for tabular (PyOD), time-series (TODS), and graph data (PyGOD). The project is also maintained by Chaochuan Hou (SUFE).
<a href="https://github.com/Minqi824/ADBench/graphs/contributors"> <img src="https://contrib.rocks/image?repo=Minqi824/ADBench" /> </a>Why Do You Need ADBench?
ADBench is (to our best knowledge) the most comprehensive tabular anomaly detection benchmark, where we analyze the performance of 30 anomaly detection algorithms on 57 datasets (where we introduced 10 new datasets). By analyzing both research needs and deployment requirements in industry, ADBench conducts 98,436 experiments with three major angles:
- the effect of supervision (e.g., ground truth labels) by including 14 unsupervised, 7 semi-supervised, and 9 supervised methods;
- algorithm performance under different types of anomalies by simulating the environments with 4 types of anomalies; and
- algorithm robustness and stability under 3 settings of data corruptions.
Key Takeaways in 1 Minute:
- :bangbang: surprisingly none of the benchmarked unsupervised algorithms is statistically better than others, emphasizing the importance of algorithm selection;
- :bangbang: with merely 1% labeled anomalies, most semi-supervised methods can outperform the best unsupervised method, justifying the importance of supervision;
- in controlled environments, we observe that best unsupervised methods for specific types of anomalies are even better than semi- and fully-supervised methods, revealing the necessity of understanding data characteristics;
- semi-supervised methods show potential in achieving robustness in noisy and corrupted data, possibly due to their efficiency in using labels and feature selection;
- :interrobang: and many more can be found in our papers (Section 4)
The Figure below provides an overview of our proposed ADBench (see our paper for details).
How to use ADBench?
We envision three primary usages of ADBench:
- Have better understanding of anomaly detection algorithms: please read our paper for details.
- Conduct future research on anomaly detection: we list 4 important future research questions in the paper--see Section 4 to see some thoughts!
- Access rich algorithm implementation and datasets: see details below for how to use them
- Benchmark your anomaly detection algorithms: see notebook for instruction.
We provide full guidance of ADBench in the notebook.
Installation
pip install adbench
pip install --upgrade adbench
Prerequisite: Downloading datasets in ADBench from the github repo
from adbench.myutils import Utils
utils = Utils() # utility function
# download datasets from the remote github repo
# we recommend jihulab for China mainland user and github otherwise
utils.download_datasets(repo='jihulab')
Quickly implement ADBench for benchmarking AD algorithms.
We present the following example for quickly implementing ADBench in three different Angles illustrated in the paper. Currently, 57 datasets can be used for evaluating 30 algorithms in ADBench, and we encourage to test your customized datasets/algorithms in our ADBench testbed.
Run Entire Experiments of ADBench
from adbench.run import RunPipeline
'''
Params:
suffix: file name suffix;
parallel: running either 'unsupervise', 'semi-supervise', or 'supervise' (AD) algorithms,
corresponding to the Angle I: Availability of Ground Truth Labels (Supervision);
realistic_synthetic_mode: testing on 'local', 'global', 'dependency', and 'cluster' anomalies,
corresponding to the Angle II: Types of Anomalies;
noise type: evaluating algorithms on 'duplicated_anomalies', 'irrelevant_features' and 'label_contamination',
corresponding to the Angle III: Model Robustness with Noisy and Corrupted Data.
'''
# return the results including [params, model_name, metrics, time_fit, time_inference]
# besides, results will be automatically saved in the dataframe and ouputted as csv file in adbench/result folder
pipeline = RunPipeline(suffix='ADBench', parallel='semi-supervise', realistic_synthetic_mode=None, noise_type=None)
results = pipeline.run()
pipeline = RunPipeline(suffix='ADBench', parallel='unsupervise', realistic_synthetic_mode='cluster', noise_type=None)
results = pipeline.run()
pipeline = RunPipeline(suffix='ADBench', parallel='supervise', realistic_synthetic_mode=None, noise_type='irrelevant_features')
results = pipeline.run()
Run Your Customized Algorithms on either ADBench Datasets or Your Customized Dataset
# customized model on ADBench's datasets
from adbench.run import RunPipeline
from adbench.baseline.Customized.run import Customized
# notice that you should specify the corresponding category of your customized AD algorithm
# for example, here we use Logistic Regression as customized clf, which belongs to the supervised algorithm
# for your own algorithm, you can realize the same usage as other baselines by modifying the fit.py, model.py, and run.py files in the adbench/baseline/Customized
pipeline = RunPipeline(suffix='ADBench', parallel='supervise', realistic_synthetic_mode=None, noise_type=None)
results = pipeline.run(clf=Customized)
# customized model on customized dataset
import numpy as np
dataset = {}
dataset['X'] = np.random.randn(1000, 20)
dataset['y'] = np.random.choice([0, 1], 1000)
results = pipeline.run(dataset=dataset, clf=Customized)
See detailed guidance of ADBench in the notebook.
Datasets
ADBench includes 57 datasets, as shown in the following Table.
-
Among them, 47 widely-used real-world datasets are gathered for model evaluation, which cover many application domains, including healthcare (e.g., disease diagnosis), audio and language processing (e.g., speech recognition), image processing (e.g., object identification), finance (e.g., financial fraud detection), etc.
-
we introduce 10 more complex datasets from CV and NLP domains with more samples and richer features in ADBench. Pretrained models are applied to extract data embedding from NLP and CV datasets to access more complex representation. Please see the datasets folder and our paper for detailed information.
-
We organize the above 57 datasets into user-friendly format. All the datasets are named as "number_data.npz" in the datasets folder. For example, one can evaluate AD algorithms on the cardio dataset by the following codes. For multi-class dataset like CIFAR10, additional class numbers should be specified as "number_data_class.npz". Please see the folder for more details.
-
We provide the data processing code for NLP datasets and for CV datasets in Google Colab, where one can quickly reproduce our procedures via the free GPUs. We hope this could be helpful for the AD community.
We have unified all the datasets in .npz format, and you can directly access a dataset by the following script
import numpy as np
data = np.load('adbench/datasets/Classical/6_cardio.npz', allow_pickle=True)
X, y = data['X'], data['y']
Number | Data | # Samples | # Features | # Anomaly | % Anomaly | Category |
---|---|---|---|---|---|---|
1 | ALOI | 49534 | 27 | 1508 | 3.04 | Image |
2 | annthyroid | 7200 | 6 | 534 | 7.42 | Healthcare |
3 | backdoor | 95329 | 196 | 2329 | 2.44 | Network |
4 | breastw | 683 | 9 | 239 | 34.99 | Healthcare |
5 | campaign | 41188 | 62 | 4640 | 11.27 | Finance |
6 | cardio | 1831 | 21 | 176 | 9.61 | Healthcare |
7 | Cardiotocography | 2114 | 21 | 466 | 22.04 | Healthcare |
8 | celeba | 202599 | 39 | 4547 | 2.24 | Image |
9 | census | 299285 | 500 | 18568 | 6.20 | Sociology |
10 | cover | 286048 | 10 | 2747 | 0.96 | Botany |
11 | donors | 619326 | 10 | 36710 | 5.93 | Sociology |
12 | fault | 1941 | 27 | 673 | 34.67 | Physical |
13 | fraud | 284807 | 29 | 492 | 0.17 | Finance |
14 | glass | 214 | 7 | 9 | 4.21 | Forensic |
15 | Hepatitis | 80 | 19 | 13 | 16.25 | Healthcare |
16 | http | 567498 | 3 | 2211 | 0.39 | Web |
17 | InternetAds | 1966 | 1555 | 368 | 18.72 | Image |
18 | Ionosphere | 351 | 32 | 126 | 35.90 | Oryctognosy |
19 | landsat | 6435 | 36 | 1333 | 20.71 | Astronautics |
20 | letter | 1600 | 32 | 100 | 6.25 | Image |
21 | Lymphography | 148 | 18 | 6 | 4.05 | Healthcare |
22 | magic.gamma | 19020 | 10 | 6688 | 35.16 | Physical |
23 | mammography | 11183 | 6 | 260 | 2.32 | Healthcare |
24 | mnist | 7603 | 100 | 700 | 9.21 | Image |
25 | musk | 3062 | 166 | 97 | 3.17 | Chemistry |
26 | optdigits | 5216 | 64 | 150 | 2.88 | Image |
27 | PageBlocks | 5393 | 10 | 510 | 9.46 | Document |
28 | pendigits | 6870 | 16 | 156 | 2.27 | Image |
29 | Pima | 768 | 8 | 268 | 34.90 | Healthcare |
30 | satellite | 6435 | 36 | 2036 | 31.64 | Astronautics |
31 | satimage-2 | 5803 | 36 | 71 | 1.22 | Astronautics |
32 | shuttle | 49097 | 9 | 3511 | 7.15 | Astronautics |
33 | skin | 245057 | 3 | 50859 | 20.75 | Image |
34 | smtp | 95156 | 3 | 30 | 0.03 | Web |
35 | SpamBase | 4207 | 57 | 1679 | 39.91 | Document |
36 | speech | 3686 | 400 | 61 | 1.65 | Linguistics |
37 | Stamps | 340 | 9 | 31 | 9.12 | Document |
38 | thyroid | 3772 | 6 | 93 | 2.47 | Healthcare |
39 | vertebral | 240 | 6 | 30 | 12.50 | Biology |
40 | vowels | 1456 | 12 | 50 | 3.43 | Linguistics |
41 | Waveform | 3443 | 21 | 100 | 2.90 | Physics |
42 | WBC | 223 | 9 | 10 | 4.48 | Healthcare |
43 | WDBC | 367 | 30 | 10 | 2.72 | Healthcare |
44 | Wilt | 4819 | 5 | 257 | 5.33 | Botany |
45 | wine | 129 | 13 | 10 | 7.75 | Chemistry |
46 | WPBC | 198 | 33 | 47 | 23.74 | Healthcare |
47 | yeast | 1484 | 8 | 507 | 34.16 | Biology |
48 | CIFAR10 | 5263 | 512 | 263 | 5.00 | Image |
49 | FashionMNIST | 6315 | 512 | 315 | 5.00 | Image |
50 | MNIST-C | 10000 | 512 | 500 | 5.00 | Image |
51 | MVTec-AD | See Table B2. | Image | |||
52 | SVHN | 5208 | 512 | 260 | 5.00 | Image |
53 | Agnews | 10000 | 768 | 500 | 5.00 | NLP |
54 | Amazon | 10000 | 768 | 500 | 5.00 | NLP |
55 | Imdb | 10000 | 768 | 500 | 5.00 | NLP |
56 | Yelp | 10000 | 768 | 500 | 5.00 | NLP |
57 | 20newsgroups | See Table B3. | NLP |
Algorithms
ADBench can be served as a great complement to the PyOD toolkit, providing additional deep learning anomaly detection algorithms API. Compared to the previous benchmark studies, we have a larger algorithm collection with
- latest unsupervised AD algorithms like DeepSVDD and ECOD;
- SOTA semi-supervised algorithms, including DeepSAD and DevNet;
- latest network architectures like ResNet in computer vision (CV) and Transformer in natural language processing (NLP) domain ---we adapt ResNet and FTTransformer models for tabular AD in the proposed ADBench; and
- ensemble learning methods like LightGBM, XGBoost, and CatBoost. The Figure below shows the algorithms (14 unsupervised, 7 semi-supervised, and 9 supervised algorithms) in ADBench.
For each algorithm, we also introduce its specific implementation in the following Table. The only thing worth noting is that model name should be specified (especially for those models deployed by their corresponding package, e.g., PyOD). The following codes show the example to import AD models. Please see the Table for complete AD models included in ADBench and their import methods.
# Directly import AD algorithms from the existing toolkits like PyOD
from adbench.baseline.PyOD import PYOD
model = PYOD(seed=42, model_name='XGBOD') # initialization
model.fit(X_train, y_train) # fit
score = model.predict_score(X_test) # predict
# Import deep learning AD algorithms from our ADBench
from adbench.baseline.PReNet.run import PReNet
model = PReNet(seed=42)
model.fit(X_train, y_train) # fit
score = model.predict_score(X_test) # predict
Model | Year | Type | DL | Import | Source |
---|---|---|---|---|---|
PCA | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
OCSVM | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
LOF | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
CBLOF | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
COF | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
HBOS | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
KNN | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
SOD | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
COPOD | 2020 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
ECOD | 2022 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
IForest† | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
LODA† | Before 2017 | Unsup | ✗ | from adbench.baseline.PyOD import PYOD | Link |
DeepSVDD | 2018 | Unsup | ✓ | from adbench.baseline.PyOD import PYOD | Link |
DAGMM | 2018 | Unsup | ✓ | from adbench.baseline.DAGMM.run import DAGMM | Link |
GANomaly | 2018 | Semi | ✓ | from adbench.baseline.GANomaly.run import GANomaly | Link |
XGBOD† | 2018 | Semi | ✗ | from adbench.baseline.PyOD import PYOD | Link |
DeepSAD | 2019 | Semi | ✓ | from adbench.baseline.DeepSAD.src.run import DeepSAD | Link |
REPEN | 2018 | Semi | ✓ | from adbench.baseline.REPEN.run import REPEN | Link |
DevNet | 2019 | Semi | ✓ | from adbench.baseline.DevNet.run import DevNet | Link |
PReNet | 2020 | Semi | ✓ | from adbench.baseline.PReNet.run import PReNet | / |
FEAWAD | 2021 | Semi | ✓ | from adbench.baseline.FEAWAD.run import FEAWAD | Link |
NB | Before 2017 | Sup | ✗ | from adbench.baseline.Supervised import supervised | Link |
SVM | Before 2017 | Sup | ✗ | from adbench.baseline.Supervised import supervised | Link |
MLP | Before 2017 | Sup | ✓ | from adbench.baseline.Supervised import supervised | Link |
RF† | Before 2017 | Sup | ✗ | from adbench.baseline.Supervised import supervised | Link |
LGB† | 2017 | Supervised | ✗ | from adbench.baseline.Supervised import supervised | Link |
XGB† | Before 2017 | Sup | ✗ | from adbench.baseline.Supervised import supervised | Link |
CatB† | 2019 | Sup | ✗ | from adbench.baseline.Supervised import supervised | Link |
ResNet | 2019 | Sup | ✓ | from adbench.baseline.FTTransformer.run import FTTransformer | Link |
FTTransformer | 2019 | Sup | ✓ | from adbench.baseline.FTTransformer.run import FTTransformer | Link |
- '†' marks ensembling. This symbol is not included in the model name.
- Un-, semi-, and fully-supervised methods are denoted as unsup, semi and sup, respectively.