
[NeurIPS 2023] PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection

This repository provides the official PyTorch implementation code, data and models of the following paper:
PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection<br> [arXiv] | [Code]

Qiang Zhou* (AIR), Weize Li* (AIR), Lihan Jiang (WHU), Guoliang Wang (AIR),
Guyue Zhou (AIR), Shanghang Zhang (PKU), Hao Zhao (AIR). <br> AIR: Institute for AI Industry Research, Tsinghua University; WHU: Wuhan University; PKU: Peking University.

Abstract: Object anomaly detection is an important problem in the field of machine vision and has seen remarkable progress recently. However, two significant challenges hinder its research and application. First, existing datasets lack comprehensive visual information from various pose angles. They usually make the unrealistic assumption that the anomaly-free training dataset is pose-aligned and that the testing samples have the same pose as the training data. In practice, however, anomalies can appear in arbitrary poses, and training and test samples may have different poses, calling for the study of pose-agnostic anomaly detection. Second, the absence of a consensus on experimental settings for pose-agnostic anomaly detection leads to unfair comparisons of different methods, hindering research in this direction. To address these issues, we introduce the Multi-pose Anomaly Detection (MAD) dataset and the Pose-agnostic Anomaly Detection (PAD) benchmark, which take the first step toward addressing the pose-agnostic anomaly detection problem. Specifically, we build MAD using 20 complex-shaped LEGO toys with 4k views in various poses, and high-quality, diverse 3D anomalies in both simulated and real environments. Additionally, we develop OmniposeAD, trained on MAD and specifically designed for pose-agnostic anomaly detection. Through comprehensive evaluations, we demonstrate the superiority of our dataset and framework. Furthermore, we provide an open-source benchmark library, including the dataset and baseline methods covering 8 anomaly detection paradigms, to facilitate future research and application in this domain.<br>

<p align="center"> <img src="assets/teaser(a).png" width = "90%" /> </p>

The main contributions are summarized as follows:


1. Pose-agnostic Anomaly Detection Setting

The progress of object anomaly detection in industrial vision is significantly impeded by the scarcity of datasets containing high-quality annotated anomaly samples and comprehensive view information about normal objects. Our Pose-agnostic Anomaly Detection (PAD) setting is introduced for object anomaly detection and localization tasks.

<p align="center"> <img src="assets/PAD_teaser.png" width = "70%" /> </p>

MVTec has developed a series of widely used, photo-realistic industrial anomaly detection datasets (note that all screenshots below are from MVTec).
However, the objects in the MVTec-AD dataset are overly simplistic: anomalies can be discerned from a single view alone.

<p align="center"> <img src="assets/AD.png" width = "50%" /> </p>

Although the MVTec 3D-AD dataset offers more complex objects, it lacks RGB information from a full range of views, requiring the supplementation of hard-to-capture point cloud data to detect invisible structural anomalies.

<p align="center"> <img src="assets/3dAD.png" width = "50%" /> </p>

The MVTec-LOCO AD dataset provides rich global structural and logical information but is not suitable for fine-grained anomaly detection on individual objects.

<p align="center"> <img src="assets/locoAD.png" width = "50%" /> </p>

The GDXray dataset provides grayscale images obtained through X-ray scans for visual discrimination of structural defects, but it lacks normal samples and color/texture information.

The MPDD dataset offers multi-angle information about the objects but is limited in sample size and lacks standardized backgrounds in its photos.

Recently, the Eyecandies dataset introduced a substantial collection of synthetic candy views captured under various lighting conditions and provides multimodal object information. However, a significant gap remains between laboratory-synthesized data and real or simulated data domains.

<p align="center"> <img src="assets/eyecan.png" width = "50%" /> </p>

To address these issues and enable exploration of the pose-agnostic AD problem, we propose our dataset. As shown in the table below, we present a comprehensive comparison between MAD and other representative object anomaly detection datasets.

<p align="center"> <img src="assets/dataset_compare.png" width = "75%" /> </p>

2. MAD: Multi-pose Anomaly Detection Dataset

<p align="center"> <img src="assets/PAD-LOGO.png" width = "85%" /> </p>

We introduce the Multi-pose Anomaly Detection (MAD) dataset, the first attempt to evaluate the performance of pose-agnostic anomaly detection. MAD contains 4,000+ high-resolution multi-pose RGB views with camera/pose information of 20 complex-shaped LEGO animal toys for training, as well as 7,000+ simulated and real-world RGB images (without camera/pose information) with pixel-precise ground-truth annotations for three anomaly types in the test sets. Note that MAD is further divided into MAD-Sim and MAD-Real for simulation-to-reality studies, bridging the gap between academic research and the demands of industrial manufacturing.

2.1 Meet our 20 toys

<p align="center"> <img src="assets/allclass.png" width = "65%" /> </p>

2.2 Defect types and samples

When creating defect data, we referred to several common defect types on the LEGO production line and selected 'Stains', 'Burrs', and 'Missing' as the main defect categories for the dataset.
Burrs are small, unwanted projections or rough edges that can form on the surface of LEGO bricks or components.
Stains refer to discoloration or marks that appear on the surface of LEGO bricks or components.
Missing parts refer to situations where LEGO bricks or components are not included in the final packaged set as intended.

<p align="center"> <img src="assets/defect_sample.png" width = "60%" /> </p> * Please see more details in our supplementary materials.

2.3 MAD-Simulated Dataset

We obtained a collection of open-source LEGO models from the LEGO community. These models were constructed using parts from the LDraw library, a basic LEGO parts library, and depict various small animal figures. To meet the requirements of the experiment, we made precise adjustments and optimizations to the models' details, such as edges and colors.

To generate the necessary data, we used Blender and imported the required LDraw parts, then adjusted the angles and lighting of the models to achieve the best visual effects. To build a more comprehensive 3D dataset, we employed a 360-degree surround camera technique to render the models from multiple angles.

For camera placement, we used a circle centered on the Z-axis as a reference, positioned a camera every 15 degrees along this circle, and added further camera rings at equal intervals along the Z-axis. This setup enabled multiple cameras to render simultaneously, producing a richer and more comprehensive multi-angle dataset.
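For illustration, here is a minimal sketch of such a camera layout. The radius, ring heights, and function name are our own illustrative choices, not the actual Blender setup:

```python
import numpy as np

def ring_camera_positions(radius: float, height: float, step_deg: float = 15.0) -> np.ndarray:
    """Camera centers on one horizontal ring: one camera every `step_deg` degrees."""
    angles = np.deg2rad(np.arange(0.0, 360.0, step_deg))
    return np.stack([radius * np.cos(angles),
                     radius * np.sin(angles),
                     np.full_like(angles, height)], axis=-1)

# Stack several rings at equal intervals along the Z-axis (values are illustrative).
rings = [ring_camera_positions(radius=4.0, height=h) for h in np.linspace(0.5, 3.0, 4)]
cameras = np.concatenate(rings)  # (4 * 24, 3): each ring holds 360 / 15 = 24 views
print(cameras.shape)
```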

MAD-Sim Dataset with 20 classes (970MB): Google Drive. Due to background/lighting noise in real data and copyright issues, we strongly recommend using MAD-Sim exclusively to explore the PAD problem.

Data Directory

MAD-Sim
 └ 01Gorilla                    ---Object class folder.
   └ train                      ---Training set, fit your model on this data.
     └ good                     ---a set of defect-free training images (*w/* full pose information).
       └ 0.png
       └ 1.png
   └ test                       ---a test set of images (*w/o* pose information).
     └ Burrs                    ---with various kinds of defects, such as Burrs.
       └ 0.png
     └ Missing                  ---with various kinds of defects, such as Missing.
     └ Stains                   ---with various kinds of defects, such as Stains.
     └ good                     ---images without defects.
   └ ground_truth               ---GT segmentation mask for various kinds of defects.
     └ Burrs
       └ 0_mask.png
     └ Missing
     └ Stains
   └ transforms.json            ---Camera parameters and per-image transform matrices for training (see the loading sketch below the tree).
   └ license                    ---Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
   └ readme                     ---More information about this dataset and authorship.
 └ 02Unicorn
    ...
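To consume the pose data, a minimal loading sketch is shown below. It assumes the Blender/NeRF-style schema used by nerf-pytorch (a global camera_angle_x field plus a 4x4 transform_matrix per frame); the bundled readme is the authoritative reference for the actual schema.

```python
import json
import numpy as np

# Read poses for one object class (path and field names assume the
# Blender/NeRF-style transforms.json format used by nerf-pytorch).
with open("MAD-Sim/01Gorilla/transforms.json") as f:
    meta = json.load(f)

fov_x = meta.get("camera_angle_x")  # horizontal field of view, if present
poses = {frame["file_path"]: np.asarray(frame["transform_matrix"])  # 4x4 camera-to-world
         for frame in meta["frames"]}
print(f"{len(poses)} training views, camera_angle_x={fov_x}")
```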

2.4 MAD-Real Dataset

While the content of the MAD-Sim dataset is sufficient to explore object anomaly detection and localization tasks under the pose-agnostic setting, we would like to further assist in verifying whether the models trained using the MAD-Sim dataset are generalizable in the real world by releasing additional MAD-Real datasets.

MAD-Real Dataset with 10 classes (2.19GB): Google Drive. Due to background/lighting noise in real data and copyright issues, we strongly recommend using MAD-Sim exclusively to explore the PAD problem.


3. Pose-agnostic Anomaly Detection and Localization Benchmark on MAD

3.1 Overview of benchmarking methods

The selection criteria for benchmark methods include representativeness, superior performance, and availability of source code. To comprehensively investigate the performance of anomaly detection algorithms in the pose-agnostic anomaly detection setting, we selected 1-2 representative methods from each of the 8 anomaly detection paradigms:
Feature embedding-based methods: <br>

Reconstruction-based methods: <br>

Pseudo-anomaly methods: <br>

3.2 Object Attributes Quantification

Additionally, we explore a novel aspect of anomaly detection tasks: the relationship between object attributes and anomaly detection performance. This investigation yields unexpected insights and alternative ways to evaluate different approaches. Specifically, we measure the complexity of object shapes and the contrast of object colors, and then analyze the correlation between these properties and the detection performance of various methods. The findings reveal that most methods exhibit a positive correlation between performance and color contrast, while a negative correlation is observed with shape complexity, which aligns with intuition. Notably, Cutpaste, a representative approach that generates anomalies and reconstructs them through a self-supervised task, stands out as being sensitive to color contrast yet surprisingly tolerant of shape complexity. Furthermore, the results demonstrate the robustness of our proposed OmniposeAD to changes in object attributes.

| Category | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Shape_complexity | 65.79 | 57.92 | 4.53 | 100 | 81.58 | 11.25 | 46.23 | 73.97 | 55.97 | 23.92 |
| Color_Contrast | 73.45 | 35.71 | 49.22 | 41.84 | 35.69 | 71.38 | 74.59 | 35.41 | 40.98 | 54.35 |

| Category | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Shape_complexity | 42.43 | 43.77 | 29.49 | 28.18 | 56.10 | 49.76 | 15.53 | 62.18 | 79.03 | |
| Color_Contrast | 52.34 | 86.32 | 66.15 | 55.10 | 32.85 | 61.28 | 36.18 | 58.10 | 37.75 | 35.98 |
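As a sketch of this correlation analysis, one can correlate the attribute scores above with any method's per-category results. The snippet below assumes category indices 1-10 follow the class order of the table in Section 3.4 and reuses Patchcore's image-level AUROC from it:

```python
import numpy as np
from scipy.stats import pearsonr

# Shape complexity for categories 1-10 (from the table above).
shape_complexity = np.array([65.79, 57.92, 4.53, 100.0, 81.58,
                             11.25, 46.23, 73.97, 55.97, 23.92])
# Patchcore image-level AUROC for the first 10 classes (Section 3.4),
# assuming category indices follow that table's class order.
patchcore_img_auroc = np.array([66.8, 92.4, 59.3, 87.0, 86.0,
                                82.9, 72.9, 76.6, 75.2, 89.4])

r, p = pearsonr(shape_complexity, patchcore_img_auroc)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```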

3.3 Evaluation Metric

Following previous work, we choose the Area Under the Receiver Operating Characteristic Curve (AUROC) as the primary metric for evaluating anomaly segmentation at the pixel level and anomaly classification at the image level. While various evaluation metrics exist for these tasks, AUROC is the most widely used and the most suitable for comprehensive benchmarking. The AUROC score is calculated as follows:

$$ \mathrm{AUROC} = \int_{0}^{1} \mathrm{TPR}\, \mathrm{d}(\mathrm{FPR}) $$

Here, TPR and FPR represent the pixel/image-level true positive rate and false positive rate, respectively.
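Both levels can be computed with scikit-learn's roc_auc_score; below is a minimal sketch with stand-in labels and scores (shapes and values are illustrative only):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Image-level: one anomaly score per image against a binary image label.
image_labels = np.array([0, 0, 1, 1])              # 1 = anomalous image
image_scores = np.array([0.10, 0.35, 0.80, 0.55])  # per-image anomaly scores
print("image-level AUROC:", roc_auc_score(image_labels, image_scores))

# Pixel-level: flatten ground-truth masks and per-pixel anomaly maps.
rng = np.random.default_rng(0)
gt_masks = rng.integers(0, 2, size=(4, 64, 64))  # stand-in binary GT masks
anomaly_maps = rng.random((4, 64, 64))           # stand-in anomaly score maps
print("pixel-level AUROC:", roc_auc_score(gt_masks.ravel(), anomaly_maps.ravel()))
```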

3.4 [Quantitative Results] Anomaly Detection and Localization Results (Pixel/Image-AUROC)

Entries are Pixel/Image-AUROC (%). Methods follow the original column grouping into feature embedding-based and reconstruction-based paradigms, with OmniposeAD (ours) in the last column.

| Category | Patchcore | STFPM | Fastflow | CFlow | CFA | Cutpaste | DRAEM | FAVAE | OCRGAN | UniAD | OmniposeAD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gorilla | 88.4/66.8 | 93.8/65.3 | 91.4/51.1 | 94.7/69.2 | 91.4/41.8 | 36.1/- | 77.7/58.9 | 92.1/46.8 | 94.2/- | 93.4/56.6 | 99.5/93.6 |
| Unicorn | 58.9/92.4 | 89.3/79.6 | 77.9/45.0 | 89.9/82.3 | 85.2/85.6 | 69.6/- | 26.0/70.4 | 88.0/68.3 | 86.7/- | 86.8/73.0 | 98.2/94.0 |
| Mallard | 66.1/59.3 | 86.0/42.2 | 85.0/72.1 | 87.3/74.9 | 83.7/36.6 | 40.9/- | 47.8/34.5 | 85.3/33.6 | 88.9/- | 85.4/70.0 | 97.4/84.7 |
| Turtle | 77.5/87.0 | 91.0/64.4 | 83.9/67.7 | 90.2/51.0 | 88.7/58.3 | 77.2/- | 45.3/18.4 | 89.9/82.8 | 76.7/- | 88.9/50.2 | 99.1/95.6 |
| Whale | 60.9/86.0 | 88.6/64.1 | 86.5/53.2 | 89.2/57.0 | 87.9/77.7 | 66.8/- | 55.9/65.8 | 90.1/62.5 | 89.4/- | 90.7/75.5 | 98.3/82.5 |
| Bird | 88.6/82.9 | 90.6/52.4 | 90.4/76.5 | 91.8/75.6 | 92.2/78.4 | 71.7/- | 60.3/69.1 | 91.6/73.3 | 99.1/- | 91.1/74.7 | 95.7/92.4 |
| Owl | 86.3/72.9 | 91.8/72.7 | 90.7/58.2 | 94.6/76.5 | 93.9/74.0 | 51.9/- | 78.9/67.2 | 96.7/62.5 | 90.1/- | 92.8/65.3 | 99.4/88.2 |
| Sabertooth | 69.4/76.6 | 89.3/56.0 | 88.7/70.5 | 93.3/71.3 | 88.0/64.2 | 71.2/- | 26.2/68.6 | 94.5/82.4 | 91.7/- | 90.3/61.2 | 98.5/95.7 |
| Swan | 73.5/75.2 | 90.8/53.6 | 89.5/63.9 | 93.1/67.4 | 95.0/66.7 | 57.2/- | 75.9/59.7 | 87.4/50.6 | 72.2/- | 90.6/57.5 | 98.8/86.5 |
| Sheep | 79.9/89.4 | 93.2/56.5 | 91.0/71.4 | 94.3/80.9 | 94.1/86.5 | 67.2/- | 70.5/59.5 | 94.3/74.9 | 98.9/- | 92.9/70.4 | 97.7/90.1 |
| Pig | 83.5/85.7 | 94.2/50.6 | 93.6/59.6 | 97.1/72.1 | 95.6/66.7 | 52.3/- | 65.6/64.4 | 92.2/52.5 | 93.6/- | 94.8/54.6 | 97.7/88.3 |
| Zalika | 64.9/68.2 | 86.2/53.7 | 84.6/54.9 | 89.4/66.9 | 87.7/52.1 | 43.5/- | 66.6/51.7 | 86.4/34.6 | 94.4/- | 86.7/50.5 | 99.1/88.2 |
| Pheonix | 62.4/71.4 | 86.1/56.7 | 85.7/53.4 | 87.3/64.4 | 87.0/65.9 | 53.1/- | 38.7/53.1 | 92.4/65.2 | 86.8/- | 84.7/55.4 | 99.4/82.3 |
| Elephant | 56.2/78.6 | 76.8/61.7 | 76.8/61.6 | 72.4/70.1 | 77.8/71.7 | 56.9/- | 55.9/62.5 | 72.0/49.1 | 91.7/- | 70.7/59.3 | 99.0/92.5 |
| Parrot | 70.7/78.0 | 84.0/61.1 | 84.0/53.4 | 86.8/67.9 | 83.7/69.8 | 55.4/- | 34.4/62.3 | 87.7/46.1 | 66.5/- | 85.6/53.4 | 99.5/97.0 |
| Cat | 85.6/78.7 | 93.7/52.2 | 93.7/51.3 | 94.7/65.8 | 95.0/68.2 | 58.3/- | 79.4/61.3 | 94.0/53.2 | 91.3/- | 93.8/53.1 | 97.7/84.9 |
| Scorpion | 79.9/82.1 | 90.7/68.9 | 74.3/51.9 | 91.9/79.5 | 92.2/91.4 | 71.2/- | 79.7/83.7 | 88.4/66.9 | 97.6/- | 92.2/69.5 | 95.9/91.5 |
| Obesobeso | 91.9/89.5 | 94.2/60.8 | 92.9/67.6 | 95.8/80.0 | 96.2/80.6 | 73.3/- | 89.2/73.9 | 92.7/58.2 | 98.5/- | 93.6/67.7 | 98.0/97.1 |
| Bear | 79.5/84.2 | 90.6/60.7 | 85.0/72.9 | 92.2/81.4 | 90.7/78.7 | 68.8/- | 39.2/76.1 | 90.1/52.8 | 83.1/- | 90.9/65.1 | 99.3/98.8 |
| Puppy | 73.3/65.6 | 84.9/56.7 | 80.3/59.5 | 89.6/71.4 | 82.3/53.7 | 43.2/- | 45.8/57.4 | 85.6/43.5 | 78.9/- | 87.1/55.6 | 98.8/93.5 |
| Mean | 74.7/78.5 | 89.3/59.5 | 86.1/60.8 | 90.8/71.3 | 89.8/68.2 | 59.3/- | 58.0/60.9 | 89.4/58.0 | 88.5/- | 89.1/62.2 | 97.8/90.9 |

3.5 [Qualitative Results] Anomaly Localization Results

<p align="center"> <img src="assets/qualitatively_results11.png" width = "80%" /> </p>

3.6 Object Attributes-Performance Correlation

Note that the X-axis indicates object attributes and the Y-axis indicates anomaly detection (localization) performance.

<p align="center"> <img src="assets/color+shape.png" width = "85%" /> </p>

3.7 In-the-wild AD (OmniposeAD) Results

<p align="center"> <img src="assets/inthewild.png" width = "85%" /> </p>

4. OmniposeAD

OmniposeAD consists of an anomaly-free neural radiance field, a coarse-to-fine pose estimation module, and an anomaly detection and localization module. The input is a query image without pose information. The image first passes through the coarse-to-fine pose estimation module to obtain an accurate camera pose. The estimated pose is then used by the neural radiance field to render the normal reference. Finally, the reference is compared against the query image to extract anomaly information.

<p align="center"> <img src="assets/pad_pipeline_final.png" width="85%" /> </p>
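Schematically, the three-stage flow can be sketched as follows. This is an illustration only: nerf, pose_estimator, and the simple pixel residual are placeholders for the repository's actual modules and comparison scheme.

```python
import numpy as np

def omnipose_ad(query_image: np.ndarray, nerf, pose_estimator) -> tuple:
    """Illustrative OmniposeAD flow; all helper objects are placeholders.

    1) Estimate the camera pose coarse-to-fine, 2) render the anomaly-free
    reference at that pose from the NeRF, 3) compare reference and query.
    """
    pose = pose_estimator.coarse_to_fine(query_image)       # estimated camera pose
    reference = nerf.render(pose)                           # normal appearance at that pose
    anomaly_map = np.abs(query_image - reference).mean(-1)  # per-pixel residual (stand-in)
    image_score = float(anomaly_map.max())                  # image-level anomaly score
    return anomaly_map, image_score
```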

4.1 Installation

To start, we recommend creating an environment using conda:

conda create -n pad python=3.8
conda activate pad

Clone the repository and install dependencies:

git clone https://github.com/EricLee0224/PAD.git
cd PAD
pip install -r requirements.txt

Then download the checkpoints and the retrieval model and place them in the corresponding locations; alternatively, run the following commands:

gdown https://drive.google.com/uc\?id\=1MVokoxPQ9CVo0rSxvdRvTj0bqSp-kXze
unzip ckpts.zip

cd retrieval
gdown https://drive.google.com/uc\?id\=16FOwaqQE0NGY-1EpfoNlU0cGlHjATV0V
unzip model.zip

The resulting file layout looks like this:

ckpts
 └ LEGO-3D
  └ ...
retrieval
 └ model

4.2 Train

First, download our MAD dataset and put the extracted folder in the "data/LEGO-3D" folder:

data 
 └ LEGO-3D

To run the algorithm on 01Gorilla object:

python anomaly_nerf_lego.py --config configs/LEGO-3D/01Gorilla.txt --class_name 01Gorilla

Other parameters, such as the batch size, class_name, and dataset_type, can be adjusted in the corresponding config files.

All NeRF models were trained using the code from https://github.com/yenchenlin/nerf-pytorch/, and pose estimation builds on the iNeRF implementation at https://github.com/salykovaa/inerf.

4.3 Test

The test script requires the --obj argument:

python auroc_test.py --obj 01Gorilla

License

MAD Dataset is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. You are free to use, copy, and redistribute the material for non-commercial purposes provided you give appropriate credit, provide a link to the license, and indicate if changes were made. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. You may not use the material for commercial purposes.

Our code for OmniposeAD is open-source under an MIT License.


Contact Us

If you have any problems with our work, please feel free to contact the contributors:
[MAD Dataset] bamboosdu@gmail.com, liweize0224@gmail.com and wanggl199705@gmail.com
[OmniposeAD Code] mr.lhjiang@gmail.com


Citation

@Article{zhou2023pad,
  author       = {Zhou, Qiang and Li, Weize and Jiang, Lihan and Wang, Guoliang and Zhou, Guyue and Zhang, Shanghang and Zhao, Hao},
  title        = {PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection},
  journal      = {arXiv preprint arXiv:2310.07716},
  year         = {2023}
}