Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation (CVPR2024) <a href="https://arxiv.org/abs/2403.06247v2.pdf"><img src="https://img.shields.io/badge/arXiv-2403.06247-b31b1b.svg" alt="Paper Badge"/></a>

<p align="center"> <img src=".png" width="1000" alt="" class="img-responsive"> </p>

Abstract ✨

We propose a text-guided variational image generation method to address the challenge of obtaining clean data for anomaly detection in industrial manufacturing. Our method uses text information about the target object, learned from extensive text library documents, to generate non-defective images resembling the input image. The proposed framework ensures that the generated non-defective images align with the distributions anticipated from textual and image-based knowledge, which provides stability and generality. Experimental results demonstrate the effectiveness of our approach, which surpasses previous methods even with limited non-defective data. Our approach is validated through generalization tests across four baseline models and three distinct datasets. We also present an additional analysis on using the generated images to enhance the effectiveness of anomaly detection models.

News 📢

Requirements

A suitable conda environment named anomalib_new can be created and activated with:

conda env create -f environment.yaml
conda activate anomalib_new
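
Once the environment is active, a quick import check can confirm that the key packages resolved. The sketch below is only an illustration: the package names `anomalib` and `torch` are assumptions inferred from the environment name, not read from `environment.yaml`, and `check_packages` is a hypothetical helper rather than part of this repository.

```python
import importlib.util


def check_packages(names):
    """Return {package_name: bool} telling whether each top-level package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed package names; adjust the list to match environment.yaml.
    for name, found in check_packages(["anomalib", "torch"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the `anomalib_new` environment activated; any package reported as missing suggests the environment was not created completely.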

Datasets 📖

[MVTec AD Dataset]

The MVTec AD dataset is one of the main benchmarks for anomaly detection and is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
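
After downloading, it can help to confirm that the extracted folders are where a training script would look for them. The sketch below assumes the standard MVTec AD layout (`<root>/<category>/train/good/*.png`); the root path `datasets/MVTec` and the helper `count_train_good` are hypothetical and not taken from this repository.

```python
from pathlib import Path

# The 15 MVTec AD categories, matching the columns of the results tables in this README.
CATEGORIES = [
    "carpet", "grid", "leather", "tile", "wood", "bottle", "cable", "capsule",
    "hazelnut", "metal_nut", "pill", "screw", "toothbrush", "transistor", "zipper",
]


def count_train_good(root):
    """Count non-defective training images per category under `root`.

    Assumes the layout <root>/<category>/train/good/*.png.
    A category whose folder is missing is reported with a count of 0.
    """
    root = Path(root)
    return {
        category: len(list((root / category / "train" / "good").glob("*.png")))
        for category in CATEGORIES
    }


if __name__ == "__main__":
    # Hypothetical location; point this at wherever the dataset was extracted.
    for category, n in count_train_good("datasets/MVTec").items():
        print(f"{category:<12} {n}")
```

A count of 0 for a category usually means the archive was extracted to a different root or the folder names differ from the assumed layout.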

[MVTec AD LOCO Dataset]

The MVTec AD LOCO dataset is one of the main benchmarks for anomaly detection and is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

[BTAD Dataset]

The BTAD dataset is one of the main benchmarks for anomaly detection and is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Training 📝

Results 🎯

<img src="img/results.png" width="1200"/>

One-shot

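| Model | Backbone | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor | Zipper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PatchCore | ResNet-18 | 0.839 | 0.936 | 0.649 | 1.000 | 0.993 | 0.989 | 1.000 | 0.899 | 0.636 | 0.917 | 0.888 | 0.720 | 0.538 | 0.787 | 0.718 | 0.915 |
| CFlow | ResNet-18 | 0.842 | 0.968 | 0.538 | 0.946 | 0.965 | 0.990 | 0.991 | 0.777 | 0.736 | 0.964 | 0.796 | 0.610 | 0.796 | 0.711 | 0.737 | 0.921 |
| Reverse-distillation | Wide ResNet-50-2 | 0.831 | 0.994 | 0.797 | 1.000 | 0.994 | 0.996 | 0.980 | 0.650 | 0.674 | 0.999 | 0.627 | 0.812 | 0.540 | 0.920 | 0.704 | 0.780 |
| EfficientAd | ResNet-18 | 0.713 | 0.995 | 0.743 | 0.548 | 0.974 | 0.814 | 0.951 | 0.619 | 0.517 | 0.759 | 0.621 | 0.732 | 0.743 | 0.634 | 0.585 | 0.528 |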
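
The Avg column appears to be the unweighted mean of the 15 per-category scores. As a minimal sketch of that reading, the snippet below recomputes it for the PatchCore (ResNet-18) one-shot row; the values are copied from the table above, while `mean_score` is a hypothetical helper and not part of this repository.

```python
# Per-category scores for PatchCore (ResNet-18), one-shot, copied from the table above.
patchcore_one_shot = {
    "Carpet": 0.936, "Grid": 0.649, "Leather": 1.000, "Tile": 0.993, "Wood": 0.989,
    "Bottle": 1.000, "Cable": 0.899, "Capsule": 0.636, "Hazelnut": 0.917,
    "Metal Nut": 0.888, "Pill": 0.720, "Screw": 0.538, "Toothbrush": 0.787,
    "Transistor": 0.718, "Zipper": 0.915,
}
reported_avg = 0.839  # Avg column of the same row


def mean_score(scores):
    """Unweighted mean over all categories in `scores`."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    computed = mean_score(patchcore_one_shot)
    print(f"computed: {computed:.3f}, reported: {reported_avg:.3f}")
```

For this row the recomputed mean agrees with the reported 0.839 to three decimals; the same check can be repeated for the other rows by swapping in their values.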

Few-shot

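| Model | Backbone | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor | Zipper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PatchCore | ResNet-18 | 0.914 | 0.964 | 0.723 | 1.000 | 0.998 | 0.995 | 1.000 | 0.956 | 0.930 | 0.993 | 0.991 | 0.889 | 0.666 | 0.695 | 0.989 | 0.924 |
| CFlow | ResNet-18 | 0.902 | 0.958 | 0.777 | 0.991 | 0.983 | 0.979 | 0.999 | 0.925 | 0.776 | 0.995 | 0.921 | 0.880 | 0.617 | 0.939 | 0.865 | 0.925 |
| Reverse-distillation | Wide ResNet-50-2 | 0.842 | 0.994 | 0.851 | 0.999 | 0.980 | 0.998 | 0.986 | 0.846 | 0.747 | 1.000 | 0.799 | 0.798 | 0.636 | 0.942 | 0.840 | 0.737 |
| EfficientAd | ResNet-18 | 0.781 | 0.978 | 0.979 | 0.663 | 0.962 | 0.907 | 0.980 | 0.759 | 0.552 | 0.873 | 0.679 | 0.751 | 0.754 | 0.692 | 0.608 | 0.576 |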

Full-shot

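| Model | Backbone | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor | Zipper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PatchCore | ResNet-18 | 0.978 | 0.967 | 0.943 | 1.000 | 0.999 | 0.997 | 1.000 | 0.988 | 0.972 | 1.000 | 1.000 | 0.925 | 0.961 | 0.957 | 1.000 | 0.957 |
| CFlow | ResNet-18 | 0.937 | 0.952 | 0.841 | 0.986 | 0.982 | 0.988 | 1.000 | 0.957 | 0.916 | 0.999 | 0.992 | 0.912 | 0.749 | 0.916 | 0.912 | 0.950 |
| Reverse-distillation | Wide ResNet-50-2 | 0.933 | 0.995 | 0.995 | 1.000 | 0.996 | 0.997 | 0.987 | 0.939 | 0.944 | 1.000 | 0.751 | 0.954 | 0.957 | 0.966 | 0.971 | 0.921 |
| EfficientAd | ResNet-18 | 0.955 | 0.983 | 0.994 | 0.973 | 0.999 | 0.956 | 1.000 | 0.947 | 0.807 | 0.946 | 0.970 | 0.975 | 0.951 | 0.893 | 0.825 | 0.944 |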

Citation 🔍

@inproceedings{lee2024text,
  title={Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation},
  author={Lee, Mingyu and Choi, Jongwon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={26519--26528},
  year={2024}
}