✨MuSc (ICLR 2024)✨

This is the official PyTorch implementation of "MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images".

Authors: Xurui Li<sup>1*</sup> | Ziming Huang<sup>1*</sup> | Feng Xue<sup>3</sup> | Yu Zhou<sup>1,2</sup>

Institutions: <sup>1</sup>Huazhong University of Science and Technology | <sup>2</sup>Wuhan JingCe Electronic Group Co.,LTD | <sup>3</sup>University of Trento

🧐 arXiv | OpenReview

📖 Chinese README

<a href='#all_catelogue'>Go to Catalogue</a>

🙈TODO list:

📣Updates:

04/11/2024

  1. Comparisons with the zero-/few-shot methods from CVPR 2024 have been added to <a href='#compare_sota'>Compare with SOTA k-shot Methods</a>.
  2. Fixed some bugs in models/backbone/_backbones.py.

03/22/2024

  1. Support code for the BTAD dataset is provided.
  2. Some code is modified to support a larger batch_size.
  3. Some code is optimized for faster inference.
  4. <a href='#results_backbones'>Results of different backbones</a> on the MVTec AD, VisA and BTAD datasets are provided.
  5. <a href='#results_datasets'>The detailed results on different datasets</a> are provided.
  6. <a href='#inference_time'>The inference time of different backbones</a> is provided.
  7. <a href='#compare_sota'>The comparisons with SOTA zero/few-shot methods</a> are provided. This table will be updated continuously.
  8. We summarize the <a href='#FAQ'>frequently asked questions</a> from users of MuSc and give answers.
  9. A README in Chinese is added.

02/01/2024

Initial commits:

  1. The complete code of our method MuSc from the paper is released.
  2. The code supports the ViT image encoders of CLIP as well as ViTs pre-trained with DINO/DINO_v2.
<span id='compare_sota'/>

🎖️Compare with SOTA k-shot methods <a href='#all_catelogue'>[Go to Catalogue]</a>

We will continuously update the following tables to compare our MuSc with the newest zero-shot and few-shot methods. "-" indicates that the metric was not reported in the corresponding paper.

MVTec AD

| Methods | Venue | Setting | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| MuSc (ours) | ICLR 2024 | 0-shot | 97.8 | 97.5 | 99.1 | 97.3 | 62.6 | 62.7 | 93.8 |
| RegAD | ECCV 2022 | 4-shot | 89.1 | 92.4 | 94.9 | 96.2 | 51.7 | 48.3 | 88.0 |
| GraphCore | ICLR 2023 | 4-shot | 92.9 | - | - | 97.4 | - | - | - |
| WinCLIP | CVPR 2023 | 0-shot | 91.8 | 92.9 | 96.5 | 85.1 | 31.7 | - | 64.6 |
| WinCLIP | CVPR 2023 | 4-shot | 95.2 | 94.7 | 97.3 | 96.2 | 51.7 | - | 88.0 |
| APRIL-GAN | CVPR Workshop 2023 | 0-shot | 86.1 | 90.4 | 93.5 | 87.6 | 43.3 | 40.8 | 44.0 |
| APRIL-GAN | CVPR Workshop 2023 | 4-shot | 92.8 | 92.8 | 96.3 | 95.9 | 56.9 | 54.5 | 91.8 |
| FastRecon | ICCV 2023 | 4-shot | 94.2 | - | - | 97.0 | - | - | - |
| ACR | NeurIPS 2023 | 0-shot | 85.8 | 91.3 | 92.9 | 92.5 | 44.2 | 38.9 | 72.7 |
| RegAD+Adversarial Loss | BMVC 2023 | 8-shot | 91.9 | - | - | 96.9 | - | - | - |
| PACKD | BMVC 2023 | 8-shot | 95.3 | - | - | 97.3 | - | - | - |
| PromptAD | WACV 2024 | 0-shot | 90.8 | - | - | 92.1 | 36.2 | - | 72.8 |
| AnomalyCLIP | ICLR 2024 | 0-shot | 91.5 | - | 96.2 | 91.1 | - | - | 81.4 |
| InCTRL | CVPR 2024 | 8-shot | 95.3 | - | - | - | - | - | - |
| MVFA-AD | CVPR 2024 | 4-shot | 96.2 | - | - | 96.3 | - | - | - |
| PromptAD | CVPR 2024 | 4-shot | 96.6 | - | - | 96.5 | - | - | - |

VisA

| Methods | Venue | Setting | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| MuSc (ours) | ICLR 2024 | 0-shot | 92.8 | 89.5 | 93.5 | 98.8 | 48.8 | 45.1 | 92.7 |
| WinCLIP | CVPR 2023 | 0-shot | 78.1 | 79.0 | 81.2 | 79.6 | 14.8 | - | 56.8 |
| WinCLIP | CVPR 2023 | 4-shot | 87.3 | 84.2 | 88.8 | 97.2 | 47.0 | - | 87.6 |
| APRIL-GAN | CVPR Workshop 2023 | 0-shot | 78.0 | 78.7 | 81.4 | 94.2 | 32.3 | 25.7 | 86.8 |
| APRIL-GAN | CVPR Workshop 2023 | 4-shot | 92.6 | 88.4 | 94.5 | 96.2 | 40.0 | 32.2 | 90.2 |
| PACKD | BMVC 2023 | 8-shot | 87.5 | - | - | 97.9 | - | - | - |
| AnomalyCLIP | ICLR 2024 | 0-shot | 82.1 | - | 85.4 | 95.5 | - | - | 87.0 |
| InCTRL | CVPR 2024 | 8-shot | 88.7 | - | - | - | - | - | - |
| PromptAD | CVPR 2024 | 4-shot | 89.1 | - | - | 97.4 | - | - | - |
<span id='all_catelogue'/>

📖Catalogue

- <a href='#abstract'>Abstract</a>
- <a href='#setup'>Setup</a>
- <a href='#datasets'>Datasets Download</a>
- <a href='#run_musc'>Run MuSc</a>
- <a href='#rscin'>Classification optimization (RsCIN)</a>
- <a href='#results_datasets'>Results of different datasets</a>
- <a href='#results_backbones'>Results of different backbones</a>
- <a href='#inference_time'>Inference Time</a>
- <a href='#FAQ'>Frequently Asked Questions</a>
- <a href='#citation'>Citation</a>
- <a href='#thanks'>Thanks</a>
- <a href='#license'>License</a>

<span id='abstract'/>

👇Abstract: <a href='#all_catelogue'>[Back to Catalogue]</a>

This paper studies zero-shot anomaly classification (AC) and segmentation (AS) in industrial vision. We reveal that the abundant normal and abnormal cues implicit in unlabeled test images can be exploited for anomaly determination, which is ignored by prior methods. Our key observation is that for the industrial product images, the normal image patches could find a relatively large number of similar patches in other unlabeled images, while the abnormal ones only have a few similar patches.

We leverage such a discriminative characteristic to design a novel zero-shot AC/AS method by Mutual Scoring (MuSc) of the unlabeled images, which does not need any training or prompts. Specifically, we perform Local Neighborhood Aggregation with Multiple Degrees (LNAMD) to obtain the patch features that are capable of representing anomalies in varying sizes. Then we propose the Mutual Scoring Mechanism (MSM) to leverage the unlabeled test images to assign the anomaly score to each other. Furthermore, we present an optimization approach named Re-scoring with Constrained Image-level Neighborhood (RsCIN) for image-level anomaly classification to suppress the false positives caused by noises in normal images.

The superior performance on the challenging MVTec AD and VisA datasets demonstrates the effectiveness of our approach. Compared with the state-of-the-art zero-shot approaches, MuSc achieves a $\textbf{21.1}$% PRO absolute gain (from 72.7% to 93.8%) on MVTec AD, a $\textbf{19.4}$% pixel-AP gain and a $\textbf{14.7}$% pixel-AUROC gain on VisA. In addition, our zero-shot approach outperforms most of the few-shot approaches and is comparable to some one-class methods.
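
To make this intuition concrete, the toy sketch below scores each patch by its distance to patches from the other unlabeled images. It is a simplified illustration only, not the repository's implementation (which additionally uses LNAMD and RsCIN); the function name, tensor layout, and the choice of averaging the k smallest distances are our assumptions.

```python
import torch

def mutual_patch_scores(patch_feats: torch.Tensor, k: int = 30) -> torch.Tensor:
    """Toy illustration of the mutual-scoring idea.

    patch_feats: [N, P, C] patch features of N unlabeled test images
                 (P patches per image, C channels).
    Returns [N, P] anomaly scores: a patch that finds many similar patches
    in the other unlabeled images receives a low score.
    """
    N, P, C = patch_feats.shape
    scores = torch.empty(N, P)
    for i in range(N):
        # patches of all images except image i
        others = torch.cat([patch_feats[:i], patch_feats[i + 1:]]).reshape(-1, C)
        # pairwise L2 distances from the patches of image i to all other patches
        dist = torch.cdist(patch_feats[i], others)              # [P, (N-1)*P]
        # normal patches have many close neighbours, abnormal ones only a few,
        # so the average of the k smallest distances separates them
        scores[i] = dist.topk(k, dim=1, largest=False).values.mean(dim=1)
    return scores
```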

(Figure: overview of the MuSc pipeline)

😊Compare with other 0-shot methods

(Figure: comparison of MuSc with other zero-shot methods)

😊Compare with other 4-shot methods

(Figure: comparison of MuSc with other 4-shot methods)

<span id='setup'/>

🎯Setup: <a href='#all_catelogue'>[Back to Catalogue]</a>

Environment:

Clone the repository locally:

```bash
git clone https://github.com/xrli-U/MuSc.git
```

Create a virtual environment:

```bash
conda create --name musc python=3.8
conda activate musc
```

Install the required packages:

```bash
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
```
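
Optionally, you can verify that the environment is usable with a quick sanity check (not part of the repository):

```python
import torch
import torchvision

print("torch:", torch.__version__)              # expected 2.0.1
print("torchvision:", torchvision.__version__)  # expected 0.15.2
print("CUDA available:", torch.cuda.is_available())
```
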
<span id='datasets'/>

👇Datasets Download: <a href='#all_catelogue'>[Back to Catalogue]</a>

Put all the datasets in the ./data folder.

<span id='datatets_mvtec_ad'/>

MVTec AD

```
data
|---mvtec_anomaly_detection
|-----|-- bottle
|-----|-----|----- ground_truth
|-----|-----|----- test
|-----|-----|----- train
|-----|-- cable
|-----|--- ...
```
<span id='datatets_visa'/>

VisA

```
data
|----visa
|-----|-- split_csv
|-----|-----|--- 1cls.csv
|-----|-----|--- ...
|-----|-- candle
|-----|-----|--- Data
|-----|-----|-----|----- Images
|-----|-----|-----|--------|------ Anomaly 
|-----|-----|-----|--------|------ Normal 
|-----|-----|-----|----- Masks
|-----|-----|-----|--------|------ Anomaly 
|-----|-----|--- image_anno.csv
|-----|-- capsules
|-----|--- ...
```

The VisA dataset needs to be preprocessed to separate the training set from the test set:

```bash
python ./datasets/visa_preprocess.py
```
<span id='datatets_btad'/>

BTAD

```
data
|---btad
|-----|--- 01
|-----|-----|----- ground_truth
|-----|-----|----- test
|-----|-----|----- train
|-----|--- 02
|-----|--- ...
```
<span id='run_musc'/>

💎Run MuSc: <a href='#all_catelogue'>[Back to Catalogue]</a>

We provide two ways to run our code.

python

```bash
python examples/musc_main.py
```

Follow the configuration in ./configs/musc.yaml.

script

```bash
sh scripts/musc.sh
```

The configuration in the script musc.sh takes precedence.

The key arguments of the script are defined in scripts/musc.sh and mirror the options in ./configs/musc.yaml; the ones referenced most often in this README are img_resize (the input image resolution) and vis_type (the anomaly-map normalization used for visualization), both discussed in the <a href='#FAQ'>FAQ</a>.

<span id='rscin'/>

💎Classification optimization (RsCIN): <a href='#all_catelogue'>[Back to Catalogue]</a>

We provide additional code in the ./models/RsCIN_features folder to optimize the classification results of other methods with our RsCIN module. We use the ViT-large-14-336 image encoder of CLIP to extract image features for the MVTec AD and VisA datasets and store them in mvtec_ad_cls.dat and visa_cls.dat respectively. We show how to use them in ./models/RsCIN_features/RsCIN.py.

Example

Before using our RsCIN module, move RsCIN.py, mvtec_ad_cls.dat and visa_cls.dat to your project directory.

```python
import numpy as np
from RsCIN import Mobile_RsCIN

classification_results = np.random.rand(83)  # the classification results of your method
dataset_name = 'mvtec_ad'                    # dataset name
class_name = 'bottle'                        # category name in the above dataset
optimized_classification_results = Mobile_RsCIN(classification_results, dataset_name=dataset_name, class_name=class_name)
```

The optimized_classification_results are the anomaly classification scores optimized by our RsCIN module.
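
If image-level ground-truth labels are available, one way to quantify the effect of RsCIN is to compare the image-level AUROC before and after the call. This is a minimal sketch assuming scikit-learn is installed; image_labels and the random scores are hypothetical placeholders for your own data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from RsCIN import Mobile_RsCIN

image_labels = np.random.randint(0, 2, size=83)  # hypothetical ground truth (0 = normal, 1 = anomalous)
classification_results = np.random.rand(83)      # the classification results of your method

optimized = Mobile_RsCIN(classification_results, dataset_name='mvtec_ad', class_name='bottle')
print("AUROC-cls before RsCIN:", roc_auc_score(image_labels, classification_results))
print("AUROC-cls after RsCIN: ", roc_auc_score(image_labels, optimized))
```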

Apply to the custom dataset

You can extract the image features of each image in your custom dataset and store them in the variable cls_tokens. The window sizes used in the Multi-window Mask Operation can be adjusted via k_list. A sketch of one way to extract cls_tokens is given after the example below.

```python
import numpy as np
from RsCIN import Mobile_RsCIN

classification_results = np.random.rand(83)  # the classification results of your method
cls_tokens = np.random.rand(83, 768)         # shape [N, C], the image features; N is the number of images
k_list = [2, 3]                              # the window sizes used in the Multi-window Mask Operation
optimized_classification_results = Mobile_RsCIN(classification_results, k_list=k_list, cls_tokens=cls_tokens)
```
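
As a reference, the image features for cls_tokens could be extracted with the ViT-large-14-336 image encoder of CLIP mentioned above, for example via the open_clip package. This is a hedged sketch, not the repository's own extraction code; image_paths is a hypothetical list of your test images.

```python
import numpy as np
import torch
import open_clip
from PIL import Image

# ViT-L-14-336 of CLIP, the encoder used to build mvtec_ad_cls.dat / visa_cls.dat
model, _, preprocess = open_clip.create_model_and_transforms('ViT-L-14-336', pretrained='openai')
model.eval()

image_paths = ['img_0.png', 'img_1.png']  # hypothetical paths to your test images
feats = []
with torch.no_grad():
    for path in image_paths:
        image = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
        feats.append(model.encode_image(image).squeeze(0).cpu().numpy())
cls_tokens = np.stack(feats)  # [N, C], pass to Mobile_RsCIN via cls_tokens=...
```
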
<span id='results_datasets'/>

🎖️Results of different datasets: <a href='#all_catelogue'>[Back to Catalogue]</a>

All results are obtained with the default settings described in our paper.

MVTec AD

| Category | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|
| bottle | 99.92 | 99.21 | 99.98 | 98.48 | 79.17 | 83.04 | 96.10 |
| cable | 98.99 | 97.30 | 99.42 | 95.76 | 60.97 | 57.70 | 89.62 |
| capsule | 96.45 | 94.88 | 99.30 | 98.96 | 49.80 | 48.45 | 95.49 |
| carpet | 99.88 | 99.44 | 99.96 | 99.45 | 73.33 | 76.05 | 97.58 |
| grid | 98.66 | 96.49 | 99.54 | 98.16 | 43.94 | 38.24 | 93.92 |
| hazelnut | 99.61 | 98.55 | 99.79 | 99.38 | 73.41 | 73.28 | 92.24 |
| leather | 100.0 | 100.0 | 100.0 | 99.72 | 62.84 | 64.47 | 98.74 |
| metal_nut | 96.92 | 97.38 | 99.25 | 86.12 | 46.22 | 47.54 | 89.34 |
| pill | 96.24 | 95.89 | 99.31 | 97.47 | 65.54 | 67.25 | 98.01 |
| screw | 82.17 | 88.89 | 90.88 | 98.77 | 41.87 | 36.12 | 94.40 |
| tile | 100.0 | 100.0 | 100.0 | 97.90 | 74.71 | 78.90 | 94.64 |
| toothbrush | 100.0 | 100.0 | 100.0 | 99.53 | 70.19 | 67.79 | 95.48 |
| transistor | 99.42 | 95.00 | 99.19 | 91.38 | 59.24 | 58.40 | 77.21 |
| wood | 98.51 | 98.33 | 99.52 | 97.24 | 68.64 | 74.75 | 94.50 |
| zipper | 99.84 | 99.17 | 99.96 | 98.40 | 62.48 | 61.89 | 94.46 |
| mean | 97.77 | 97.37 | 99.07 | 97.11 | 62.16 | 62.26 | 93.45 |

VisA

| Category | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|
| candle | 96.55 | 91.26 | 96.45 | 99.36 | 39.56 | 28.36 | 97.62 |
| capsules | 88.62 | 86.43 | 93.77 | 98.71 | 50.85 | 43.90 | 88.20 |
| cashew | 98.54 | 95.57 | 99.30 | 99.33 | 74.88 | 77.63 | 94.30 |
| chewinggum | 98.42 | 96.45 | 99.30 | 99.54 | 61.33 | 61.21 | 88.39 |
| fryum | 98.64 | 97.44 | 99.43 | 99.43 | 58.13 | 50.43 | 94.38 |
| macaroni1 | 89.33 | 82.76 | 88.64 | 99.51 | 21.90 | 15.25 | 96.37 |
| macaroni2 | 68.03 | 69.96 | 67.37 | 97.14 | 11.06 | 3.91 | 88.84 |
| pcb1 | 89.28 | 84.36 | 89.89 | 99.50 | 80.49 | 88.36 | 92.76 |
| pcb2 | 93.20 | 88.66 | 94.46 | 97.39 | 34.38 | 21.86 | 86.06 |
| pcb3 | 93.52 | 86.92 | 93.48 | 98.05 | 40.23 | 41.03 | 92.32 |
| pcb4 | 98.43 | 92.89 | 98.47 | 98.70 | 46.38 | 44.72 | 92.66 |
| pipe_fryum | 98.34 | 96.04 | 99.16 | 99.40 | 67.56 | 67.90 | 97.32 |
| mean | 92.57 | 89.06 | 93.31 | 98.71 | 48.90 | 45.38 | 92.43 |

BTAD

| Category | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|
| 01 | 98.74 | 97.96 | 99.53 | 97.49 | 59.73 | 58.76 | 85.05 |
| 02 | 90.23 | 95.38 | 98.41 | 95.36 | 58.20 | 55.16 | 68.64 |
| 03 | 99.52 | 88.37 | 95.62 | 99.20 | 55.64 | 57.53 | 96.62 |
| mean | 96.16 | 93.90 | 97.85 | 97.35 | 57.86 | 57.15 | 83.43 |
<span id='results_backbones'/>

🎖️Results of different backbones: <a href='#all_catelogue'>[Back to Catalogue]</a>

The default backbone (feature extractor) in our paper is ViT-large-14-336 of CLIP. We also provide support for other CLIP image encoders and for ViTs pre-trained with DINO and DINO_v2. For more details, see configs/musc.yaml.

MVTec AD

| Backbones | Pre-training | image size | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 87.99 | 92.31 | 94.38 | 93.08 | 42.06 | 37.21 | 72.62 |
| ViT-B-32 | CLIP | 512 | 89.91 | 92.72 | 95.12 | 95.73 | 53.32 | 52.33 | 83.72 |
| ViT-B-16 | CLIP | 256 | 92.78 | 93.98 | 96.59 | 96.21 | 52.48 | 50.23 | 87.00 |
| ViT-B-16 | CLIP | 512 | 94.20 | 95.20 | 97.34 | 97.09 | 61.24 | 61.45 | 91.67 |
| ViT-B-16-plus-240 | CLIP | 240 | 94.77 | 95.43 | 97.60 | 96.26 | 52.23 | 50.27 | 87.70 |
| ViT-B-16-plus-240 | CLIP | 512 | 95.69 | 96.50 | 98.11 | 97.28 | 60.71 | 61.29 | 92.14 |
| ViT-L-14 | CLIP | 336 | 96.06 | 96.65 | 98.25 | 97.24 | 59.41 | 58.10 | 91.69 |
| ViT-L-14 | CLIP | 518 | 95.94 | 96.32 | 98.30 | 97.42 | 63.06 | 63.67 | 92.92 |
| ViT-L-14-336 | CLIP | 336 | 96.40 | 96.44 | 98.30 | 97.03 | 57.51 | 55.44 | 92.18 |
| ViT-L-14-336 | CLIP | 518 | 97.77 | 97.37 | 99.07 | 97.11 | 62.16 | 62.26 | 93.45 |
| dino_vitbase16 | DINO | 256 | 89.39 | 93.77 | 95.37 | 95.83 | 54.02 | 52.84 | 84.24 |
| dino_vitbase16 | DINO | 512 | 94.11 | 96.13 | 97.26 | 97.78 | 62.07 | 63.20 | 92.49 |
| dinov2_vitb14 | DINO_v2 | 336 | 95.67 | 96.80 | 97.95 | 97.74 | 60.23 | 59.45 | 93.84 |
| dinov2_vitb14 | DINO_v2 | 518 | 96.31 | 96.87 | 98.32 | 98.07 | 64.65 | 65.31 | 95.59 |
| dinov2_vitl14 | DINO_v2 | 336 | 96.84 | 97.45 | 98.68 | 98.17 | 61.77 | 61.21 | 94.62 |
| dinov2_vitl14 | DINO_v2 | 518 | 97.08 | 97.13 | 98.82 | 98.34 | 66.15 | 67.39 | 96.16 |

VisA

| Backbones | Pre-training | image size | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 72.95 | 76.90 | 77.68 | 89.30 | 25.93 | 20.68 | 50.95 |
| ViT-B-32 | CLIP | 512 | 77.82 | 80.20 | 81.01 | 96.06 | 34.72 | 30.20 | 73.08 |
| ViT-B-16 | CLIP | 256 | 81.44 | 80.86 | 83.84 | 95.97 | 36.72 | 31.81 | 73.48 |
| ViT-B-16 | CLIP | 512 | 86.48 | 84.12 | 88.05 | 97.98 | 42.21 | 37.29 | 85.10 |
| ViT-B-16-plus-240 | CLIP | 240 | 82.62 | 81.61 | 85.05 | 96.11 | 37.84 | 33.43 | 72.37 |
| ViT-B-16-plus-240 | CLIP | 512 | 86.72 | 84.22 | 89.41 | 97.95 | 43.27 | 37.68 | 83.52 |
| ViT-L-14 | CLIP | 336 | 88.38 | 85.23 | 89.77 | 98.32 | 44.67 | 40.42 | 87.80 |
| ViT-L-14 | CLIP | 518 | 90.86 | 87.75 | 91.66 | 98.45 | 45.74 | 42.09 | 89.93 |
| ViT-L-14-336 | CLIP | 336 | 88.61 | 85.31 | 90.00 | 98.53 | 45.10 | 40.92 | 89.35 |
| ViT-L-14-336 | CLIP | 518 | 92.57 | 89.06 | 93.31 | 98.71 | 48.90 | 45.38 | 92.43 |
| dino_vitbase16 | DINO | 256 | 78.21 | 80.12 | 81.11 | 95.74 | 36.81 | 32.84 | 70.21 |
| dino_vitbase16 | DINO | 512 | 84.11 | 83.52 | 85.91 | 97.74 | 42.86 | 38.27 | 83.00 |
| dinov2_vitb14 | DINO_v2 | 336 | 87.65 | 86.24 | 88.51 | 97.80 | 41.68 | 37.06 | 85.01 |
| dinov2_vitb14 | DINO_v2 | 518 | 90.25 | 87.48 | 90.86 | 98.66 | 45.56 | 41.23 | 91.80 |
| dinov2_vitl14 | DINO_v2 | 336 | 90.18 | 88.47 | 90.56 | 98.38 | 43.84 | 38.74 | 88.38 |
| dinov2_vitl14 | DINO_v2 | 518 | 91.73 | 89.20 | 92.27 | 98.78 | 47.12 | 42.79 | 92.40 |

BTAD

| Backbones | Pre-training | image size | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 92.19 | 95.55 | 98.47 | 96.74 | 43.98 | 35.70 | 68.56 |
| ViT-B-32 | CLIP | 512 | 93.31 | 94.61 | 98.40 | 97.41 | 52.94 | 48.80 | 69.59 |
| ViT-B-16 | CLIP | 256 | 92.44 | 91.00 | 97.31 | 97.45 | 55.27 | 52.19 | 72.68 |
| ViT-B-16 | CLIP | 512 | 94.11 | 92.99 | 97.98 | 97.91 | 59.18 | 59.05 | 77.86 |
| ViT-B-16-plus-240 | CLIP | 240 | 92.86 | 93.99 | 97.96 | 97.68 | 54.81 | 51.33 | 73.47 |
| ViT-B-16-plus-240 | CLIP | 512 | 94.13 | 93.84 | 98.34 | 98.14 | 58.66 | 57.53 | 77.23 |
| ViT-L-14 | CLIP | 336 | 92.74 | 93.21 | 97.71 | 97.84 | 56.60 | 55.94 | 77.01 |
| ViT-L-14 | CLIP | 518 | 94.82 | 95.29 | 98.58 | 97.77 | 55.55 | 55.46 | 80.62 |
| ViT-L-14-336 | CLIP | 336 | 95.11 | 94.48 | 98.53 | 97.42 | 56.75 | 55.23 | 79.63 |
| ViT-L-14-336 | CLIP | 518 | 96.16 | 93.90 | 97.85 | 97.35 | 57.86 | 57.15 | 83.43 |
| dino_vitbase16 | DINO | 256 | 93.63 | 95.66 | 98.66 | 97.55 | 52.16 | 49.25 | 72.86 |
| dino_vitbase16 | DINO | 512 | 92.38 | 92.66 | 97.81 | 97.44 | 53.32 | 53.02 | 74.91 |
| dinov2_vitb14 | DINO_v2 | 336 | 93.60 | 91.65 | 97.19 | 98.08 | 63.28 | 65.32 | 74.35 |
| dinov2_vitb14 | DINO_v2 | 518 | 94.99 | 95.11 | 98.55 | 98.30 | 65.75 | 68.89 | 80.41 |
| dinov2_vitl14 | DINO_v2 | 336 | 94.15 | 92.64 | 97.61 | 98.19 | 63.86 | 66.03 | 76.33 |
| dinov2_vitl14 | DINO_v2 | 518 | 95.62 | 95.40 | 98.76 | 98.40 | 65.88 | 69.90 | 82.47 |
<span id='inference_time'/>

⌛Inference Time: <a href='#all_catelogue'>[Back to Catalogue]</a>

We show the inference time per image in the table below for different backbones and image sizes. The default number of images in the mutual scoring module is 200, and the GPU is an NVIDIA RTX 3090.
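
For reference, per-image latency on a GPU is usually measured along the following lines. This is a generic sketch, not necessarily the exact procedure used for the table below; run_musc_on_batch is a hypothetical callable that runs MuSc on a list of test images.

```python
import time
import torch

def ms_per_image(run_musc_on_batch, images, warmup=1):
    """Average wall-clock time in ms per image for a callable processing a list of images."""
    for _ in range(warmup):
        run_musc_on_batch(images)        # warm up CUDA kernels and caches
    torch.cuda.synchronize()
    start = time.perf_counter()
    run_musc_on_batch(images)
    torch.cuda.synchronize()             # wait for all GPU work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / len(images)
```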

| Backbones | Pre-training | image size | time (ms/image) |
|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 48.33 |
| ViT-B-32 | CLIP | 512 | 95.74 |
| ViT-B-16 | CLIP | 256 | 86.68 |
| ViT-B-16 | CLIP | 512 | 450.5 |
| ViT-B-16-plus-240 | CLIP | 240 | 85.25 |
| ViT-B-16-plus-240 | CLIP | 512 | 506.4 |
| ViT-L-14 | CLIP | 336 | 266.0 |
| ViT-L-14 | CLIP | 518 | 933.3 |
| ViT-L-14-336 | CLIP | 336 | 270.2 |
| ViT-L-14-336 | CLIP | 518 | 955.3 |
| dino_vitbase16 | DINO | 256 | 85.97 |
| dino_vitbase16 | DINO | 512 | 458.5 |
| dinov2_vitb14 | DINO_v2 | 336 | 209.1 |
| dinov2_vitb14 | DINO_v2 | 518 | 755.0 |
| dinov2_vitl14 | DINO_v2 | 336 | 281.4 |
| dinov2_vitl14 | DINO_v2 | 518 | 1015.1 |
<span id='FAQ'/>

🙋🙋‍♂️Frequently Asked Questions: <a href='#all_catelogue'>[Back to Catalogue]</a>

Q: Why do large areas of high anomaly scores appear on normal images in the visualization?

A: In the visualization, each anomaly map is normalized individually by default in order to highlight abnormal areas. Even if the overall response of a single map is low, large highlighted areas can appear after normalization. To normalize all anomaly maps together, add the vis_type parameter to the shell script and set it to whole_norm, or modify the testing->vis_type parameter in ./configs/musc.yaml.
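
The difference between the two normalization modes can be illustrated with a minimal NumPy sketch (per-map vs. whole normalization); this is an illustration, not the repository's visualization code.

```python
import numpy as np

def normalize_single(anomaly_map):
    """Per-map min-max normalization: even a low-response normal map is stretched to [0, 1]."""
    return (anomaly_map - anomaly_map.min()) / (anomaly_map.max() - anomaly_map.min() + 1e-8)

def normalize_whole(anomaly_maps):
    """Normalize all maps with a shared min/max, so low-response normal maps stay dark."""
    lo, hi = anomaly_maps.min(), anomaly_maps.max()
    return (anomaly_maps - lo) / (hi - lo + 1e-8)
```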

Q: How to set an appropriate input image resolution?

A: The input image resolution img_resize is generally set to a multiple of the ViT patch size. Commonly used values are 224, 240, 256, 336, 512 and 518. In the section on <a href='#results_backbones'>results of different backbones</a>, we list two input resolutions commonly used with each feature extractor for reference. The resolution can be changed via the img_resize parameter in the shell script, or via the datasets->img_resize parameter in the ./configs/musc.yaml configuration file.
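
A small helper illustrating this rule of thumb (an assumption-level sketch, not repository code):

```python
def check_img_resize(img_resize: int, patch_size: int = 14) -> None:
    """Warn if the input resolution is not a multiple of the ViT patch size (14 for ViT-L-14, 16 for ViT-B-16)."""
    if img_resize % patch_size != 0:
        rounded_down = img_resize - img_resize % patch_size
        print(f"img_resize={img_resize} is not a multiple of patch size {patch_size}; "
              f"consider {rounded_down} or {rounded_down + patch_size}.")

check_img_resize(518, patch_size=14)  # OK: 518 = 37 * 14
check_img_resize(512, patch_size=16)  # OK: 512 = 32 * 16
```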

<span id='citation'/>

Citation: <a href='#all_catelogue'>[Back to Catalogue]</a>

```bibtex
@inproceedings{Li2024MuSc,
  title={MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images},
  author={Li, Xurui and Huang, Ziming and Xue, Feng and Zhou, Yu},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```
<span id='thanks'/>

Thanks: <a href='#all_catelogue'>[Back to Catalogue]</a>

Our repo is built on PatchCore and APRIL-GAN; thanks for their clear and elegant code!

<span id='license'/>

License: <a href='#all_catelogue'>[Back to Catalogue]</a>

MuSc is released under the MIT License. It is fully open for academic research and also allows free commercial usage. To apply for a commercial license, please contact yuzhou@hust.edu.cn.