✨MuSc (ICLR 2024)✨

This is the official PyTorch implementation of "MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images".

Authors: Xurui Li<sup>1*</sup> | Ziming Huang<sup>1*</sup> | Feng Xue<sup>3</sup> | Yu Zhou<sup>1,2</sup>

Institutions: <sup>1</sup>Huazhong University of Science and Technology | <sup>2</sup>Wuhan JingCe Electronic Group Co.,LTD | <sup>3</sup>University of Trento

🧐 arXiv | OpenReview

📖 Chinese README

<a href='#all_catelogue'>Go to Catalogue</a>

🙈TODO list:

📣Updates:

04/11/2024

  1. Comparisons with the zero-/few-shot methods from CVPR 2024 have been added to <a href='#compare_sota'>Compare with SOTA k-shot Methods</a>.
  2. Fixed some bugs in models/backbone/_backbones.py.

03/22/2024

  1. Support code for the BTAD dataset is provided.
  2. Some code is modified to support a larger batch_size.
  3. Some code is optimized for faster inference.
  4. <a href='#results_backbones'>Results of different backbones</a> on the MVTec AD, VisA and BTAD datasets are provided.
  5. <a href='#results_datasets'>The detailed results on different datasets</a> are provided.
  6. <a href='#inference_time'>The inference time of different backbones</a> is provided.
  7. <a href='#compare_sota'>The comparisons with SOTA zero/few-shot methods</a> are provided. This table will be updated continuously.
  8. We summarize the <a href='#FAQ'>frequently asked questions</a> from users of MuSc and give answers.
  9. A README in Chinese is added.

02/01/2024

Initial commits:

  1. The complete code of our method MuSc from the paper is released.
  2. The code supports the ViT image encoders of CLIP as well as ViTs pre-trained with DINO/DINO_v2.
<span id='compare_sota'/>

🎖️Compare with SOTA k-shot methods <a href='#all_catelogue'>[Go to Catalogue]</a>

We will continuously update the following tables to compare our MuSc with the newest zero-shot and few-shot methods. "-" indicates that the metric was not reported in the corresponding paper.

MVTec AD

| Methods | Venue | Setting | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| MuSc (ours) | ICLR 2024 | 0-shot | 97.8 | 97.5 | 99.1 | 97.3 | 62.6 | 62.7 | 93.8 |
| RegAD | ECCV 2022 | 4-shot | 89.1 | 92.4 | 94.9 | 96.2 | 51.7 | 48.3 | 88.0 |
| GraphCore | ICLR 2023 | 4-shot | 92.9 | - | - | 97.4 | - | - | - |
| WinCLIP | CVPR 2023 | 0-shot | 91.8 | 92.9 | 96.5 | 85.1 | 31.7 | - | 64.6 |
| WinCLIP | CVPR 2023 | 4-shot | 95.2 | 94.7 | 97.3 | 96.2 | 51.7 | - | 88.0 |
| APRIL-GAN | CVPR Workshop 2023 | 0-shot | 86.1 | 90.4 | 93.5 | 87.6 | 43.3 | 40.8 | 44.0 |
| APRIL-GAN | CVPR Workshop 2023 | 4-shot | 92.8 | 92.8 | 96.3 | 95.9 | 56.9 | 54.5 | 91.8 |
| FastRecon | ICCV 2023 | 4-shot | 94.2 | - | - | 97.0 | - | - | - |
| ACR | NeurIPS 2023 | 0-shot | 85.8 | 91.3 | 92.9 | 92.5 | 44.2 | 38.9 | 72.7 |
| RegAD+Adversarial Loss | BMVC 2023 | 8-shot | 91.9 | - | - | 96.9 | - | - | - |
| PACKD | BMVC 2023 | 8-shot | 95.3 | - | - | 97.3 | - | - | - |
| PromptAD | WACV 2024 | 0-shot | 90.8 | - | - | 92.1 | 36.2 | - | 72.8 |
| AnomalyCLIP | ICLR 2024 | 0-shot | 91.5 | - | 96.2 | 91.1 | - | - | 81.4 |
| InCTRL | CVPR 2024 | 8-shot | 95.3 | - | - | - | - | - | - |
| MVFA-AD | CVPR 2024 | 4-shot | 96.2 | - | - | 96.3 | - | - | - |
| PromptAD | CVPR 2024 | 4-shot | 96.6 | - | - | 96.5 | - | - | - |

VisA

| Methods | Venue | Setting | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| MuSc (ours) | ICLR 2024 | 0-shot | 92.8 | 89.5 | 93.5 | 98.8 | 48.8 | 45.1 | 92.7 |
| WinCLIP | CVPR 2023 | 0-shot | 78.1 | 79.0 | 81.2 | 79.6 | 14.8 | - | 56.8 |
| WinCLIP | CVPR 2023 | 4-shot | 87.3 | 84.2 | 88.8 | 97.2 | 47.0 | - | 87.6 |
| APRIL-GAN | CVPR Workshop 2023 | 0-shot | 78.0 | 78.7 | 81.4 | 94.2 | 32.3 | 25.7 | 86.8 |
| APRIL-GAN | CVPR Workshop 2023 | 4-shot | 92.6 | 88.4 | 94.5 | 96.2 | 40.0 | 32.2 | 90.2 |
| PACKD | BMVC 2023 | 8-shot | 87.5 | - | - | 97.9 | - | - | - |
| AnomalyCLIP | ICLR 2024 | 0-shot | 82.1 | - | 85.4 | 95.5 | - | - | 87.0 |
| InCTRL | CVPR 2024 | 8-shot | 88.7 | - | - | - | - | - | - |
| PromptAD | CVPR 2024 | 4-shot | 89.1 | - | - | 97.4 | - | - | - |
<span id='all_catelogue'/>

📖Catalogue

- <a href='#abstract'>Abstract</a>
- <a href='#setup'>Setup</a>
- <a href='#datasets'>Datasets Download</a>
- <a href='#run_musc'>Run MuSc</a>
- <a href='#rscin'>Classification optimization (RsCIN)</a>
- <a href='#results_datasets'>Results of different datasets</a>
- <a href='#results_backbones'>Results of different backbones</a>
- <a href='#inference_time'>Inference Time</a>
- <a href='#FAQ'>Frequently Asked Questions</a>
- <a href='#citation'>Citation</a>
- <a href='#thanks'>Thanks</a>
- <a href='#license'>License</a>

<span id='abstract'/>

👇Abstract: <a href='#all_catelogue'>[Back to Catalogue]</a>

This paper studies zero-shot anomaly classification (AC) and segmentation (AS) in industrial vision. We reveal that the abundant normal and abnormal cues implicit in unlabeled test images can be exploited for anomaly determination, which is ignored by prior methods. Our key observation is that for the industrial product images, the normal image patches could find a relatively large number of similar patches in other unlabeled images, while the abnormal ones only have a few similar patches.

We leverage such a discriminative characteristic to design a novel zero-shot AC/AS method by Mutual Scoring (MuSc) of the unlabeled images, which does not need any training or prompts. Specifically, we perform Local Neighborhood Aggregation with Multiple Degrees (LNAMD) to obtain the patch features that are capable of representing anomalies in varying sizes. Then we propose the Mutual Scoring Mechanism (MSM) to leverage the unlabeled test images to assign the anomaly score to each other. Furthermore, we present an optimization approach named Re-scoring with Constrained Image-level Neighborhood (RsCIN) for image-level anomaly classification to suppress the false positives caused by noises in normal images.

The superior performance on the challenging MVTec AD and VisA datasets demonstrates the effectiveness of our approach. Compared with the state-of-the-art zero-shot approaches, MuSc achieves a $\textbf{21.1}$% PRO absolute gain (from 72.7% to 93.8%) on MVTec AD, a $\textbf{19.4}$% pixel-AP gain and a $\textbf{14.7}$% pixel-AUROC gain on VisA. In addition, our zero-shot approach outperforms most of the few-shot approaches and is comparable to some one-class methods.
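
To make this intuition concrete, the toy sketch below scores each patch by its distance to patches from the other unlabeled images. It is a simplified illustration only, not the repository's implementation (which additionally uses LNAMD and RsCIN); the function name, tensor layout, and the choice of averaging the k smallest distances are our assumptions.

```python
import torch

def mutual_patch_scores(patch_feats: torch.Tensor, k: int = 30) -> torch.Tensor:
    """Toy illustration of the mutual-scoring idea.

    patch_feats: [N, P, C] patch features of N unlabeled test images
                 (P patches per image, C channels).
    Returns [N, P] anomaly scores: a patch that finds many similar patches
    in the other unlabeled images receives a low score.
    """
    N, P, C = patch_feats.shape
    scores = torch.empty(N, P)
    for i in range(N):
        # patches of all images except image i
        others = torch.cat([patch_feats[:i], patch_feats[i + 1:]]).reshape(-1, C)
        # pairwise L2 distances from the patches of image i to all other patches
        dist = torch.cdist(patch_feats[i], others)              # [P, (N-1)*P]
        # normal patches have many close neighbours, abnormal ones only a few,
        # so the average of the k smallest distances separates them
        scores[i] = dist.topk(k, dim=1, largest=False).values.mean(dim=1)
    return scores
```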

(Figure: overview of the MuSc pipeline)

😊Compare with other 0-shot methods

(Figure: comparison of MuSc with other zero-shot methods)

😊Compare with other 4-shot methods

(Figure: comparison of MuSc with other 4-shot methods)

<span id='setup'/>

🎯Setup: <a href='#all_catelogue'>[Back to Catalogue]</a>

Environment:

Clone the repository locally:

```bash
git clone https://github.com/xrli-U/MuSc.git
```

Create a virtual environment:

```bash
conda create --name musc python=3.8
conda activate musc
```

Install the required packages:

```bash
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
```
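
Optionally, you can verify that the environment is usable with a quick sanity check (not part of the repository):

```python
import torch
import torchvision

print("torch:", torch.__version__)              # expected 2.0.1
print("torchvision:", torchvision.__version__)  # expected 0.15.2
print("CUDA available:", torch.cuda.is_available())
```
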
<span id='datasets'/>

👇Datasets Download: <a href='#all_catelogue'>[Back to Catalogue]</a>

Put all the datasets in the ./data folder.

<span id='datatets_mvtec_ad'/>

MVTec AD

```
data
|---mvtec_anomaly_detection
|-----|-- bottle
|-----|-----|----- ground_truth
|-----|-----|----- test
|-----|-----|----- train
|-----|-- cable
|-----|--- ...
```
<span id='datatets_visa'/>

VisA

```
data
|----visa
|-----|-- split_csv
|-----|-----|--- 1cls.csv
|-----|-----|--- ...
|-----|-- candle
|-----|-----|--- Data
|-----|-----|-----|----- Images
|-----|-----|-----|--------|------ Anomaly 
|-----|-----|-----|--------|------ Normal 
|-----|-----|-----|----- Masks
|-----|-----|-----|--------|------ Anomaly 
|-----|-----|--- image_anno.csv
|-----|-- capsules
|-----|--- ...
```

The VisA dataset needs to be preprocessed to separate the training set from the test set:

```bash
python ./datasets/visa_preprocess.py
```
<span id='datatets_btad'/>

BTAD

```
data
|---btad
|-----|--- 01
|-----|-----|----- ground_truth
|-----|-----|----- test
|-----|-----|----- train
|-----|--- 02
|-----|--- ...
```
<span id='run_musc'/>

💎Run MuSc: <a href='#all_catelogue'>[Back to Catalogue]</a>

We provide two ways to run our code.

python

```bash
python examples/musc_main.py
```

Follow the configuration in ./configs/musc.yaml.

script

```bash
sh scripts/musc.sh
```

The configuration in the script musc.sh takes precedence.

The key arguments of the script are defined in scripts/musc.sh and mirror the options in ./configs/musc.yaml; the ones referenced most often in this README are img_resize (the input image resolution) and vis_type (the anomaly-map normalization used for visualization), both discussed in the <a href='#FAQ'>FAQ</a>.

<span id='rscin'/>

💎Classification optimization (RsCIN): <a href='#all_catelogue'>[Back to Catalogue]</a>

We provide additional code in the ./models/RsCIN_features folder to optimize the classification results of other methods with our RsCIN module. We use the ViT-large-14-336 image encoder of CLIP to extract image features for the MVTec AD and VisA datasets and store them in mvtec_ad_cls.dat and visa_cls.dat respectively. We show how to use them in ./models/RsCIN_features/RsCIN.py.

Example

Before using our RsCIN module, move RsCIN.py, mvtec_ad_cls.dat and visa_cls.dat to your project directory.

```python
import numpy as np
from RsCIN import Mobile_RsCIN

classification_results = np.random.rand(83)  # the classification results of your method
dataset_name = 'mvtec_ad'                    # dataset name
class_name = 'bottle'                        # category name in the above dataset
optimized_classification_results = Mobile_RsCIN(classification_results, dataset_name=dataset_name, class_name=class_name)
```

The optimized_classification_results are the anomaly classification scores optimized by our RsCIN module.
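
If image-level ground-truth labels are available, one way to quantify the effect of RsCIN is to compare the image-level AUROC before and after the call. This is a minimal sketch assuming scikit-learn is installed; image_labels and the random scores are hypothetical placeholders for your own data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from RsCIN import Mobile_RsCIN

image_labels = np.random.randint(0, 2, size=83)  # hypothetical ground truth (0 = normal, 1 = anomalous)
classification_results = np.random.rand(83)      # the classification results of your method

optimized = Mobile_RsCIN(classification_results, dataset_name='mvtec_ad', class_name='bottle')
print("AUROC-cls before RsCIN:", roc_auc_score(image_labels, classification_results))
print("AUROC-cls after RsCIN: ", roc_auc_score(image_labels, optimized))
```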

Apply to the custom dataset

You can extract the image features of each image in your custom dataset and store them in the variable cls_tokens. The window sizes used in the Multi-window Mask Operation can be adjusted via k_list. A sketch of one way to extract cls_tokens is given after the example below.

```python
import numpy as np
from RsCIN import Mobile_RsCIN

classification_results = np.random.rand(83)  # the classification results of your method
cls_tokens = np.random.rand(83, 768)         # shape [N, C], the image features; N is the number of images
k_list = [2, 3]                              # the window sizes used in the Multi-window Mask Operation
optimized_classification_results = Mobile_RsCIN(classification_results, k_list=k_list, cls_tokens=cls_tokens)
```
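
As a reference, the image features for cls_tokens could be extracted with the ViT-large-14-336 image encoder of CLIP mentioned above, for example via the open_clip package. This is a hedged sketch, not the repository's own extraction code; image_paths is a hypothetical list of your test images.

```python
import numpy as np
import torch
import open_clip
from PIL import Image

# ViT-L-14-336 of CLIP, the encoder used to build mvtec_ad_cls.dat / visa_cls.dat
model, _, preprocess = open_clip.create_model_and_transforms('ViT-L-14-336', pretrained='openai')
model.eval()

image_paths = ['img_0.png', 'img_1.png']  # hypothetical paths to your test images
feats = []
with torch.no_grad():
    for path in image_paths:
        image = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
        feats.append(model.encode_image(image).squeeze(0).cpu().numpy())
cls_tokens = np.stack(feats)  # [N, C], pass to Mobile_RsCIN via cls_tokens=...
```
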
<span id='results_datasets'/>

🎖️Results of different datasets: <a href='#all_catelogue'>[Back to Catalogue]</a>

All results are obtained with the default settings described in our paper.

MVTec AD

| Category | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|
| bottle | 99.92 | 99.21 | 99.98 | 98.48 | 79.17 | 83.04 | 96.10 |
| cable | 98.99 | 97.30 | 99.42 | 95.76 | 60.97 | 57.70 | 89.62 |
| capsule | 96.45 | 94.88 | 99.30 | 98.96 | 49.80 | 48.45 | 95.49 |
| carpet | 99.88 | 99.44 | 99.96 | 99.45 | 73.33 | 76.05 | 97.58 |
| grid | 98.66 | 96.49 | 99.54 | 98.16 | 43.94 | 38.24 | 93.92 |
| hazelnut | 99.61 | 98.55 | 99.79 | 99.38 | 73.41 | 73.28 | 92.24 |
| leather | 100.0 | 100.0 | 100.0 | 99.72 | 62.84 | 64.47 | 98.74 |
| metal_nut | 96.92 | 97.38 | 99.25 | 86.12 | 46.22 | 47.54 | 89.34 |
| pill | 96.24 | 95.89 | 99.31 | 97.47 | 65.54 | 67.25 | 98.01 |
| screw | 82.17 | 88.89 | 90.88 | 98.77 | 41.87 | 36.12 | 94.40 |
| tile | 100.0 | 100.0 | 100.0 | 97.90 | 74.71 | 78.90 | 94.64 |
| toothbrush | 100.0 | 100.0 | 100.0 | 99.53 | 70.19 | 67.79 | 95.48 |
| transistor | 99.42 | 95.00 | 99.19 | 91.38 | 59.24 | 58.40 | 77.21 |
| wood | 98.51 | 98.33 | 99.52 | 97.24 | 68.64 | 74.75 | 94.50 |
| zipper | 99.84 | 99.17 | 99.96 | 98.40 | 62.48 | 61.89 | 94.46 |
| mean | 97.77 | 97.37 | 99.07 | 97.11 | 62.16 | 62.26 | 93.45 |

VisA

| Category | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|
| candle | 96.55 | 91.26 | 96.45 | 99.36 | 39.56 | 28.36 | 97.62 |
| capsules | 88.62 | 86.43 | 93.77 | 98.71 | 50.85 | 43.90 | 88.20 |
| cashew | 98.54 | 95.57 | 99.30 | 99.33 | 74.88 | 77.63 | 94.30 |
| chewinggum | 98.42 | 96.45 | 99.30 | 99.54 | 61.33 | 61.21 | 88.39 |
| fryum | 98.64 | 97.44 | 99.43 | 99.43 | 58.13 | 50.43 | 94.38 |
| macaroni1 | 89.33 | 82.76 | 88.64 | 99.51 | 21.90 | 15.25 | 96.37 |
| macaroni2 | 68.03 | 69.96 | 67.37 | 97.14 | 11.06 | 3.91 | 88.84 |
| pcb1 | 89.28 | 84.36 | 89.89 | 99.50 | 80.49 | 88.36 | 92.76 |
| pcb2 | 93.20 | 88.66 | 94.46 | 97.39 | 34.38 | 21.86 | 86.06 |
| pcb3 | 93.52 | 86.92 | 93.48 | 98.05 | 40.23 | 41.03 | 92.32 |
| pcb4 | 98.43 | 92.89 | 98.47 | 98.70 | 46.38 | 44.72 | 92.66 |
| pipe_fryum | 98.34 | 96.04 | 99.16 | 99.40 | 67.56 | 67.90 | 97.32 |
| mean | 92.57 | 89.06 | 93.31 | 98.71 | 48.90 | 45.38 | 92.43 |

BTAD

| Category | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|
| 01 | 98.74 | 97.96 | 99.53 | 97.49 | 59.73 | 58.76 | 85.05 |
| 02 | 90.23 | 95.38 | 98.41 | 95.36 | 58.20 | 55.16 | 68.64 |
| 03 | 99.52 | 88.37 | 95.62 | 99.20 | 55.64 | 57.53 | 96.62 |
| mean | 96.16 | 93.90 | 97.85 | 97.35 | 57.86 | 57.15 | 83.43 |
<span id='results_backbones'/>

🎖️Results of different backbones: <a href='#all_catelogue'>[Back to Catalogue]</a>

The default backbone (feature extractor) in our paper is ViT-large-14-336 of CLIP. We also provide support for other CLIP image encoders and for ViTs pre-trained with DINO and DINO_v2. For more details, see configs/musc.yaml.

MVTec AD

| Backbones | Pre-training | image size | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 87.99 | 92.31 | 94.38 | 93.08 | 42.06 | 37.21 | 72.62 |
| ViT-B-32 | CLIP | 512 | 89.91 | 92.72 | 95.12 | 95.73 | 53.32 | 52.33 | 83.72 |
| ViT-B-16 | CLIP | 256 | 92.78 | 93.98 | 96.59 | 96.21 | 52.48 | 50.23 | 87.00 |
| ViT-B-16 | CLIP | 512 | 94.20 | 95.20 | 97.34 | 97.09 | 61.24 | 61.45 | 91.67 |
| ViT-B-16-plus-240 | CLIP | 240 | 94.77 | 95.43 | 97.60 | 96.26 | 52.23 | 50.27 | 87.70 |
| ViT-B-16-plus-240 | CLIP | 512 | 95.69 | 96.50 | 98.11 | 97.28 | 60.71 | 61.29 | 92.14 |
| ViT-L-14 | CLIP | 336 | 96.06 | 96.65 | 98.25 | 97.24 | 59.41 | 58.10 | 91.69 |
| ViT-L-14 | CLIP | 518 | 95.94 | 96.32 | 98.30 | 97.42 | 63.06 | 63.67 | 92.92 |
| ViT-L-14-336 | CLIP | 336 | 96.40 | 96.44 | 98.30 | 97.03 | 57.51 | 55.44 | 92.18 |
| ViT-L-14-336 | CLIP | 518 | 97.77 | 97.37 | 99.07 | 97.11 | 62.16 | 62.26 | 93.45 |
| dino_vitbase16 | DINO | 256 | 89.39 | 93.77 | 95.37 | 95.83 | 54.02 | 52.84 | 84.24 |
| dino_vitbase16 | DINO | 512 | 94.11 | 96.13 | 97.26 | 97.78 | 62.07 | 63.20 | 92.49 |
| dinov2_vitb14 | DINO_v2 | 336 | 95.67 | 96.80 | 97.95 | 97.74 | 60.23 | 59.45 | 93.84 |
| dinov2_vitb14 | DINO_v2 | 518 | 96.31 | 96.87 | 98.32 | 98.07 | 64.65 | 65.31 | 95.59 |
| dinov2_vitl14 | DINO_v2 | 336 | 96.84 | 97.45 | 98.68 | 98.17 | 61.77 | 61.21 | 94.62 |
| dinov2_vitl14 | DINO_v2 | 518 | 97.08 | 97.13 | 98.82 | 98.34 | 66.15 | 67.39 | 96.16 |

VisA

| Backbones | Pre-training | image size | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 72.95 | 76.90 | 77.68 | 89.30 | 25.93 | 20.68 | 50.95 |
| ViT-B-32 | CLIP | 512 | 77.82 | 80.20 | 81.01 | 96.06 | 34.72 | 30.20 | 73.08 |
| ViT-B-16 | CLIP | 256 | 81.44 | 80.86 | 83.84 | 95.97 | 36.72 | 31.81 | 73.48 |
| ViT-B-16 | CLIP | 512 | 86.48 | 84.12 | 88.05 | 97.98 | 42.21 | 37.29 | 85.10 |
| ViT-B-16-plus-240 | CLIP | 240 | 82.62 | 81.61 | 85.05 | 96.11 | 37.84 | 33.43 | 72.37 |
| ViT-B-16-plus-240 | CLIP | 512 | 86.72 | 84.22 | 89.41 | 97.95 | 43.27 | 37.68 | 83.52 |
| ViT-L-14 | CLIP | 336 | 88.38 | 85.23 | 89.77 | 98.32 | 44.67 | 40.42 | 87.80 |
| ViT-L-14 | CLIP | 518 | 90.86 | 87.75 | 91.66 | 98.45 | 45.74 | 42.09 | 89.93 |
| ViT-L-14-336 | CLIP | 336 | 88.61 | 85.31 | 90.00 | 98.53 | 45.10 | 40.92 | 89.35 |
| ViT-L-14-336 | CLIP | 518 | 92.57 | 89.06 | 93.31 | 98.71 | 48.90 | 45.38 | 92.43 |
| dino_vitbase16 | DINO | 256 | 78.21 | 80.12 | 81.11 | 95.74 | 36.81 | 32.84 | 70.21 |
| dino_vitbase16 | DINO | 512 | 84.11 | 83.52 | 85.91 | 97.74 | 42.86 | 38.27 | 83.00 |
| dinov2_vitb14 | DINO_v2 | 336 | 87.65 | 86.24 | 88.51 | 97.80 | 41.68 | 37.06 | 85.01 |
| dinov2_vitb14 | DINO_v2 | 518 | 90.25 | 87.48 | 90.86 | 98.66 | 45.56 | 41.23 | 91.80 |
| dinov2_vitl14 | DINO_v2 | 336 | 90.18 | 88.47 | 90.56 | 98.38 | 43.84 | 38.74 | 88.38 |
| dinov2_vitl14 | DINO_v2 | 518 | 91.73 | 89.20 | 92.27 | 98.78 | 47.12 | 42.79 | 92.40 |

BTAD

| Backbones | Pre-training | image size | AUROC-cls | F1-max-cls | AP-cls | AUROC-segm | F1-max-segm | AP-segm | PRO-segm |
|---|---|---|---|---|---|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 92.19 | 95.55 | 98.47 | 96.74 | 43.98 | 35.70 | 68.56 |
| ViT-B-32 | CLIP | 512 | 93.31 | 94.61 | 98.40 | 97.41 | 52.94 | 48.80 | 69.59 |
| ViT-B-16 | CLIP | 256 | 92.44 | 91.00 | 97.31 | 97.45 | 55.27 | 52.19 | 72.68 |
| ViT-B-16 | CLIP | 512 | 94.11 | 92.99 | 97.98 | 97.91 | 59.18 | 59.05 | 77.86 |
| ViT-B-16-plus-240 | CLIP | 240 | 92.86 | 93.99 | 97.96 | 97.68 | 54.81 | 51.33 | 73.47 |
| ViT-B-16-plus-240 | CLIP | 512 | 94.13 | 93.84 | 98.34 | 98.14 | 58.66 | 57.53 | 77.23 |
| ViT-L-14 | CLIP | 336 | 92.74 | 93.21 | 97.71 | 97.84 | 56.60 | 55.94 | 77.01 |
| ViT-L-14 | CLIP | 518 | 94.82 | 95.29 | 98.58 | 97.77 | 55.55 | 55.46 | 80.62 |
| ViT-L-14-336 | CLIP | 336 | 95.11 | 94.48 | 98.53 | 97.42 | 56.75 | 55.23 | 79.63 |
| ViT-L-14-336 | CLIP | 518 | 96.16 | 93.90 | 97.85 | 97.35 | 57.86 | 57.15 | 83.43 |
| dino_vitbase16 | DINO | 256 | 93.63 | 95.66 | 98.66 | 97.55 | 52.16 | 49.25 | 72.86 |
| dino_vitbase16 | DINO | 512 | 92.38 | 92.66 | 97.81 | 97.44 | 53.32 | 53.02 | 74.91 |
| dinov2_vitb14 | DINO_v2 | 336 | 93.60 | 91.65 | 97.19 | 98.08 | 63.28 | 65.32 | 74.35 |
| dinov2_vitb14 | DINO_v2 | 518 | 94.99 | 95.11 | 98.55 | 98.30 | 65.75 | 68.89 | 80.41 |
| dinov2_vitl14 | DINO_v2 | 336 | 94.15 | 92.64 | 97.61 | 98.19 | 63.86 | 66.03 | 76.33 |
| dinov2_vitl14 | DINO_v2 | 518 | 95.62 | 95.40 | 98.76 | 98.40 | 65.88 | 69.90 | 82.47 |
<span id='inference_time'/>

⌛Inference Time: <a href='#all_catelogue'>[Back to Catalogue]</a>

We show the inference time per image in the table below for different backbones and image sizes. The default number of images in the mutual scoring module is 200, and the GPU is an NVIDIA RTX 3090.
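
For reference, per-image latency on a GPU is usually measured along the following lines. This is a generic sketch, not necessarily the exact procedure used for the table below; run_musc_on_batch is a hypothetical callable that runs MuSc on a list of test images.

```python
import time
import torch

def ms_per_image(run_musc_on_batch, images, warmup=1):
    """Average wall-clock time in ms per image for a callable processing a list of images."""
    for _ in range(warmup):
        run_musc_on_batch(images)        # warm up CUDA kernels and caches
    torch.cuda.synchronize()
    start = time.perf_counter()
    run_musc_on_batch(images)
    torch.cuda.synchronize()             # wait for all GPU work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / len(images)
```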

| Backbones | Pre-training | image size | time (ms/image) |
|---|---|---|---|
| ViT-B-32 | CLIP | 256 | 48.33 |
| ViT-B-32 | CLIP | 512 | 95.74 |
| ViT-B-16 | CLIP | 256 | 86.68 |
| ViT-B-16 | CLIP | 512 | 450.5 |
| ViT-B-16-plus-240 | CLIP | 240 | 85.25 |
| ViT-B-16-plus-240 | CLIP | 512 | 506.4 |
| ViT-L-14 | CLIP | 336 | 266.0 |
| ViT-L-14 | CLIP | 518 | 933.3 |
| ViT-L-14-336 | CLIP | 336 | 270.2 |
| ViT-L-14-336 | CLIP | 518 | 955.3 |
| dino_vitbase16 | DINO | 256 | 85.97 |
| dino_vitbase16 | DINO | 512 | 458.5 |
| dinov2_vitb14 | DINO_v2 | 336 | 209.1 |
| dinov2_vitb14 | DINO_v2 | 518 | 755.0 |
| dinov2_vitl14 | DINO_v2 | 336 | 281.4 |
| dinov2_vitl14 | DINO_v2 | 518 | 1015.1 |
<span id='FAQ'/>

🙋🙋‍♂️Frequently Asked Questions: <a href='#all_catelogue'>[Back to Catalogue]</a>

Q: Why do large areas of high anomaly scores appear on normal images in the visualization?

A: In the visualization, each anomaly map is normalized individually by default in order to highlight abnormal areas. Even if the overall response of a single map is low, large highlighted areas can appear after normalization. To normalize all anomaly maps together, add the vis_type parameter to the shell script and set it to whole_norm, or modify the testing->vis_type parameter in ./configs/musc.yaml.
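
The difference between the two normalization modes can be illustrated with a minimal NumPy sketch (per-map vs. whole normalization); this is an illustration, not the repository's visualization code.

```python
import numpy as np

def normalize_single(anomaly_map):
    """Per-map min-max normalization: even a low-response normal map is stretched to [0, 1]."""
    return (anomaly_map - anomaly_map.min()) / (anomaly_map.max() - anomaly_map.min() + 1e-8)

def normalize_whole(anomaly_maps):
    """Normalize all maps with a shared min/max, so low-response normal maps stay dark."""
    lo, hi = anomaly_maps.min(), anomaly_maps.max()
    return (anomaly_maps - lo) / (hi - lo + 1e-8)
```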

Q: How to set an appropriate input image resolution?

A: The input image resolution img_resize is generally set to a multiple of the ViT patch size. Commonly used values are 224, 240, 256, 336, 512 and 518. In the section on <a href='#results_backbones'>results of different backbones</a>, we list two input resolutions commonly used with each feature extractor for reference. The resolution can be changed via the img_resize parameter in the shell script, or via the datasets->img_resize parameter in the ./configs/musc.yaml configuration file.
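
A small helper illustrating this rule of thumb (an assumption-level sketch, not repository code):

```python
def check_img_resize(img_resize: int, patch_size: int = 14) -> None:
    """Warn if the input resolution is not a multiple of the ViT patch size (14 for ViT-L-14, 16 for ViT-B-16)."""
    if img_resize % patch_size != 0:
        rounded_down = img_resize - img_resize % patch_size
        print(f"img_resize={img_resize} is not a multiple of patch size {patch_size}; "
              f"consider {rounded_down} or {rounded_down + patch_size}.")

check_img_resize(518, patch_size=14)  # OK: 518 = 37 * 14
check_img_resize(512, patch_size=16)  # OK: 512 = 32 * 16
```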

<span id='citation'/>

Citation: <a href='#all_catelogue'>[Back to Catalogue]</a>

```bibtex
@inproceedings{Li2024MuSc,
  title={MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images},
  author={Li, Xurui and Huang, Ziming and Xue, Feng and Zhou, Yu},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```
<span id='thanks'/>

Thanks: <a href='#all_catelogue'>[Back to Catalogue]</a>

Our repo is built on PatchCore and APRIL-GAN; thanks for their clear and elegant code!

<span id='license'/>

License: <a href='#all_catelogue'>[Back to Catalogue]</a>

MuSc is released under the MIT License. It is fully open for academic research and also allows free commercial usage. To apply for a commercial license, please contact yuzhou@hust.edu.cn.