
<div align="center">

Soldier-Officer Window self-Attention (SOWA)

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> <a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a> <a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a> <a href="https://github.com/ashleve/lightning-hydra-template"><img alt="Template" src="https://img.shields.io/badge/-Lightning--Hydra--Template-017F2F?style=flat&logo=github&labelColor=gray"></a><br> Paper Conference

</div>

Description

<div align="center"> <img src="https://github.com/huzongxiang/sowa/blob/resources/fig1.png" alt="concept" style="width: 50%;"> </div>

Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, limiting scalability. Recent advances in large-scale visual-language models have significantly improved zero-/few-shot anomaly detection. However, these approaches may not fully exploit hierarchical features, potentially missing nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts, to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets and leads in 18 out of 20 metrics compared with existing state-of-the-art techniques.

SOWA architecture (figure).
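The core idea can be illustrated with a short, self-contained sketch: multi-head self-attention restricted to non-overlapping windows of a patch-feature map taken from a frozen CLIP encoder. This is only a hedged illustration of the mechanism; the module name, feature dimension, window size, and the handling of learnable prompts are assumptions and do not reflect the repository's actual implementation.

```python
# Hedged sketch (not the repository implementation): window self-attention
# over a frozen CLIP patch-feature map. All names, shapes, and defaults are
# illustrative assumptions.
import torch
import torch.nn as nn


class WindowSelfAttention(nn.Module):
    """Multi-head self-attention restricted to non-overlapping windows."""

    def __init__(self, dim: int = 768, num_heads: int = 8, window_size: int = 4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) patch features from one hierarchy level
        B, H, W, C = x.shape
        s = self.window_size
        # split the H and W axes into (num_windows, window_size) blocks
        x = x.view(B, H // s, s, W // s, s, C)
        # gather each s x s window into its own "batch" entry: (B * nWin, s*s, C)
        windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, s * s, C)
        # attention is computed only among patches inside the same window
        out, _ = self.attn(windows, windows, windows)
        # merge the windows back to the original (B, H, W, C) layout
        out = out.view(B, H // s, W // s, s, s, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return out


if __name__ == "__main__":
    feats = torch.randn(2, 16, 16, 768)        # dummy 16x16 patch grid
    print(WindowSelfAttention()(feats).shape)  # torch.Size([2, 16, 16, 768])
```

The sketch covers only the window-attention step at a single hierarchy level; the multi-level processing and prompt learning described above are omitted.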

Installation

Pip

# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# [OPTIONAL] create conda environment
conda create -n sowa python=3.9
conda activate sowa

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Conda

# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# create conda environment and install dependencies
conda env create -f environment.yaml -n sowa

# activate conda environment
conda activate sowa

How to run

Data

Process the downloaded data with the data scripts, then specify the dataset locations in the configuration file sowa_mvt.yaml:

_target_: src.data.anomaly_clip_datamodule.AnomalyCLIPDataModule
data_dir:
  train: /home/hzx/Projects/Data/Visa
  valid: /home/hzx/Projects/Data/MVTec-AD
  test: /home/hzx/Projects/Data/MVTec-AD
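For a quick sanity check outside the training script, the datamodule can be instantiated directly from this config with Hydra. The sketch below is an assumption-laden illustration: the config path `configs/data/sowa_mvt.yaml` follows the lightning-hydra-template layout and may differ, and interpolated values in the full config may need to be resolved or composed via Hydra instead.

```python
# Hedged sketch: instantiate the datamodule straight from the YAML shown above.
# The config path and key layout are assumptions; adjust to the repository.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/data/sowa_mvt.yaml")  # hypothetical path
datamodule = instantiate(cfg)                       # builds AnomalyCLIPDataModule via _target_
datamodule.setup("fit")                             # standard LightningDataModule hook
train_loader = datamodule.train_dataloader()
```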

Train

Train model with default configuration

# train on mvtec
python src/train.py trainer=gpu data=sowa_mvt model=sowa_hfwa

# train on visa
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa

Inference

Weights can be downloaded from the Hugging Face project or Baidu Cloud.

# eval on visa
python src/eval.py trainer=gpu data=sowa_visa model=sowa_hfwa ckpt_path=your_mvtec_ckpt model.k_shot=true data.dataset.kshot.k_shot=4

# eval on mvtec
python src/eval.py trainer=gpu data=sowa_mvt model=sowa_hfwa ckpt_path=your_visa_ckpt model.k_shot=true data.dataset.kshot.k_shot=4

Results

Comparison with few-shot (K=4) anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.

| Metric   | Dataset       | WinCLIP  | April-GAN | Ours     |
|----------|---------------|----------|-----------|----------|
| AC AUROC | MVTec-AD      | 95.2±1.3 | 92.8±0.2  | 96.8±0.3 |
| AC AUROC | Visa          | 87.3±1.8 | 92.6±0.4  | 92.9±0.2 |
| AC AUROC | BTAD          | 87.0±0.2 | 92.1±0.2  | 94.8±0.2 |
| AC AUROC | DAGM          | 93.8±0.2 | 96.2±1.1  | 98.9±0.3 |
| AC AUROC | DTD-Synthetic | 98.1±0.2 | 98.5±0.1  | 99.1±0.0 |
| AC AP    | MVTec-AD      | 97.3±0.6 | 96.3±0.1  | 98.3±0.3 |
| AC AP    | Visa          | 88.8±1.8 | 94.5±0.3  | 94.5±0.2 |
| AC AP    | BTAD          | 86.8±0.0 | 95.2±0.5  | 95.5±0.7 |
| AC AP    | DAGM          | 83.8±1.1 | 86.7±4.5  | 95.2±1.7 |
| AC AP    | DTD-Synthetic | 99.1±0.1 | 99.4±0.0  | 99.6±0.0 |
| AS AUROC | MVTec-AD      | 96.2±0.3 | 95.9±0.0  | 95.7±0.1 |
| AS AUROC | Visa          | 97.2±0.2 | 96.2±0.0  | 97.1±0.0 |
| AS AUROC | BTAD          | 95.8±0.0 | 94.4±0.1  | 97.1±0.0 |
| AS AUROC | DAGM          | 93.8±0.1 | 88.9±0.4  | 96.9±0.0 |
| AS AUROC | DTD-Synthetic | 96.8±0.2 | 96.7±0.0  | 98.7±0.0 |
| AS AUPRO | MVTec-AD      | 89.0±0.8 | 91.8±0.1  | 92.4±0.2 |
| AS AUPRO | Visa          | 87.6±0.9 | 90.2±0.1  | 91.4±0.0 |
| AS AUPRO | BTAD          | 66.6±0.2 | 78.2±0.1  | 81.2±0.2 |
| AS AUPRO | DAGM          | 82.4±0.3 | 77.8±0.9  | 94.4±0.1 |
| AS AUPRO | DTD-Synthetic | 90.1±0.5 | 92.2±0.0  | 96.6±0.1 |

<!-- zero-width space -->

Performance Comparison on MVTec-AD and Visa Datasets.

| Method    | Source                  | MVTec-AD AC AUROC | MVTec-AD AS AUROC | MVTec-AD AS PRO | Visa AC AUROC | Visa AS AUROC | Visa AS PRO |
|-----------|-------------------------|-------------------|-------------------|-----------------|---------------|---------------|-------------|
| SPADE     | arXiv 2020              | 84.8±2.5          | 92.7±0.3          | 87.0±0.5        | 81.7±3.4      | 96.6±0.3      | 87.3±0.8    |
| PaDiM     | ICPR 2021               | 80.4±2.4          | 92.6±0.7          | 81.3±1.9        | 72.8±2.9      | 93.2±0.5      | 72.6±1.9    |
| PatchCore | CVPR 2022               | 88.8±2.6          | 94.3±0.5          | 84.3±1.6        | 85.3±2.1      | 96.8±0.3      | 84.9±1.4    |
| WinCLIP   | CVPR 2023               | 95.2±1.3          | 96.2±0.3          | 89.0±0.8        | 87.3±1.8      | 97.2±0.2      | 87.6±0.9    |
| April-GAN | CVPR 2023 VAND workshop | 92.8±0.2          | 95.9±0.0          | 91.8±0.1        | 92.6±0.4      | 96.2±0.0      | 90.2±0.1    |
| PromptAD  | CVPR 2024               | 96.6±0.9          | 96.5±0.2          | -               | 89.1±1.7      | 97.4±0.3      | -           |
| InCTRL    | CVPR 2024               | 94.5±1.8          | -                 | -               | 87.7±1.9      | -             | -           |
| SOWA      | Ours                    | 96.8±0.3          | 95.7±0.1          | 92.4±0.2        | 92.9±0.2      | 97.1±0.0      | 91.4±0.0    |

<!-- zero-width space -->

Comparison with few-shot anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.

<div align="center"> <img src="https://github.com/huzongxiang/sowa/blob/resources/fig5.png" alt="few-shot" style="width: 70%;"> </div>

Visualization

Visualization results under the few-shot setting (K=4).

<div align="center"> <img src="https://github.com/huzongxiang/sowa/blob/resources/fig6.png" alt="concept" style="width: 70%;"> </div>

Mechanism

Hierarchical results on the MVTec-AD dataset. The figure shows real model outputs, illustrating how the different layers (H1 to H4) respond to different feature modes. Each row is a different sample; the columns show the original image, the segmentation mask, the heatmap, the feature outputs from H1 to H4, and the fused result.
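The fusion column combines the per-level outputs into a single anomaly map. The repository's exact fusion scheme is not spelled out here, so the sketch below only illustrates one common approach as an assumption: upsample each level's anomaly map to a shared resolution and take a weighted average.

```python
# Hedged sketch: fuse per-level anomaly maps (H1..H4) into one map.
# The actual fusion in the repository may differ; weights and sizes are assumptions.
import torch
import torch.nn.functional as F


def fuse_anomaly_maps(level_maps, out_size=(224, 224), weights=None):
    """level_maps: list of (B, 1, h_i, w_i) anomaly maps, one per hierarchy level."""
    if weights is None:
        weights = [1.0 / len(level_maps)] * len(level_maps)  # uniform average
    fused = 0.0
    for w, m in zip(weights, level_maps):
        # bring every level to a common spatial resolution before mixing
        m = F.interpolate(m, size=out_size, mode="bilinear", align_corners=False)
        fused = fused + w * m
    return fused


if __name__ == "__main__":
    maps = [torch.rand(1, 1, 16, 16) for _ in range(4)]  # dummy H1..H4 outputs
    print(fuse_anomaly_maps(maps).shape)                 # torch.Size([1, 1, 224, 224])
```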

Inference Speed

Inference performance comparison of different methods on a single NVIDIA RTX 3070 8GB GPU.

<div align="center"> <img src="https://github.com/huzongxiang/sowa/blob/resources/fig9.png" alt="speed" style="width: 80%;"> </div>

Citation

Please cite the following paper if this work helps your project:

@article{hu2024sowa,
  title={SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection},
  author={Hu, Zongxiang and Zhang, Zhaosheng},
  journal={arXiv preprint arXiv:2407.03634},
  year={2024}
}

Contact

If you have any problems with this code, please feel free to contact me by email at huzongxiang1991@gmail.com or on WeChat (voodoozx2015).