# Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation

This repository is the official implementation of *Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation*.
## Visualization of Some Prediction Results
## Requirements

We implemented this repo with the following environment:

- Ubuntu 22.04
- CUDA 12.1
- Python 3.11
- PyTorch 2.2.0
To install the requirements:

```bash
# Please install PyTorch first, before any other packages

# Install KNN_CUDA
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl

# Install Pointnet2_PyTorch (pointnet2_ops)
git clone https://github.com/erikwijmans/Pointnet2_PyTorch.git
cd Pointnet2_PyTorch
pip install -r requirements.txt
# If you hit compilation issues for Pointnet2_PyTorch, see the note below.

# Now go back and install the remaining packages for CMDIAD :)
cd ..
pip install -r requirements.txt
```
> **Note:** Sometimes conda's version management causes installation failures. We recommend using venv or conda to create a virtual environment and then installing all packages with pip. If you encounter compilation issues with Pointnet2_PyTorch, modify `pointnet2_ops_lib/setup.py` following the attempts in my pull request.
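For reference, a minimal environment setup could look like the sketch below. The environment name `cmdiad-env` is just an example, and the PyTorch wheel index pins the CUDA 12.1 build listed above:

```bash
# Create and activate an isolated environment (venv shown; conda also works)
python3.11 -m venv cmdiad-env
source cmdiad-env/bin/activate

# Install PyTorch first (CUDA 12.1 build), then follow the steps above
pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121

# Quick sanity check of the installed versions
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```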
## Dataset and Pre-trained Models

### Dataset

The MVTec 3D-AD dataset can be downloaded from the MVTec3D-AD page. It should be unzipped and placed under the `datasets` folder.
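After unzipping, the layout expected by the scripts below should look roughly like this (the class folders are the standard MVTec 3D-AD categories; shown for orientation only):

```
datasets/
└── mvtec_3d/
    ├── bagel/
    ├── cable_gland/
    ├── carrot/
    ├── ...
    └── tire/
```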
### Data Pre-processing

```bash
python utils/preprocessing.py --dataset_path datasets/mvtec_3d/
```

> **Note:** We recommend keeping the default dataset path to prevent problems in subsequent training and evaluation, but you can change the number of threads according to your configuration. Please note that the pre-processing is performed in place.
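Since the pre-processing overwrites the data in place, you may want to keep a pristine copy of the raw dataset first; a minimal sketch (the backup path is just an example):

```bash
# Keep an untouched copy before running the in-place pre-processing
cp -r datasets/mvtec_3d datasets/mvtec_3d_raw_backup
python utils/preprocessing.py --dataset_path datasets/mvtec_3d/
```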
### Checkpoints

| Purpose | Checkpoint |
|---|---|
| Point Clouds (PCs) feature extractor | Point-MAE |
| RGB Images feature extractor | DINO |
| Feature-to-Feature network (main PCs) | MTFI_FtoF_PCs |
| Feature-to-Input network (main PCs) | MTFI_FtoI_PCs |
| Input-to-Feature network (main PCs) | MTFI_ItoF_PCs |
| Feature-to-Feature network (main RGB) | MTFI_FtoF_RGB |
| Feature-to-Input network (main RGB) | MTFI_FtoI_RGB |
| Input-to-Feature network (main RGB) | MTFI_ItoF_RGB |

> **Note:** Please put all checkpoints in the `checkpoints` folder.
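The folder should then contain roughly the following files (the `MTFI_*.pth` names follow the evaluation commands below; the exact file names of the Point-MAE and DINO extractor weights are assumptions):

```
checkpoints/
├── MTFI_FtoF_PCs.pth
├── MTFI_FtoI_PCs.pth
├── MTFI_ItoF_PCs.pth
├── MTFI_FtoF_RGB.pth
├── MTFI_FtoI_RGB.pth
├── MTFI_ItoF_RGB.pth
└── ...   # plus the Point-MAE and DINO feature extractor weights
```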
## Training

To train the models in the paper, run the following commands.

### MTFI pipeline with Feature-to-Feature distillation network

To save the features for distillation network training:

```bash
python main.py \
    --method_name DINO+Point_MAE \
    --experiment_note <your_note> \
    --save_feature_for_fusion \
    --save_path datasets/patch_lib
```
The results are saved in the `results` folder. If you need to write the raw anomaly scores at image or pixel level to a file, add `--save_raw_results` or `--save_seg_results`. You can use `utils/heatmap` to generate similar visualized results. You can set the maximum number of threads with `--cpu_core_num` and leave your note through `--experiment_note`.
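For example, a run that also writes raw scores and segmentation results to file could look like this (the thread count is illustrative):

```bash
python main.py \
    --method_name DINO+Point_MAE \
    --experiment_note <your_note> \
    --save_raw_results \
    --save_seg_results \
    --cpu_core_num 8
```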
To train the MTFI pipeline with the Feature-to-Feature distillation network:

```bash
python hallucination_network_pretrain.py \
    --lr 0.0005 \
    --batch_size 32 \
    --data_path datasets/patch_lib \
    --output_dir <your_output_dir_path> \
    --train_method HallucinationCrossModality \
    --num_workers 2
```
> **Note:** For the MTFI pipeline with the Feature-to-Feature distillation network, the models with PCs and RGB images as the main modality are trained simultaneously. If your GPU memory is not enough, try `--accum_iter 2` for gradient accumulation and change `--batch_size` to 16 correspondingly. The data is loaded into GPU memory in advance to speed up training; you can change this through the dataset and dataloader.
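The memory-saving variant of the command above would then be (same effective batch size of 32, split over two accumulation steps):

```bash
python hallucination_network_pretrain.py \
    --lr 0.0005 \
    --batch_size 16 \
    --accum_iter 2 \
    --data_path datasets/patch_lib \
    --output_dir <your_output_dir_path> \
    --train_method HallucinationCrossModality \
    --num_workers 2
```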
### MTFI pipeline with Feature-to-Input distillation network

To save the features for distillation network training:

```bash
python main.py \
    --method_name DINO+Point_MAE \
    --experiment_note <your_note> \
    --save_frgb_xyz \
    --save_path_frgb_xyz datasets/frgb_xyz \
    --save_rgb_fxyz \
    --save_path_rgb_fxyz datasets/rgb_fxyz
```
For PCs as the main modality:

```bash
python hallucination_network_pretrain.py \
    --lr 0.0005 \
    --batch_size 32 \
    --data_path datasets/rgb_fxyz \
    --output_dir <your_output_dir_path> \
    --train_method XYZFeatureToRGBInputConv
```
For RGB images as the main modality:

```bash
python hallucination_network_pretrain.py \
    --lr 0.0005 \
    --batch_size 32 \
    --data_path datasets/frgb_xyz \
    --output_dir <your_output_dir_path> \
    --train_method RGBFeatureToXYZInputConv
```
### MTFI pipeline with Input-to-Feature distillation network

Similarly, you need to store the features for distillation network training:

```bash
python main.py \
    --method_name DINO+Point_MAE \
    --experiment_note <your_note> \
    --save_frgb_xyz \
    --save_path_frgb_xyz datasets/frgb_xyz \
    --save_rgb_fxyz \
    --save_path_rgb_fxyz datasets/rgb_fxyz
```
For PCs as the main modality:

```bash
python -u hallucination_network_pretrain.py \
    --lr 0.0003 \
    --batch_size 32 \
    --data_path datasets/frgb_xyz \
    --output_dir <your_output_dir_path> \
    --train_method XYZInputToRGBFeatureHRNET \
    --c_hrnet 128 \
    --pin_mem
```
For RGB images as the main modality:

```bash
python -u hallucination_network_pretrain.py \
    --lr 0.0002 \
    --batch_size 32 \
    --data_path datasets/rgb_fxyz \
    --output_dir <your_output_dir_path> \
    --train_method XYZInputToRGBFeatureHRNET \
    --c_hrnet 192 \
    --pin_mem
```
## Evaluation

Evaluate the model on MVTec 3D-AD with the single and dual memory bank methods.

For a single PCs memory bank:

```bash
python main.py \
    --method_name Point_MAE \
    --experiment_note <your_note>
```
> **Note:** For a single RGB memory bank or the dual memory bank, replace `Point_MAE` with `DINO` or `DINO+Point_MAE`, respectively.
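Spelled out, the two variants would be (only the method name changes):

```bash
# Single RGB memory bank
python main.py --method_name DINO --experiment_note <your_note>

# Dual memory bank
python main.py --method_name DINO+Point_MAE --experiment_note <your_note>
```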
### MTFI pipeline with Feature-to-Feature distillation network

For PCs as the main modality:

```bash
python main.py \
    --method_name WithHallucination \
    --use_hn \
    --main_modality xyz \
    --fusion_module_path checkpoints/MTFI_FtoF_PCs.pth \
    --experiment_note <your_note>
```
> **Note:** For RGB images as the main modality, replace `xyz` with `rgb` for `--main_modality` and give the model the new checkpoint path `checkpoints/MTFI_FtoF_RGB.pth`.
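That is, the RGB-main evaluation becomes:

```bash
python main.py \
    --method_name WithHallucination \
    --use_hn \
    --main_modality rgb \
    --fusion_module_path checkpoints/MTFI_FtoF_RGB.pth \
    --experiment_note <your_note>
```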
### MTFI pipeline with Feature-to-Input distillation network

For PCs as the main modality:

```bash
python main.py \
    --method_name WithHallucinationFromFeature \
    --use_hn_from_rgb_conv \
    --main_modality xyz \
    --fusion_module_path checkpoints/MTFI_FtoI_PCs.pth \
    --experiment_note <your_note>
```
> **Note:** For RGB images as the main modality, replace `xyz` with `rgb` and give the model the new checkpoint path (`checkpoints/MTFI_FtoI_RGB.pth` from the table above).
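Applying that substitution, the RGB-main command would be (assuming the remaining flags stay as in the PCs command):

```bash
python main.py \
    --method_name WithHallucinationFromFeature \
    --use_hn_from_rgb_conv \
    --main_modality rgb \
    --fusion_module_path checkpoints/MTFI_FtoI_RGB.pth \
    --experiment_note <your_note>
```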
### MTFI pipeline with Input-to-Feature distillation network

For PCs as the main modality:

```bash
python main.py \
    --method_name WithHallucination \
    --use_hrnet \
    --main_modality xyz \
    --c_hrnet 128 \
    --fusion_module_path checkpoints/MTFI_ItoF_PCs.pth \
    --experiment_note <your_note>
```
For RGB images as the main modality:

```bash
python main.py \
    --method_name WithHallucination \
    --use_hrnet \
    --main_modality rgb \
    --c_hrnet 192 \
    --fusion_module_path checkpoints/MTFI_ItoF_RGB.pth \
    --experiment_note <your_note>
```
## Citation

If you find this repository helpful for your project, please cite:
```bibtex
@misc{sui2024crossmodal,
      title={Cross-Modal Distillation in Industrial Anomaly Detection: Exploring Efficient Multi-Modal IAD},
      author={Wenbo Sui and Daniel Lichau and Josselin Lefèvre and Harold Phelippeau},
      year={2024},
      eprint={2405.13571},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
## Acknowledgement

We appreciate the following GitHub repos for their valuable code: