Looking 3D: Anomaly Detection with 2D-3D Alignment

<table> <tr> <strong><a href="https://openaccess.thecvf.com/content/CVPR2024/papers/Bhunia_Looking_3D_Anomaly_Detection_with_2D-3D_Alignment_CVPR_2024_paper.pdf">Looking 3D: Anomaly Detection with 2D-3D Alignment</a></strong><br> Ankan Bhunia, Changjian Li, Hakan Bilen<br> CVPR 2024 </tr> </table>

<img src=figures/title.jpg>

<hr />

News

26/07/2024 - There were some issues with the dataset links. The data is now hosted on Hugging Face.

Dataset

<img src=figures/data_preview.gif>

<table> <tr> <td><b>filename and download link</b></td> <td><b>folder structure</b></td> <td><b>size (after extracting)</b></td> <td><b>comments</b></td> </tr> <tr> <td><a href="https://huggingface.co/datasets/ankankbhunia/brokenchairs180k/resolve/main/images.zip">images.zip</a></td> <td>BrokenChairs/images/</td> <td>21 GB</td> <td>[1] (see below)</td> </tr> <tr> <td><a href="https://huggingface.co/datasets/ankankbhunia/brokenchairs180k/resolve/main/annotations.zip">annotations.zip</a></td> <td>BrokenChairs/annotations/</td> <td>2 GB</td> <td>[2] (see below)</td> </tr> <tr> <td><a href="https://huggingface.co/datasets/ankankbhunia/brokenchairs180k/resolve/main/shapes.zip">shapes.zip</a></td> <td>BrokenChairs/shapes/</td> <td>14 GB</td> <td>[3] (see below)</td> </tr> <tr> <td><a href="https://huggingface.co/datasets/ankankbhunia/brokenchairs180k/resolve/main/split.json">split.json</a></td> <td>BrokenChairs/split.json</td> <td>134 KB</td> <td>[4] (see below)</td> </tr> </table>
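The four files in the table can also be fetched from a script. The following is a minimal sketch using only the Python standard library; the helper names `dataset_url` and `fetch` are hypothetical and not part of this repository, and the internal layout of the zip archives is assumed rather than verified, so the extraction directory may need adjusting.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base URL taken from the download links in the table above.
BASE_URL = "https://huggingface.co/datasets/ankankbhunia/brokenchairs180k/resolve/main"
FILES = ["images.zip", "annotations.zip", "shapes.zip", "split.json"]


def dataset_url(filename: str) -> str:
    """Build the direct download URL for one dataset file."""
    return f"{BASE_URL}/{filename}"


def fetch(filename: str, root: str = "BrokenChairs") -> Path:
    """Download one file into `root` and unpack it there if it is a zip archive.

    Assumption: each archive extracts to its own sub-folder (e.g. images/).
    """
    root_dir = Path(root)
    root_dir.mkdir(parents=True, exist_ok=True)
    target = root_dir / filename
    if not target.exists():
        urllib.request.urlretrieve(dataset_url(filename), target)
    if target.suffix == ".zip":
        with zipfile.ZipFile(target) as archive:
            archive.extractall(root_dir)
    return target
```

Calling `fetch(name)` for each entry in `FILES` would download roughly 37 GB after extraction, so make sure enough disk space is available first.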

Note:

[1] BrokenChairs/images/: The image filenames have a specific structure. For example, in the file named render_183_1944_2.5_300_30_3_normal.png, 183 is the shape_id, 1944 is the texture_id, and 2.5_300_30_3 encodes the camera parameters (in the format <radius>_<azim>_<elev>_<light-index>).
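The filename convention above can be unpacked with a few lines of Python. This is a sketch only: `RenderInfo` and `parse_render_name` are hypothetical names, and the trailing token (`normal` in the example) is kept as an uninterpreted suffix because its meaning is not spelled out here.

```python
from pathlib import Path
from typing import NamedTuple


class RenderInfo(NamedTuple):
    shape_id: str
    texture_id: str
    radius: float
    azim: float
    elev: float
    light_index: int
    suffix: str  # whatever follows the light index, e.g. "normal"


def parse_render_name(filename: str) -> RenderInfo:
    """Split render_<shape>_<texture>_<radius>_<azim>_<elev>_<light>[_<suffix>].png."""
    # Path.stem drops only the final extension, so "2.5" stays intact.
    parts = Path(filename).stem.split("_")
    if len(parts) < 7 or parts[0] != "render":
        raise ValueError(f"unexpected filename format: {filename}")
    _, shape_id, texture_id, radius, azim, elev, light = parts[:7]
    return RenderInfo(
        shape_id=shape_id,
        texture_id=texture_id,
        radius=float(radius),
        azim=float(azim),
        elev=float(elev),
        light_index=int(light),
        suffix="_".join(parts[7:]),
    )


info = parse_render_name("render_183_1944_2.5_300_30_3_normal.png")
print(info.shape_id, info.texture_id, info.radius, info.azim, info.elev, info.light_index)
```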

[2] BrokenChairs/annotations/: <info_*>: contains 2d_bbox, IoU, camera_parameters and texture_id. <mask_new_*>: binary mask of the object part with the anomaly. <mask_old_*>: binary mask of the object part without the anomaly (normal). <mask_new_*>: segmentation mask of the chair with the anomaly. <mask_old_*>: segmentation mask of the chair without the anomaly (normal).

Annotations are available for anomaly images only. For some anomaly types, such as a missing component, <mask_old_*> is not available.

[3] BrokenChairs/shapes/: <mv_images/*.png>: grayscale multi-view images, <mv_images/*.json>: JSON files containing the intrinsic and extrinsic parameters of each rendered image, <mv_images/*.npy>: npy files containing 2D-3D correspondence points. <model_id.txt>: the corresponding ShapeNet id.

Please refer to utils/render_multiview.py which can be used to obtain the above <png/json/npy> files from any given obj/stl/glb mesh shape.
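Before using a rendered mv_images/ folder, it can help to check that every view has all three files. The sketch below groups files by name and reports incomplete views. It assumes that the png, json and npy files of one view share the same filename stem, which is not confirmed here, and `group_views` and `incomplete_views` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, List

EXPECTED = {"png", "json", "npy"}


def group_views(mv_dir: str) -> Dict[str, Dict[str, Path]]:
    """Map each filename stem to the png/json/npy files found for it."""
    views: Dict[str, Dict[str, Path]] = {}
    for path in sorted(Path(mv_dir).iterdir()):
        ext = path.suffix.lstrip(".")
        if ext in EXPECTED:
            views.setdefault(path.stem, {})[ext] = path
    return views


def incomplete_views(views: Dict[str, Dict[str, Path]]) -> List[str]:
    """Return the stems that are missing at least one of the three files."""
    return [stem for stem, files in views.items() if set(files) != EXPECTED]
```

For example, `incomplete_views(group_views("sample/mv_images/"))` would return an empty list if every view is complete under the naming assumption above.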

[4]BrokenChairs/split.json: train/test/val split. Each set has mutually exclusive shape instances.
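The mutual-exclusivity property of the split can be checked programmatically. The sketch below assumes that split.json is a JSON object mapping each split name to a list of shape ids; the actual layout of the file is not documented here, so the loading step may need adapting. `overlapping_shapes` is a hypothetical helper, demonstrated on toy data rather than the real file.

```python
import json
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlapping_shapes(splits: Dict[str, List[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the shape ids shared by each pair of splits (empty dict if disjoint)."""
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for a, b in combinations(sorted(splits), 2):
        shared = set(splits[a]) & set(splits[b])
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Toy data in the assumed format; for the real file one would use:
#   with open("BrokenChairs/split.json") as f:
#       splits = json.load(f)
toy_splits = {"train": ["1", "2"], "val": ["3"], "test": ["4", "5"]}
assert not overlapping_shapes(toy_splits)
```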

<img src=figures/part_stats.png width=600px>

More on the proposed novel task & why it is relevant

Standard Anomaly Detection (AD) frameworks perform poorly without a clear definition of ‘normality’, especially when abnormalities are arbitrary and instance-specific. Our paper introduces a novel conditional AD task, along with a new benchmark and an effective solution, that aims to identify and localize anomalies in a photo of an object instance (i.e., the query image) in relation to a reference 3D model. The 3D model provides the reference shape for the regular object instance, and hence a clear definition of regularity for the query image. This setting is motivated by real-world applications in inspection and quality control, where an object instance is manufactured based on a reference 3D model, which can then be used to identify anomalies (e.g., production faults, damage) from a photo of the instance.

<table> <tr> <td style="text-align: center;"> <img src="figures/left.jpeg" alt="Image 1 Description" width="300" /> </td> <td style="text-align: center;"> <img src="figures/right.jpeg" alt="Image 2 Description" width="300" /> </td> </tr> </table>

Pretrained Models

Conda Installation

# Create a conda virtual environment for basic training/testing: 
conda create -n Looking3D python=3.8
conda activate Looking3D
pip install opencv-python wandb tqdm albumentations einops h5py kornia bounding_box matplotlib omegaconf "trimesh[all]" xformers

# install pyrender and pytorch3d (optional; only required for rendering multiview images)
pip install pyrender
pip install fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

Training and Evaluation on BrokenChairs-180K

Please see EXPERIMENT.md for commands to run the training and testing codes.

Inference

Use the predict() function in demo.py for inference.

from demo import predict

pred_labels = predict(
    query_path="sample/query_example.png",
    mv_path="sample/mv_images/",
    resume_ckpt="experiments/CMT-final/checkpoints/model.pt",
    device="cuda",
    topk=100,
)

Testing using custom data

# Step 1. Render grayscale multiview images (pytorch3d and pyrender required)
python utils/render_multiview.py \
  --obj_path data/mesh_reference.glb \
  --out_path data/itw_testing/sample/mv_images/ \
  --num_imgs 20 \
  --gray_scale
# Step 2. Test the model
python demo.py --mv_path data/itw_testing/sample/mv_images/ \
  --query_path data/query_example.png \
  --resume_ckpt experiments/CMT-final/checkpoints/model.pt

Citation

If you use the results and code for your research, please cite our paper:

@article{bhunia2024look3d,
  title={Looking 3D: Anomaly Detection with 2D-3D Alignment},
  author={Bhunia, Ankan Kumar and Li, Changjian and Bilen, Hakan},
  journal={CVPR},
  year={2024}
}