
A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture

NeurIPS (Datasets and Benchmarks) 2023

Authors: Qianqian Shen<sup>*</sup>, Yunhan Zhao<sup>*</sup>, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong

If you find our model/method/dataset useful, please cite our work (NeurIPS version on arXiv):

@article{shen2024high,
  title={A High-Resolution Dataset for Instance Detection with Multi-View Object Capture},
  author={Shen, Qianqian and Zhao, Yunhan and Kwon, Nahyun and Kim, Jeeeun and Li, Yanan and Kong, Shu},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

The InsDet dataset is a high-resolution real-world dataset for instance detection with multi-view instance capture.<br> We provide InsDet-mini for demo and visualization, and the full dataset InsDet-FULL.

Dataset

The full dataset contains 100 objects, each with multi-view profile images captured at 24 rotation positions (one every 15°), 160 high-resolution testing scene images, and 200 pure background images. The mini version contains 5 objects, 10 testing scene images, and 10 pure background images.
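As a quick sanity check on the capture geometry, 24 rotation positions at 15° increments cover a full 360° around each object:

```python
# Each object is photographed at 24 rotation positions, one every 15 degrees.
STEP_DEG = 15
NUM_POSITIONS = 24

# Viewing angles: 0, 15, 30, ..., 345 degrees.
angles = [i * STEP_DEG for i in range(NUM_POSITIONS)]

# The positions tile the full circle with no gap or overlap.
assert angles[-1] + STEP_DEG == 360
```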

Details

The Objects folder contains:

vis-objects

Tip: The first three digits specify the instance id.
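Following the tip above, the instance id can be parsed directly from a name's first three digits. A minimal sketch (the example name below is hypothetical, not an actual file in the dataset):

```python
def instance_id(name: str) -> int:
    """Parse the instance id from the first three digits of a name."""
    return int(name[:3])

# Hypothetical example name in the "NNN_description" style:
instance_id("001_example_object")  # -> 1
```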

The Scenes folder contains:

vis-scenes

Tip: Each bounding box is specified by [xmin, ymin, xmax, ymax].
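Since the boxes above are stored as corner coordinates, a small helper is handy when a COCO-style `[x, y, width, height]` box is needed instead (e.g. for detectron2's `BoxMode.XYWH_ABS`). A minimal sketch:

```python
def xyxy_to_xywh(box):
    """Convert a [xmin, ymin, xmax, ymax] box to COCO-style [x, y, w, h]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

xyxy_to_xywh([10, 20, 110, 220])  # -> [10, 20, 100, 200]
```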

The Background folder contains 200 pure background images that do not include any instances from the Objects folder.

vis-background

Code

The project is built on detectron2, segment-anything, and DINOv2.<br>

<!-- Detectron2 provides end-to-end detectors implementation and metric evaluation. Segment-anything is an off-the-shelf class-agnostic segmentation model that we used to produce instance proposals. DINOv2 is a self-supervised vision foundation model that we used to extract feature representation. --> <!-- ### Data preparation All profile images in InsDet-Objects are preprocessed by using `minify`, `resizemask`, `getbbox`, `centercrop`, and `invertmask` packed in `gendata/data_utils.py`. Examples for single or loop operation are included in `gendata`. -->

Demo

The Jupyter notebooks demonstrate our non-learned method using SAM and DINOv2. We choose lightweight pretrained models, SAM (vit_l) and DINOv2 (dinov2_vits14), for efficiency.
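The core matching step of such a non-learned pipeline can be sketched as follows: embed SAM's class-agnostic proposals with DINOv2, then assign each proposal to the instance whose profile feature is most similar under cosine similarity. This is an illustrative NumPy sketch, not the exact implementation in the notebooks; the function name, threshold, and feature aggregation are assumptions:

```python
import numpy as np

def match_proposals(proposal_feats, profile_feats, threshold=0.5):
    """Assign each proposal to its nearest instance by cosine similarity.

    proposal_feats: (P, D) features of SAM proposals (e.g. from DINOv2)
    profile_feats:  (N, D) one aggregated feature per object instance
    Returns (labels, scores); label -1 marks rejected (background) proposals.
    """
    # L2-normalize so dot products are cosine similarities.
    a = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    b = profile_feats / np.linalg.norm(profile_feats, axis=1, keepdims=True)
    sims = a @ b.T                               # (P, N) similarity matrix
    labels = sims.argmax(axis=1)                 # best instance per proposal
    scores = sims[np.arange(len(labels)), labels]
    labels[scores < threshold] = -1              # reject weak matches
    return labels, scores
```

In practice the proposal-to-instance scores would then be fed to standard post-processing (e.g. non-maximum suppression) before evaluation.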

<!-- | Pretrained Model | # of params | AP | AP50 | AP75 | | :--- | :---: | :---:| :---:| :---:| | ViT-S/14 distilled | 21M |41.61 |49.10 |45.95 | |ViT-B/14 distilled | 86M |41.89 |49.39 |46.30 | |ViT-L/14 distilled | 300M |43.33 |50.80 |47.84 | |ViT-g/14 | 1,100M |44.65 |53.47 |49.11 | -->