Home

Awesome

VLDet: Learning Object-Language Alignments for Open-Vocabulary Object Detection

<p align="center"> <img src='docs/readme.jpeg' align="center" height="200px"> </p>

Learning Object-Language Alignments for Open-Vocabulary Object Detection,
Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai,
ICLR 2023 (https://arxiv.org/abs/2211.14843)

Highlight

We are excited to announce that our paper was accepted to ICLR 2023! 🥳🥳🥳

A quick explainable video demo for VLDet

https://user-images.githubusercontent.com/6366788/218620999-1eb5c5eb-0479-4dcc-88ca-863f34de25a0.mp4

Performance

Open-Vocabulary on COCO

<p align="center"> <img src="https://user-images.githubusercontent.com/6366788/214261751-3007d40c-5a5d-4efd-8acd-7f6a4ea62ce3.png" width=68%> <p>

Open-Vocabulary on LVIS

<p align="center"> <img src="https://user-images.githubusercontent.com/6366788/214262298-ab2de22b-910a-44ba-9bc5-f0df6e4d5e14.png" width=68%> <p>

Installation

Requirements

Example conda environment setup

conda create --name VLDet python=3.7 -y
conda activate VLDet
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia

# under your working directory

git clone https://github.com/clin1223/VLDet.git
cd VLDet
cd detectron2
pip install -e .
cd ..
pip install -r requirements.txt

Features

Benchmark evaluation and training

Please first prepare datasets.

The VLDet models are finetuned on the corresponding Box-Supervised models (indicated by MODEL.WEIGHTS in the config files). Please train or download the Box-Supervised model and place them under VLDet_ROOT/models/ before training the VLDet models.

To train a model, run

python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml

To evaluate a model with a trained/ pretrained model, run

python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth

Download the trained network weights here.

OV_COCObox mAP50box mAP50_novel
config_RN5045.832.0
OV_LVISmask mAP_allmask mAP_novel
config_RN5030.121.7
config_Swin-B38.126.3

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@article{VLDet,
  title={Learning Object-Language Alignments for Open-Vocabulary Object Detection},
  author={Lin, Chuang and Sun, Peize and Jiang, Yi and Luo, Ping and Qu, Lizhen and Haffari, Gholamreza and Yuan, Zehuan and Cai, Jianfei},
  journal={arXiv preprint arXiv:2211.14843},
  year={2022}
}

License

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

Acknowledgement

This repository was built on top of Detectron2, Detic, RegionCLIP and OVR-CNN. We thank for their hard work.