LocOV: Localized Vision-Language Matching for Open-vocabulary Object Detection

News

2022-07 (v0.1): This repository is the official PyTorch implementation of our GCPR 2022 paper: <a href="https://arxiv.org/pdf/2205.06160.pdf">Localized Vision-Language Matching for Open-vocabulary Object Detection</a>.

<!-- published at ([slides](), [poster](), [poster session]() -->


Installation

Requirements

The code was originally tested with Python 3.8.13, PyTorch 1.10.0, and CUDA 11.2 on Ubuntu 20.04.

git clone https://github.com/lmb-freiburg/locov.git
cd locov
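
To see how a local environment compares with the tested versions listed under Requirements, a small check like the one below may help. It is a minimal sketch and not part of the repository: the helper names are hypothetical, only the major and minor version numbers are compared, and the CUDA version is not checked.

```python
import platform
from importlib import metadata

# Versions reported as tested in this README.
TESTED = {"python": "3.8.13", "torch": "1.10.0"}


def major_minor(version: str) -> tuple:
    """Return (major, minor) as ints, ignoring local suffixes such as '+cu113'."""
    core = version.split("+")[0]
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def compare_to_tested(installed: dict) -> dict:
    """Map each package to True if its major.minor matches the tested version."""
    return {
        name: major_minor(installed[name]) == major_minor(tested)
        for name, tested in TESTED.items()
        if name in installed
    }


def installed_versions() -> dict:
    """Collect the local Python version and, if installed, the torch version."""
    found = {"python": platform.python_version()}
    try:
        found["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        pass  # torch is not installed; it is left out of the report
    return found


if __name__ == "__main__":
    for name, ok in compare_to_tested(installed_versions()).items():
        status = "matches" if ok else "differs from"
        print(f"{name}: {status} tested version {TESTED[name]}")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the setup differs from the one the authors report.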

Prepare datasets

Download datasets

python tools/convert_annotations_to_ov_sets.py
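
The internals of `tools/convert_annotations_to_ov_sets.py` are not described here. As a rough illustration of what building open-vocabulary subsets can involve, the sketch below restricts a COCO-style annotation dictionary to a chosen set of category names. The function name, the toy dataset, and the choice to drop images without remaining annotations are assumptions made for this example, not the repository's actual behaviour.

```python
def filter_coco_by_categories(coco: dict, keep_names: set) -> dict:
    """Return a copy of a COCO-style dict restricted to the named categories."""
    categories = [c for c in coco["categories"] if c["name"] in keep_names]
    keep_ids = {c["id"] for c in categories}
    annotations = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    # Keep only images that still have at least one annotation (an assumption).
    image_ids = {a["image_id"] for a in annotations}
    images = [im for im in coco["images"] if im["id"] in image_ids]
    return {"images": images, "annotations": annotations, "categories": categories}


# Tiny hypothetical dataset in COCO layout.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "umbrella"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 11, "image_id": 2, "category_id": 2, "bbox": [1, 1, 3, 3]},
    ],
}

known = filter_coco_by_categories(toy, {"person"})    # stands in for a known-class set
novel = filter_coco_by_categories(toy, {"umbrella"})  # stands in for a novel-class set
```

Which categories count as known or novel is defined by the benchmark split, not by this sketch.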

Precompute the text features

python tools/coco_bert_embeddings.py

Precomputed generic object proposals

Train and validate Open Vocabulary Detection

Model Outline

<p align="center"><img src="assets/locov.png" alt="Method" title="LocOV" /></p>

Useful script commands

Train LSM stage

Run the script to train the Localized Semantic Matching stage

python train_ovnet.py --num-gpus 8 --resume --config-file configs/coco_lsm.yaml 

Train STT stage

Run the script to train the Specialized Task Tuning stage, starting from the final weights of the LSM stage

python train_ovnet.py --num-gpus 8 --resume --config-file configs/coco_stt.yaml MODEL.WEIGHTS path_to_final_weights_lsm_stage
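
Because the STT stage starts from the LSM checkpoint, the two training commands can be chained from a small driver script. The sketch below only assembles the commands shown in this README and, by default, prints them instead of running them. The helper names and the `dry_run` switch are hypothetical, and the checkpoint path is the placeholder from the command above.

```python
import subprocess


def train_command(config_file: str, num_gpus: int = 8, weights: str = None) -> list:
    """Build the train_ovnet.py invocation from this README as an argument list."""
    cmd = ["python", "train_ovnet.py", "--num-gpus", str(num_gpus),
           "--resume", "--config-file", config_file]
    if weights is not None:
        # Appended in the same position as in the STT command of this README.
        cmd += ["MODEL.WEIGHTS", weights]
    return cmd


def run_two_stage(lsm_weights: str, dry_run: bool = True) -> list:
    """Assemble the LSM and STT commands; execute them only if dry_run is False."""
    commands = [
        train_command("configs/coco_lsm.yaml"),
        train_command("configs/coco_stt.yaml", weights=lsm_weights),
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if a stage exits with an error
    return commands


# Placeholder path: replace with the final checkpoint written by the LSM stage.
for cmd in run_two_stage("path_to_final_weights_lsm_stage"):
    print(" ".join(cmd))
```

With `dry_run=False`, `check=True` stops the sequence if the LSM stage fails, so the STT stage is never started from a missing checkpoint.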

Evaluate

python train_ovnet.py --num-gpus 8 --resume --eval-only --config-file configs/coco_stt.yaml \
MODEL.WEIGHTS output/model-weights.pth \
OUTPUT_DIR output/eval_locov

Benchmark results

Model zoo

Pretrained models can be found in the models directory.

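| Model | AP-novel | AP50-novel | AP-known | AP50-known | AP-general | AP50-general | Weights |
|-------|----------|------------|----------|------------|------------|--------------|---------|
| LocOv | 17.219 | 30.109 | 33.499 | 53.383 | 28.129 | 45.719 | LocOv |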

Acknowledgements

This work was supported by Deutscher Akademischer Austauschdienst - German Academic Exchange Service (DAAD) Research Grants - Doctoral Programmes in Germany, 2019/20; grant number: 57440921.

The Deep Learning Cluster used in this work is partially funded by the German Research Foundation (DFG) - 417962828.

We especially thank the creators of the following GitHub repositories for providing helpful code:

License

<a rel="license" href="http://creativecommons.org/licenses/by/3.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/3.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0 Unported License</a>. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Citation

If you use our repository or find it useful in your research, please cite the following paper:

<pre class='bibtex'>
@InProceedings{Bravo2022locov,
  author    = "M. Bravo and S. Mittal and T. Brox",
  title     = "Localized Vision-Language Matching for Open-vocabulary Object Detection",
  booktitle = "German Conference on Pattern Recognition (GCPR) 2022",
  year      = "2022"
}
</pre>