Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Generalized OOD Detection v2

🚀 Our framework encapsulates the evolution of OOD detection and related tasks in the VLM era, fostering collaborative efforts among each community 🤝
<p align="center" width="100%"> <img src=figs/evo_vlm.png width="100%" height="100%"> <div> <div align="center"> <br> <a href='https://atsumiyai.github.io/' target='_blank'>Atsuyuki Miyai<sup>1</sup></a>&emsp; <a href='https://jingkang50.github.io/' target='_blank'>Jingkang Yang<sup>2,†</sup></a>&emsp; <a href='https://zjysteven.github.io/' target='_blank'>Jingyang Zhang<sup>3</sup></a>&emsp; <a href='https://pages.cs.wisc.edu/~alvinming/' target='_blank'>Yifei Ming<sup>4</sup></a>&emsp; <a href='https://yueqianlin.com/' target='_blank'>Yueqian Lin<sup>3</sup></a>&emsp; <br> <a href='https://yu1ut.com/' target='_blank'>Qing Yu<sup>1,5</sup></a>&emsp; <a href='https://scholar.google.co.jp/citations?hl=ja&user=2bCSG1AAAAAJ&view_op=list_works&authuser=1&sortby=pubdate' target='_blank'>Go Irie<sup>6</sup></a>&emsp; <a href='https://raihanjoty.github.io/' target='_blank'>Shafiq Joty<sup>4,2</sup></a>&emsp; <a href='https://pages.cs.wisc.edu/~sharonli/' target='_blank'>Yixuan Li<sup>7</sup></a>&emsp; <a href='https://ece.duke.edu/faculty/hai-helen-li' target='_blank'>Hai Li<sup>3</sup></a>&emsp; <a href='https://liuziwei7.github.io/' target='_blank'>Ziwei Liu<sup>2,†</sup></a> <br> <a href='https://scholar.google.com/citations?user=rE9iY5MAAAAJ&hl=en' target='_blank'>Toshihiko Yamasaki<sup>1</sup></a>&emsp; <a href='https://scholar.google.co.jp/citations?user=CJRhhi0AAAAJ&hl=en' target='_blank'>Kiyoharu Aizawa<sup>1</sup></a> </div> <div align="center"> <sup>1</sup>The University of Tokyo&emsp; <sup>†</sup>S-Lab, <sup>2</sup>Nanyang Technological University&emsp; <sup>3</sup>Duke University&emsp; <sup>4</sup>Salesforce AI Research&emsp; <sup>5</sup>LY Corporation&emsp; <sup>6</sup>Tokyo University of Science&emsp; <sup>7</sup>University of Wisconsin-Madison&emsp;&emsp; <br> </div>

About This Repository

This is the repository for our survey paper. We hope the survey helps readers and participants better understand the demanding challenges in OOD detection and related topics in the VLM era.
This repository plays the following two roles:

Abstract

We present a generalized OOD detection v2, encapsulating the evolution of Anomaly Detection (AD), Novelty Detection (ND), Open-set Recognition (OSR), Out-of-distribution (OOD) detection, and Outlier Detection (OD) in the VLM era. Our framework reveals that, with some field inactivity and integration, the demanding challenges in the VLM era have become OOD detection and AD. In addition to the inter-field evolution, we also highlight the significant shift in the definition, problem settings, and benchmarks; our work thus features a comprehensive review of the methodology for OOD detection, including in-depth discussion over other related tasks to clarify their relationship to and influence on OOD detection. Finally, we explore the advancements in the emerging Large Vision Language Model (LVLM) era, represented by GPT-4V. We conclude this survey with open challenges and potential research directions of OOD detection in the VLM and LVLM era.

Common Benchmarks

<details open> <summary><b>CLIP-based OOD Detection</b></summary>
</details> <details open> <summary><b>CLIP-based AD</b></summary>
</details>

Methodology

We introduce methods for CLIP-based OOD detection and CLIP-based AD.
To provide diverse perspectives on OOD detection approaches, we have encompassed a wide range of methods, including preprints.

Timeline

timeline.png

Paper List

methods.png

CLIP-based OOD Detection

<details open> <summary><b> Zero-shot</b></summary>
</details> <details open> <summary><b> Few-shot</b></summary>
</details> <details open> <summary><b> Others</b></summary>
</details>

CLIP-based AD

<details open> <summary><b> Zero-shot</b></summary>
</details> <details open> <summary><b> Few-shot</b></summary>
</details> <details open> <summary><b> Others</b></summary>
</details>

Early Advance in LVLM Era

evolution_lvlm.png

In the LVLM era, OOD detection and related topics have evolved as follows:

(i) Sensory Anomaly Detection ⇒ Sensory Anomaly Detection

<details open> <summary><b>AD</b></summary>
</details>

(ii) OOD Detection ⇒ Unsolvable Problem Detection

<details open> <summary><b>UPD</b></summary>
</details>

Acknowledgment

This repository builds on the following resources: generalized OOD detection v1 and the OpenOOD codebase.

Contact

If you have questions or find any mistakes, please open an issue mentioning @AtsuMiyai.

Citation

If you find our survey paper helpful for your research, please consider citing the following paper:

@article{miyai2024generalized2,
  title={Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey},
  author={Miyai, Atsuyuki and Yang, Jingkang and Zhang, Jingyang and Ming, Yifei and Lin, Yueqian and Yu, Qing and Irie, Go and Joty, Shafiq and Li, Yixuan and Li, Hai and Liu, Ziwei and Yamasaki, Toshihiko and Aizawa, Kiyoharu},
  journal={arXiv preprint arXiv:2407.21794},
  year={2024}
}

Please also consider citing our other projects that are closely related to this survey.

# generalized OOD detection framework v1, survey
@article{yang2024generalized,
  title={Generalized out-of-distribution detection: A survey},
  author={Yang, Jingkang and Zhou, Kaiyang and Li, Yixuan and Liu, Ziwei},
  journal={IJCV},
  pages={1--28},
  year={2024},
}

# MCM (Zero-shot OOD detection)
@inproceedings{ming2022delving,
  title={Delving into out-of-distribution detection with vision-language representations},
  author={Ming, Yifei and Cai, Ziyang and Gu, Jiuxiang and Sun, Yiyou and Li, Wei and Li, Yixuan},
  booktitle={NeurIPS},
  year={2022}
}

# GL-MCM (Zero-shot OOD detection)
@article{miyai2023zero,
  title={Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models},
  author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
  journal={arXiv preprint arXiv:2304.04521},
  year={2023}
}

# PEFT-MCM (Few-shot OOD detection, Concurrent work with LoCoOp)
@article{ming2024does,
  title={How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?},
  author={Ming, Yifei and Li, Yixuan},
  journal={IJCV},
  volume={132},
  number={2},
  pages={596--609},
  year={2024},
}

# LoCoOp (Few-shot OOD detection, Concurrent work with PEFT-MCM)
@inproceedings{miyai2023locoop,
  title={LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning},
  author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
  booktitle={NeurIPS},
  year={2023}
}

# UPD
@article{miyai2024upd,
  title={Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models},
  author={Miyai, Atsuyuki and Yang, Jingkang and Zhang, Jingyang and Ming, Yifei and Yu, Qing and Irie, Go and Li, Yixuan and Li, Hai and Liu, Ziwei and Aizawa, Kiyoharu},
  journal={arXiv preprint arXiv:2403.20331},
  year={2024}
}

# OpenOOD
@inproceedings{yang2022openood,
  title={OpenOOD: Benchmarking Generalized Out-of-Distribution Detection},
  author={Yang, Jingkang and Wang, Pengyun and Zou, Dejian and Zhou, Zitang and Ding, Kunyuan and Peng, Wenxuan and Wang, Haoqi and Chen, Guangyao and Li, Bo and Sun, Yiyou and others},
  booktitle={NeurIPS Datasets and Benchmarks Track},
  year={2022}
}

# OpenOOD v1.5 report
@article{zhang2023openood,
  title={OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection},
  author={Zhang, Jingyang and Yang, Jingkang and Wang, Pengyun and Wang, Haoqi and Lin, Yueqian and Zhang, Haoran and Sun, Yiyou and Du, Xuefeng and Zhou, Kaiyang and Zhang, Wayne and Li, Yixuan and Liu, Ziwei and Chen, Yiran and Li, Hai},
  journal={arXiv preprint arXiv:2306.09301},
  year={2023}
}