LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning (NeurIPS 2023)
📣 We have published a new survey on OOD detection and related tasks in the Vision-Language Model era! Check out our new paper!
This repository contains PyTorch implementation for our paper: LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning
Abstract
We introduce a novel OOD detection approach called Local regularized Context Optimization (LoCoOp), which performs OOD regularization that utilizes portions of CLIP's local features as OOD features during training. CLIP's local features contain many ID-irrelevant nuisances (e.g., backgrounds), and by learning to push them away from the ID class text embeddings, we can remove those nuisances from the ID class text embeddings and enhance the separation between ID and OOD. Experiments on the large-scale ImageNet OOD detection benchmarks demonstrate the superiority of our LoCoOp over zero-shot, fully supervised detection methods and prompt learning methods. Notably, even in a one-shot setting -- just one label per class -- LoCoOp outperforms existing zero-shot and fully supervised detection methods.
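To make the idea above concrete, below is a minimal, simplified sketch of the OOD regularization term in PyTorch. It is not the repository's actual implementation: the tensor names (`local_feats`, `text_feats`) and the exact entropy formulation are illustrative assumptions, and `top_k` / `lam` are meant to mirror the `-topk` / `-lambda` training arguments mentioned elsewhere in this README.

```python
# Hedged sketch of the LoCoOp OOD-regularization idea (not the authors' exact code).
# Assumptions: `local_feats` are CLIP's region-wise features (N regions x D),
# `text_feats` are the prompt-derived class text embeddings (C classes x D),
# both L2-normalized, and `label` is the ID class index of the training image.
import torch
import torch.nn.functional as F

def locoop_ood_reg(local_feats, text_feats, label, top_k=200, lam=0.25, tau=0.01):
    # Per-region similarity to every ID class text embedding.
    logits = local_feats @ text_feats.t() / tau            # (N, C)
    probs = F.softmax(logits, dim=-1)

    # Regions whose top-k predicted classes do NOT include the ground-truth
    # class are treated as ID-irrelevant (background / nuisance) regions.
    topk_idx = probs.topk(top_k, dim=-1).indices           # (N, top_k)
    is_ood = (topk_idx != label).all(dim=-1)               # (N,)

    # Push those regions away from all ID text embeddings by maximizing the
    # entropy of their class distribution (i.e., minimize negative entropy).
    ood_probs = probs[is_ood]
    if ood_probs.numel() == 0:
        return logits.new_zeros(())
    entropy = -(ood_probs * torch.log(ood_probs + 1e-8)).sum(dim=-1)
    return -lam * entropy.mean()   # added to the standard cross-entropy loss
```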
Requests to Followers 🤝
We kindly ask followers to observe the following two points:
- Clarify whether MCM or GL-MCM was used at inference time. This is very important for assessing the performance of LoCoOp alone (a rough sketch of both scores is given below).
- When testing on benchmarks other than the ImageNet OOD benchmark, change the values of the training arguments "-topk" and "-lambda" and report them in the paper. The current config is for ImageNet-1K.
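For reference, here is a rough sketch of the two inference-time scores. The exact definitions are in the MCM and GL-MCM papers; the function names, feature shapes, and temperature value here are illustrative assumptions.

```python
# Hedged sketch of the MCM and GL-MCM scores (higher = more likely ID).
# Assumes L2-normalized CLIP features: `global_feat` (D,),
# `local_feats` (N regions x D), `text_feats` (C classes x D).
import torch
import torch.nn.functional as F

def mcm_score(global_feat, text_feats, tau=0.01):
    # MCM: maximum softmax probability over ID classes for the *global* image feature.
    probs = F.softmax(global_feat @ text_feats.t() / tau, dim=-1)        # (C,)
    return probs.max().item()

def gl_mcm_score(global_feat, local_feats, text_feats, tau=0.01):
    # GL-MCM: MCM plus the analogous maximum softmax taken over *local* features,
    # so strong local evidence for an ID class also raises the score.
    local_probs = F.softmax(local_feats @ text_feats.t() / tau, dim=-1)  # (N, C)
    return mcm_score(global_feat, text_feats, tau) + local_probs.max().item()
```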
Let's build a better Few-Shot OOD Detection community together!
News
- 2024/04/14: We added related work on CLIP-based parameter-efficient OOD detection so that readers can easily follow this research area!
- 2023/09/22: We published the code for training and evaluation.
- 2023/06/02: We made this repository public.
Requirement
Package
Our experiments are conducted with Python 3.8 and PyTorch 1.8.1.
All required packages are based on CoOp (for training) and MCM (for evaluation).
This code is built on top of the awesome toolbox Dassl.pytorch, so you need to install the dassl environment first. Simply follow the instructions described here to install dassl as well as PyTorch. After that, run pip install -r requirements.txt under LoCoOp/ to install a few more packages required by CLIP and MCM (this should be done while dassl is activated).
Datasets
Please create a data folder and download the following ID and OOD datasets into data.
In-distribution Datasets
We use ImageNet-1K as the ID dataset.
- Create a folder named imagenet/ under the data folder.
- Create images/ under imagenet/.
- Download the dataset from the official website and extract the training and validation sets to $DATA/imagenet/images.

Besides, we need to put imagenet-classes.txt under the data/imagenet/ folder. This .txt file can be downloaded via https://drive.google.com/file/d/1-61f_ol79pViBFDG_IDlUQSwoLcn2XXF/view
Out-of-distribution Datasets
We use the large-scale OOD datasets iNaturalist, SUN, Places, and Texture curated by Huang et al. 2021. We follow instructions from this repository to download the subsampled datasets.
The overall file structure is as follows:
LoCoOp
|-- data
    |-- imagenet
        |-- imagenet-classes.txt
        |-- images/
            |-- train/ # contains 1,000 folders like n01440764, n01443537, etc.
            |-- val/   # contains 1,000 folders like n01440764, n01443537, etc.
    |-- iNaturalist
    |-- SUN
    |-- Places
    |-- Texture
...
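As a convenience, here is a small hypothetical helper (not part of the repository) that checks whether the layout above is in place before you start training.

```python
# Hypothetical sanity check for the expected data/ layout described above.
from pathlib import Path

def check_layout(data_root="data"):
    root = Path(data_root)
    expected = [
        root / "imagenet" / "imagenet-classes.txt",
        root / "imagenet" / "images" / "train",
        root / "imagenet" / "images" / "val",
        root / "iNaturalist",
        root / "SUN",
        root / "Places",
        root / "Texture",
    ]
    for path in expected:
        status = "OK" if path.exists() else "MISSING"
        print(f"{status:8s} {path}")

    # The ImageNet train/ folder should contain 1,000 class folders (n01440764, ...).
    train_dir = root / "imagenet" / "images" / "train"
    if train_dir.is_dir():
        n_classes = sum(1 for p in train_dir.iterdir() if p.is_dir())
        print(f"train class folders: {n_classes} (expected 1000)")

if __name__ == "__main__":
    check_layout()
```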
Pre-trained Models
We share the 16-shot pre-trained models for LoCoOp. Please download them via the URL.
Quick Start
1. Training
The training script is in LoCoOp/scripts/locoop/train.sh.
e.g., 1-shot training with ViT-B/16
CUDA_VISIBLE_DEVICES=0 bash scripts/locoop/train.sh data imagenet vit_b16_ep50 end 16 1 False 0.25 200
e.g., 16-shot training with ViT-B/16
CUDA_VISIBLE_DEVICES=0 bash scripts/locoop/train.sh data imagenet vit_b16_ep50 end 16 16 False 0.25 200
2. Inference
The inference script is in LoCoOp/scripts/locoop/eval.sh.
If you evaluate the model of seed 1 created by the 16-shot training command above, please run the command below.
CUDA_VISIBLE_DEVICES=0 bash scripts/locoop/eval.sh data imagenet vit_b16_ep50 1 output/imagenet/LoCoOp/vit_b16_ep50_16shots/nctx16_cscFalse_ctpend/seed1
The average scores of three seeds (1,2,3) are reported in the paper.
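The evaluation measures standard OOD detection metrics such as AUROC and FPR at 95% TPR (FPR95). Below is a hedged sketch of how these metrics can be computed from ID and OOD scores; it is not the repository's evaluation code, and the function name is illustrative.

```python
# Hedged sketch of AUROC / FPR95 computation, assuming higher scores mean "more ID"
# (e.g., MCM or GL-MCM scores collected for ID and OOD test images).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def ood_metrics(id_scores, ood_scores):
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])

    auroc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    # FPR at the first threshold reaching 95% true-positive rate on ID data.
    fpr95 = fpr[np.argmax(tpr >= 0.95)]
    return auroc, fpr95
```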
3. Visualization
The code for the visualization of extracted OOD regions is in LoCoOp/scripts/locoop/demo_visualization.sh.
e.g., image_path=data/imagenet/images/train/n04325704/n04325704_219.JPEG, label=824
sh scripts/locoop/demo_visualization.sh /home/miyai/LoCoOp/data imagenet vit_b16_ep50 output/imagenet/LoCoOp/vit_b16_ep50_16shots/nctx16_cscFalse_ctpend/seed1 data/imagenet/images/train/n04325704/n04325704_219.JPEG 824
The visualization result is saved in visualization/.
The visualization examples are below:
Acknowledgement
We adapted the following codebases to create this repository.
- Conditional Prompt Learning for Vision-Language Models, in CVPR, 2022.
- Learning to Prompt for Vision-Language Models, IJCV, 2022.
- Delving into Out-of-Distribution Detection with Vision-Language Representations, in NeurIPS, 2022.
- Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models, arXiv, 2023.
Subsequent work on parameter-efficient OOD detection methods
Parameter-efficient OOD detection is a promising research direction, and LoCoOp can serve as a baseline approach for this field.
To help you catch up with this field, we have summarized the subsequent work on CLIP-based parameter-efficient OOD detection methods. (Last update: 2024.04.14)
- PEFT-MCM (code available): This paper proposes PEFT-MCM, demonstrating the effectiveness of combining parameter-efficient tuning methods with MCM. To implement it, you can utilize our LoCoOp code with a minor change.
- LSN: LSN learns negative prompts for OOD detection, which is an approach orthogonal to LoCoOp and can be combined with it.
- IDPrompt (code available): IDPrompt leverages ID-like outliers in the ID image to further exploit the capabilities of CLIP for OOD detection, a concept similar to LoCoOp.
- LSA (code available): LSA first tackled full-spectrum OOD detection in the context of CLIP-based parameter-efficient OOD detection.
- NegPrompt (code available): NegPrompt learns a set of negative prompts with only ID data. This paper also tackles a novel and promising problem setting called open-vocabulary OOD detection.
If I missed some work, feel free to contact me by opening an issue!
Citation
If you find our work interesting or use our code/models, please consider citing:
@inproceedings{miyai2023locoop,
title={LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning},
author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
booktitle={Thirty-Seventh Conference on Neural Information Processing Systems},
year={2023}
}
In addition, when you use GL-MCM (a test-time detection method), please consider citing:
@article{miyai2023zero,
title={Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models},
author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
journal={arXiv preprint arXiv:2304.04521},
year={2023}
}