
TMLR2024 Adapting Contrastive Language-Image Pretrained (CLIP) Models for Out-of-Distribution Detection

arXiv, YouTube, TMLR, Slides

Abstract

We present a comprehensive experimental study on pre-trained feature extractors for visual out-of-distribution (OOD) detection, focusing on leveraging contrastive language-image pre-trained (CLIP) models. Without fine-tuning on the training data, we are able to establish a positive correlation ($R^2\geq0.92$) between in-distribution classification and unsupervised OOD detection for CLIP models in $4$ benchmarks. We further propose a new, simple, and scalable method called pseudo-label probing (PLP) that adapts vision-language models for OOD detection. Given a set of label names of the training set, PLP trains a linear layer using the pseudo-labels derived from the text encoder of CLIP. Intriguingly, we show that without modifying the weights of CLIP or training additional image/text encoders (i) PLP outperforms the previous state-of-the-art on all $5$ large-scale benchmarks based on ImageNet, specifically by an average AUROC gain of 3.4% using the largest CLIP model (ViT-G), (ii) linear probing outperforms fine-tuning by large margins for CLIP architectures (i.e. CLIP ViT-H achieves a mean gain of 7.3% AUROC on average on all ImageNet-based benchmarks), and (iii) billion-parameter CLIP models still fail at detecting feature-based adversarially manipulated OOD images. The code and adversarially created datasets will be made publicly available.

Project setup

conda create -n plp python=3.7
conda activate plp
pip install -r requirements.txt

Next, you need to set the global paths: _PRECOMPUTED_PATH, where the embeddings will be saved; _DEFAULT_PATH, the folder from which the default PyTorch ImageFolder reads the OOD datasets such as NINCO; and _IMAGENET_PATH, the path to ImageNet.

On loaders/datasets.py:

_DEFAULT_PATH = 'path_to_ood_datasets'
_PRECOMPUTED_PATH = './data' 
_IMAGENET_PATH = 'Path_to_imagenet/.../ILSVRC/Data/CLS-LOC'

On model_builders/model_builders.py:

_PRECOMPUTED_PATH = './data'

Supported model names

from model_builders import available_models
print(available_models())

For example:

mae_vit_base convnext_base msn_vit_base mae_vit_large mae_vit_huge ibot_vit_large ibot_vit_large_in21k beit_vit_large_in21k
timm_vit_base_patch16_224 timm_vit_large_patch16_224 timm_vit_large_patch16_224_in21k timm_convnext_base_in22k

Supported CLIP models

openclip_RN50/openai openclip_ViT-B-16/openai openclip_ViT-B-16/laion2b openclip_ViT-L-14/openai openclip_ViT-L-14/laion2b
openclip_ViT-H-14/laion2b openclip_ViT-bigG-14/laion2b openclip_convnext_base/laion400m_s13b_b51k openclip_convnext_base_w/laion2b
openclip_convnext_large_d/laion2b

Dataset names

We mainly use the following dataset names

CIFAR10 CIFAR100 IN1K inat SUN Places IN_O texture NINCO

Apart from the CIFAR datasets, you need to download the remaining datasets yourself and place them in _DEFAULT_PATH.
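
The OOD datasets are read with the standard PyTorch ImageFolder, so each dataset should sit in its own class-subfolder directory under _DEFAULT_PATH. Here is a minimal sketch of what the loaders expect; the folder name and transform are illustrative, see loaders/datasets.py for the exact conventions:

# Minimal sketch: reading an OOD dataset placed under _DEFAULT_PATH with
# torchvision's ImageFolder. "NINCO" is an illustrative folder name; check
# loaders/datasets.py for the exact names the loaders use.
import os
from torchvision import datasets, transforms

_DEFAULT_PATH = 'path_to_ood_datasets'

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

ninco = datasets.ImageFolder(os.path.join(_DEFAULT_PATH, 'NINCO'), transform=transform)
print(f'{len(ninco)} images across {len(ninco.classes)} class folders')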

Generate pre-computed image embeddings/representations

Here is an example of how to generate embeddings for one model and one dataset

python gen_embeds.py --arch openclip_ViT-L-14/openai --dataset CIFAR10 --batch_size 512  --no_eval_knn --overwrite  
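
Under the hood, embedding generation is a single frozen forward pass of the encoder over the dataset, cached to _PRECOMPUTED_PATH. Below is a rough sketch of the idea using open_clip directly; the output file name is illustrative, and gen_embeds.py handles the actual naming and on-disk layout.

# Rough sketch of pre-computing frozen CLIP image embeddings for CIFAR10.
# gen_embeds.py handles model/dataset naming and the file layout; this only
# illustrates the idea. The output file name below is illustrative.
import torch
import open_clip
from torchvision import datasets

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, _, preprocess = open_clip.create_model_and_transforms('ViT-L-14', pretrained='openai')
model = model.to(device).eval()

dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=512, num_workers=8)

feats, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        feats.append(model.encode_image(images.to(device)).cpu())
        labels.append(targets)

torch.save({'features': torch.cat(feats), 'labels': torch.cat(labels)},
           './data/CIFAR10_train_openclip_ViT-L-14_openai.pt')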

PLP: pseudo-label probing

You can modify run_plp.sh and run it with conda activate plp && bash run_plp.sh. Here is an example for ImageNet (IN1K):

Important: You need the precomputed embeddings to run this (gen_embeds.py).

dataset=IN1K
batch_size=8192
seed=0
arch="openclip_ViT-L-14/openai"
arch_name="${arch//\//_}"   # sanitize "/" for the output directory
output_dir="experiments/PLP/dataset=$dataset/arch=$arch_name"
python linear_probing.py --arch="$arch" --dataset=$dataset --num_epochs=100 \
        --batch_size=$batch_size --output_dir=$output_dir --seed=$seed --pseudo_labels=True \
        --pseudo_prompt "a photo of a {c}." "a blurry photo of a {c}." "a photo of many {c}." "a photo of the large {c}." "a photo of the small {c}."
python logit_evaluation.py --probing_path=$output_dir --dataset=$dataset
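
For intuition, pseudo-label probing reduces to three steps: embed the class names with CLIP's text encoder (averaged over the prompt templates), assign every training image the class of its most similar text embedding, and train a linear head on the frozen image features with those pseudo-labels. Below is a minimal sketch of that idea on random stand-ins for the cached image features, with a shortened class/prompt list; it is not the exact training recipe of linear_probing.py.

# Minimal sketch of pseudo-label probing (PLP) on precomputed image features.
# `features` and `class_names` are illustrative stand-ins; no schedules,
# validation, or the full prompt set of linear_probing.py.
import torch
import torch.nn.functional as F
import open_clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, _, _ = open_clip.create_model_and_transforms('ViT-L-14', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-L-14')
model = model.to(device).eval()

class_names = ['tench', 'goldfish', 'great white shark']   # illustrative label names
features = torch.randn(1000, 768)                          # stand-in for cached embeddings
prompts = ['a photo of a {c}.', 'a blurry photo of a {c}.']

# 1) Class prototypes from the text encoder, averaged over prompt templates.
with torch.no_grad():
    text_feats = []
    for c in class_names:
        tokens = tokenizer([p.format(c=c) for p in prompts]).to(device)
        t = F.normalize(model.encode_text(tokens), dim=-1).mean(dim=0)
        text_feats.append(F.normalize(t, dim=0))
    text_feats = torch.stack(text_feats)                    # (num_classes, dim)

# 2) Pseudo-labels: nearest text prototype for each image feature.
img = F.normalize(features, dim=-1).to(device)
pseudo_labels = (img @ text_feats.T).argmax(dim=1)

# 3) Linear probe on the frozen features using the pseudo-labels.
head = torch.nn.Linear(img.shape[1], len(class_names)).to(device)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(head(img), pseudo_labels)
    loss.backward()
    opt.step()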

Supervised linear probing using in-distribution labels

With the same scripts, you can run linear probing using all the supported models. Here is an example on ImageNet.

dataset=IN1K
batch_size=8192
seed=0
arch="openclip_ViT-L-14/openai"
arch_name="${arch//\//_}"   # sanitize "/" for the output directory
output_dir="experiments/PLP/dataset=$dataset/arch=$arch_name"
python linear_probing.py --arch="$arch" --dataset=$dataset --num_epochs=100 \
        --batch_size=$batch_size --output_dir=$output_dir --seed=$seed
python logit_evaluation.py --probing_path=$output_dir --dataset=$dataset
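
After probing, logit_evaluation.py turns the probe's logits into OOD scores. A common logit-based score, and the one the naming here suggests, is the maximum softmax probability (MSP); below is a rough, sklearn-based sketch of scoring and AUROC on stand-in logits, not the repository's exact evaluation code.

# Rough sketch of MSP-style OOD scoring and AUROC from probe logits.
# `id_logits` / `ood_logits` are random stand-ins for the linear head's
# outputs on in-distribution and OOD features.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

id_logits = torch.randn(5000, 1000)     # e.g. logits on ImageNet val features
ood_logits = torch.randn(2000, 1000)    # e.g. logits on NINCO features

id_scores = F.softmax(id_logits, dim=1).max(dim=1).values.numpy()
ood_scores = F.softmax(ood_logits, dim=1).max(dim=1).values.numpy()

# AUROC with ID as the positive class: higher score = more in-distribution.
labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
scores = np.concatenate([id_scores, ood_scores])
print('AUROC:', roc_auc_score(labels, scores))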

Instructions for all the considered comparisons and baselines

Step 1. Find timm names for baselines

The baselines in this open-source release use timm==0.9.2. Look up the corresponding timm model names with:

import timm
print(timm.list_models("*ft1k*", pretrained=True)) # finetuned imagenet1k models
print(timm.list_models("*in22k", pretrained=True)) # pretrained in21k models 
print(timm.list_models("*in21k*", pretrained=True)) # pretrained in21k models (different naming convention)

Baseline 1: Fine-tuned 1K models from timm

export CUDA_VISIBLE_DEVICES=1 && conda activate plp && python baseline_probe_in21k_models.py --dataset IN1K --archs convnext_base_in22k

Baseline 2: supervised linear probing of IN21K models from timm

export CUDA_VISIBLE_DEVICES=2 && conda activate plp && python baseline_probe_in21k_models.py --dataset IN1K --archs convnext_base_in22k
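
For intuition, probing an IN21K backbone amounts to using it as a frozen feature extractor (num_classes=0 returns pooled features) and fitting a linear head on top, which is roughly what the baseline script automates. A small sketch with an illustrative model name (pick any name from the list_models() calls above):

# Small sketch: a timm IN21K backbone as a frozen feature extractor for
# linear probing. The model name is illustrative.
import timm
import torch

model = timm.create_model('convnext_base_in22k', pretrained=True, num_classes=0)
model.eval()

# The matching preprocessing can be derived from the model's data config.
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

with torch.no_grad():
    dummy = torch.randn(2, 3, 224, 224)      # stand-in for a preprocessed batch
    feats = model(dummy)                     # (2, feature_dim) pooled features
print(feats.shape)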

Baseline 3: Fine-tune any model on the supported datasets like CIFAR10 and CIFAR100

Modify fine_tune.sh and pass one of the supported models above using our naming convention, or run:

seed=0
torchrun --nproc_per_node=4 finetune.py --dataset=CIFAR10 \
        --arch=timm_vit_large_patch16_224 --batch_size=64 \
        --epochs 100 --seed=$seed --warmup_epochs 5

Then use notebooks/ood_ft_model.ipynb to get the OOD detection performance metrics.
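
OOD detection results are typically summarised with AUROC (see the sketch earlier) and the FPR at 95% TPR. Below is a minimal, repository-independent sketch of the latter, assuming higher scores mean more in-distribution:

# Minimal sketch of FPR@95TPR on stand-in scores (higher = more ID).
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    # Threshold that keeps 95% of the ID samples above it (TPR = 0.95),
    # then measure how many OOD samples still pass it.
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(ood_scores >= threshold))

id_scores = np.random.rand(5000)
ood_scores = 0.8 * np.random.rand(2000)
print('FPR@95TPR:', fpr_at_95_tpr(id_scores, ood_scores))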

Baseline 4: CLIP zero-shot Pseudo-MSP (from Ming et al.)

The supported CLIP model names are the same as those listed above.

Important: You need the precomputed embeddings to run this!

Modify and launch the script baseline_pseudo_msp_clip.sh:

conda activate plp && bash baseline_pseudo_msp_clip.sh

Or run it directly via:

python logit_evaluation.py --clip_arch=openclip_ViT-L-14/openai --dataset "CIFAR10" --out_dists "CIFAR100" --out_dir="experiments/pseudo-msp-clip/arch=openclip-ViT-L-14_openai/dataset=CIFAR10" --eval_maha=True
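
For intuition, the zero-shot pseudo-MSP baseline needs no training at all: it softmaxes the temperature-scaled image-text similarities over the in-distribution class prompts and takes the maximum probability as the ID score. A minimal sketch on stand-in embeddings follows; the class names, prompt, and temperature are illustrative, not the repository's exact settings.

# Minimal sketch of zero-shot pseudo-MSP scoring with CLIP.
# `image_feats` stands in for precomputed image embeddings; class names,
# prompt, and temperature are illustrative.
import torch
import torch.nn.functional as F
import open_clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, _, _ = open_clip.create_model_and_transforms('ViT-L-14', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-L-14')
model = model.to(device).eval()

class_names = ['airplane', 'automobile', 'bird']            # ID label names (illustrative)
image_feats = F.normalize(torch.randn(100, 768), dim=-1).to(device)

with torch.no_grad():
    tokens = tokenizer([f'a photo of a {c}.' for c in class_names]).to(device)
    text_feats = F.normalize(model.encode_text(tokens), dim=-1)

temperature = 0.01
probs = F.softmax(image_feats @ text_feats.T / temperature, dim=1)
pseudo_msp = probs.max(dim=1).values      # higher = more in-distribution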

Download OOD datasets

As explained in MOS, iNaturalist, SUN, and Places can be downloaded via the following links:

wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/iNaturalist.tar.gz
wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/SUN.tar.gz
wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/Places.tar.gz

NINCO dataset download

We use the NINCO/NINCO_OOD_classes subfolder as an OOD detection benchmark. For download instructions and more details, check the NINCO GitHub repository.

ImageNet-O

ImageNet-O comes from Natural Adversarial Examples by Dan Hendrycks et al.; the download link for the ImageNet-O out-of-distribution dataset is provided in their GitHub repository.

Citation

@article{adaloglou2023adapting,
  title={Adapting Contrastive Language-Image Pretrained (CLIP) Models for Out-of-Distribution Detection},
  author={Adaloglou, Nikolas and Michels, Felix and Kaiser, Tim and Kollmann, Markus},
  journal={arXiv e-prints},
  pages={arXiv--2303},
  year={2023}
}

Acknowledgments and Licence

The current codebase builds on several other GitHub repositories and packages.

The codebase follows the licences of those codebases.