# Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP)
This repo contains the source code of our ECCV 2022 paper MS-CLIP:

**Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training** <br>
2022 European Conference on Computer Vision (ECCV 2022) <br>
By Haoxuan You\*, Luowei Zhou\*, Bin Xiao\*, Noel Codella\*, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan.
## Introduction
We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we ask how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that span a spectrum in the proportion of shared parameters. Across the studied settings, we observe that a mostly unified encoder for vision and language signals outperforms all other variants that separate more parameters. Additionally, we find that lightweight modality-specific parallel modules further improve performance.
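To make the idea concrete, below is a minimal, self-contained sketch of modality-shared contrastive pre-training. This is not the repo's actual implementation: the class name `SharedEncoderCLIP`, the mean-pooling, the layer counts, and all hyperparameters are illustrative assumptions. The point it shows is that a single Transformer encoder processes both image patches and text tokens, only the input embeddings and output projections stay modality-specific, and the two modalities are aligned with a symmetric contrastive (InfoNCE) objective.

```python
# Minimal sketch of modality-shared contrastive pre-training (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoderCLIP(nn.Module):
    """One Transformer encoder shared by both modalities; only the input
    embeddings and output projections are modality-specific."""

    def __init__(self, dim=512, depth=6, heads=8, vocab_size=49408,
                 image_size=224, patch_size=32, max_text_len=77):
        super().__init__()
        # Modality-specific input embeddings.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.token_embed = nn.Embedding(vocab_size, dim)
        num_patches = (image_size // patch_size) ** 2
        self.img_pos = nn.Parameter(0.02 * torch.randn(1, num_patches, dim))
        self.txt_pos = nn.Parameter(0.02 * torch.randn(1, max_text_len, dim))
        # Modality-shared Transformer blocks: the same weights process
        # image-patch sequences and text-token sequences.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Modality-specific projections into the joint embedding space.
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~ln(1/0.07), as in CLIP

    def encode_image(self, images):            # images: (B, 3, H, W)
        x = self.patch_embed(images).flatten(2).transpose(1, 2) + self.img_pos
        x = self.shared_encoder(x)
        return F.normalize(self.img_proj(x.mean(dim=1)), dim=-1)

    def encode_text(self, tokens):             # tokens: (B, L) of token ids
        x = self.token_embed(tokens) + self.txt_pos[:, : tokens.size(1)]
        x = self.shared_encoder(x)
        return F.normalize(self.txt_proj(x.mean(dim=1)), dim=-1)

    def forward(self, images, tokens):
        img, txt = self.encode_image(images), self.encode_text(tokens)
        logits = self.logit_scale.exp() * img @ txt.t()
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric InfoNCE loss over matched image-text pairs in the batch.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2
```

The lightweight modality-specific parallel modules studied in the paper would sit alongside the shared blocks; they are omitted here for brevity.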
## Update
- [07/20/2022] Released pre-trained models and zero-shot evaluation on ImageNet-1K.
## Pre-trained Weights
| Model | Training Set | Top-1 on IN-1K | LP* on 24 datasets | Download |
| --- | --- | --- | --- | --- |
| MS-CLIP-S (ViT-B/32) | YFCC-22M | 36.7 | 68.5 | ckpt/config |
| MS-CLIP-S (ViT-B/16) | YFCC-22M | 39.0 | 70.4 | ckpt/config |
| MS-CLIP-S (ViT-B/32) | LAION-20M | 40.2 | 73.3 | ckpt/config |

\*LP: Linear Probing
## Getting Started

### Installation

Please follow INSTALL.md for installation.
### Data preparation
Please follow DATA.md for data preparation.
### Pre-trained weights preparation
Download the weights from the links in the table above and put them under `./OUTPUT_MODEL/`.
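For example (the checkpoint filename below is a placeholder; keep whatever name the downloaded file has):

```bash
mkdir -p OUTPUT_MODEL
mv <downloaded-checkpoint> OUTPUT_MODEL/
```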
### Evaluation
To evaluate a pre-trained MS-CLIP-S on ImageNet Zero-shot Classification, run:
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_zeroshot.py --model <config-file>
```
where `<config-file>` is the config YAML under `experiments/model/`, e.g. `experiments/model/b32-laion-msclips.yaml`.
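For instance, zero-shot evaluation with the LAION-20M ViT-B/32 config from the table above would be:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_zeroshot.py --model experiments/model/b32-laion-msclips.yaml
```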
## Contact
If you have any questions, please contact Haoxuan You or Luowei Zhou.