# Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP)
This repo contains the source code of our ECCV 2022 paper MS-CLIP:

**Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training** <br>
2022 European Conference on Computer Vision (ECCV 2022) <br>
By Haoxuan You\*, Luowei Zhou\*, Bin Xiao\*, Noel Codella\*, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan.
## Introduction
We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we ask how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that span a spectrum in the proportion of shared parameters. Across the studied settings, we observe that a mostly unified encoder for vision and language signals outperforms all other variants that separate more parameters. Additionally, we find that lightweight modality-specific parallel modules further improve performance.
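To make the idea concrete, below is a minimal, self-contained sketch of modality-shared contrastive pre-training. This is not the repo's actual implementation: the class name `SharedEncoderCLIP`, the mean-pooling, the layer counts, and all hyperparameters are illustrative assumptions. The point it shows is that a single Transformer encoder processes both image patches and text tokens, only the input embeddings and output projections stay modality-specific, and the two modalities are aligned with a symmetric contrastive (InfoNCE) objective.

```python
# Minimal sketch of modality-shared contrastive pre-training (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoderCLIP(nn.Module):
    """One Transformer encoder shared by both modalities; only the input
    embeddings and output projections are modality-specific."""

    def __init__(self, dim=512, depth=6, heads=8, vocab_size=49408,
                 image_size=224, patch_size=32, max_text_len=77):
        super().__init__()
        # Modality-specific input embeddings.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.token_embed = nn.Embedding(vocab_size, dim)
        num_patches = (image_size // patch_size) ** 2
        self.img_pos = nn.Parameter(0.02 * torch.randn(1, num_patches, dim))
        self.txt_pos = nn.Parameter(0.02 * torch.randn(1, max_text_len, dim))
        # Modality-shared Transformer blocks: the same weights process
        # image-patch sequences and text-token sequences.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Modality-specific projections into the joint embedding space.
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~ln(1/0.07), as in CLIP

    def encode_image(self, images):            # images: (B, 3, H, W)
        x = self.patch_embed(images).flatten(2).transpose(1, 2) + self.img_pos
        x = self.shared_encoder(x)
        return F.normalize(self.img_proj(x.mean(dim=1)), dim=-1)

    def encode_text(self, tokens):             # tokens: (B, L) of token ids
        x = self.token_embed(tokens) + self.txt_pos[:, : tokens.size(1)]
        x = self.shared_encoder(x)
        return F.normalize(self.txt_proj(x.mean(dim=1)), dim=-1)

    def forward(self, images, tokens):
        img, txt = self.encode_image(images), self.encode_text(tokens)
        logits = self.logit_scale.exp() * img @ txt.t()
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric InfoNCE loss over matched image-text pairs in the batch.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2
```

The lightweight modality-specific parallel modules studied in the paper would sit alongside the shared blocks; they are omitted here for brevity.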
## Update
- [07/20/2022] Released pre-trained models and zero-shot evaluation on ImageNet-1K.
## Pre-trained Weights
| Model | Training Set | Top-1 on IN-1K | LP* on 24 datasets | Download |
| --- | --- | --- | --- | --- |
| MS-CLIP-S (ViT-B/32) | YFCC-22M | 36.7 | 68.5 | ckpt/config |
| MS-CLIP-S (ViT-B/16) | YFCC-22M | 39.0 | 70.4 | ckpt/config |
| MS-CLIP-S (ViT-B/32) | LAION-20M | 40.2 | 73.3 | ckpt/config |

\*LP: Linear Probing
## Getting Started

### Installation

Please follow INSTALL.md for installation.
### Data preparation
Please follow DATA.md for data preparation.
### Pre-trained weights preparation
Download the weights from the links in the table above and put them under `./OUTPUT_MODEL/`.
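For example (the checkpoint filename below is a placeholder; keep whatever name the downloaded file has):

```bash
mkdir -p OUTPUT_MODEL
mv <downloaded-checkpoint> OUTPUT_MODEL/
```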
### Evaluation
To evaluate a pre-trained MS-CLIP-S on ImageNet Zero-shot Classification, run:
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_zeroshot.py --model <config-file>
```
where `<config-file>` is the config YAML under `experiments/model/`, e.g. `experiments/model/b32-laion-msclips.yaml`.
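For instance, zero-shot evaluation with the LAION-20M ViT-B/32 config from the table above would be:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_zeroshot.py --model experiments/model/b32-laion-msclips.yaml
```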
## Contact
If you have any questions, please contact Haoxuan You or Luowei Zhou.