
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

Paper (ArXiv) | Project Page | Pre-trained Models

Shuquan Ye<sup>2</sup>, Yujia Xie<sup>1</sup>, Dongdong Chen<sup>1</sup>, Yichong Xu<sup>1</sup>, Lu Yuan<sup>1</sup>, Chenguang Zhu<sup>1</sup>, Jing Liao<sup>2</sup>

<sup>1</sup>Microsoft, <sup>2</sup>City University of Hong Kong <br>

This is the PyTorch code for the DANCE [paper]. The code is based on PyTorch 1.11. Pre-training with our code requires 4 nodes, each with 8 A100 GPUs.

Catalog:

- Code for DANCE-augmented Pre-training
- Code for DANCE-augmented Fine-tuning
- Code for Image-Text Retrieval, OK-VQA
- Download of Pre-trained and Fine-tuned Checkpoints

BibTeX