Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Paper (ArXiv) | Project Page | Pre-trained Models
Shuquan Ye<sup>2</sup>, Yujia Xie<sup>1</sup>, Dongdong Chen<sup>1</sup>, Yichong Xu<sup>1</sup>, Lu Yuan<sup>1</sup>, Chenguang Zhu<sup>1</sup>, Jing Liao<sup>2</sup>
<sup>1</sup>Microsoft, <sup>2</sup>City University of Hong Kong <br>
This is the PyTorch code for the DANCE [paper]. The code is built on PyTorch 1.11. Pre-training with our code requires 4 nodes, each with 8 A100 GPUs (32 GPUs in total).
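To make the hardware requirement concrete, the sketch below works out the process layout for 4 nodes with 8 GPUs each, assuming one process per GPU and a launch through `torchrun` (which exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`). The helper names `world_layout` and `rank_from_env` are hypothetical and not part of this repository; the actual launch scripts may be organized differently.

```python
import os

NODES = 4          # from the README: 4 nodes
GPUS_PER_NODE = 8  # from the README: 8 A100 GPUs per node


def world_layout(nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Return (world_size, layout), assuming one process per GPU.

    layout is a list of (node_rank, local_rank, global_rank) tuples.
    """
    layout = [
        (node, local, node * gpus_per_node + local)
        for node in range(nodes)
        for local in range(gpus_per_node)
    ]
    return nodes * gpus_per_node, layout


def rank_from_env(env=os.environ):
    """Read the rank variables exported by torchrun.

    Falls back to a single-process layout when they are not set.
    """
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    world_size, layout = world_layout()
    print("world size:", world_size)
    print("last process (node, local, global):", layout[-1])
    print("this process:", rank_from_env())
```

With fewer GPUs, the same arithmetic gives the reduced world size; batch size and learning rate would likely need adjusting to match, which this sketch does not attempt.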
Catalog:
- Code for DANCE-augmented Pre-training
- Code for DANCE-augmented Fine-tuning
- Code for Image-Text Retrieval, OK-VQA
- Download of Pre-trained and Fine-tuned Checkpoints
BibTeX