GRIT-VLP: GRouped mIni-baTch sampling for Efficient Vision-Language Pre-training

This is the official PyTorch implementation of "GRIT-VLP: GRouped mIni-baTch sampling for Efficient Vision-Language Pre-training" (accepted to ECCV 2022).

This repository contains the implementation code for pre-training and fine-tuning GRIT-VLP.

<img src="img.png" width="600">
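As the name suggests, the key idea is to build each mini-batch from a group of semantically similar image-text pairs, which makes in-batch negatives harder. The sketch below is a rough, hypothetical illustration of that grouping idea only, not this repository's actual sampler: it assumes a precomputed similarity key per example (`group_keys`, e.g. a cluster id over image/text embeddings) and simply sorts and chunks indices before batching.

<pre>
import random
from torch.utils.data import Sampler

class GroupedBatchSampler(Sampler):
    """Hypothetical sketch: yield mini-batches whose examples share a
    similarity group, so in-batch negatives are harder. Not the paper's
    exact algorithm."""

    def __init__(self, group_keys, batch_size, group_size):
        # group_keys: one precomputed similarity key per dataset index
        # (an assumption for this sketch; the actual sampler in this
        # repo builds groups differently during training).
        self.batch_size = batch_size
        order = sorted(range(len(group_keys)), key=lambda i: group_keys[i])
        self.groups = [order[i:i + group_size]
                       for i in range(0, len(order), group_size)]

    def __iter__(self):
        random.shuffle(self.groups)      # randomize group order per epoch
        for group in self.groups:
            random.shuffle(group)        # randomize order within a group
            for i in range(0, len(group), self.batch_size):
                yield group[i:i + self.batch_size]

    def __len__(self):
        return sum((len(g) + self.batch_size - 1) // self.batch_size
                   for g in self.groups)
</pre>

A DataLoader would consume such a sampler via its batch_sampler argument, e.g. DataLoader(dataset, batch_sampler=GroupedBatchSampler(keys, 96, 960)).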

Pre-training Dataset Download:

Downstream-task Datasets:

Json Files:

Requirements:

Pre-training:

  1. Pre-train the model using 4 A100 GPUs (a variant for fewer GPUs is sketched after the command):
<pre>python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Pretrain.py --config ./configs/Pretrain.yaml --output_dir output/Pretrain/ </pre>
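The launch command assumes 4 GPUs per node; to train on a different number, change --nproc_per_node to match. A sketch assuming 2 GPUs (note that the effective batch size scales with the GPU count, so the per-GPU batch size in Pretrain.yaml may need adjusting; treat that config key as an assumption):

<pre>python3 -m torch.distributed.launch --nproc_per_node=2 --use_env Pretrain.py --config ./configs/Pretrain.yaml --output_dir output/Pretrain/ </pre>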

Downstream tasks:

  1. IRTR (MS-COCO) using 4 A100 GPUs (all commands below take a pre-trained checkpoint; a checkpoint-inspection sketch follows this list):
<pre>python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Retrieval.py --config ./configs/Retrieval_coco.yaml --output_dir output/Retrieval_coco/ --checkpoint [Pretrained checkpoint] </pre>
  2. IRTR (Flickr) using 4 A100 GPUs:
<pre>python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Retrieval.py --config ./configs/Retrieval_flickr.yaml --output_dir output/Retrieval_flickr/ --checkpoint [Pretrained checkpoint] </pre>
  3. NLVR using 4 A100 GPUs (NLVR-specific pre-training, then fine-tuning):
<pre>python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Pretrain_nlvr.py --config ./configs/NLVR_pretrain.yaml --output_dir output/NLVR_pretrain/ --checkpoint [Pretrained checkpoint]
python3 -m torch.distributed.launch --nproc_per_node=4 --use_env NLVR.py --config ./configs/NLVR.yaml --output_dir output/NLVR/ --checkpoint [NLVR-Pretrained checkpoint] </pre>
  4. VQA using 4 A100 GPUs:
<pre>python3 -m torch.distributed.launch --nproc_per_node=4 --use_env VQA.py --config ./configs/VQA.yaml --output_dir output/vqa/ --checkpoint [Pretrained checkpoint] </pre>
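Each fine-tuning command loads the pre-trained weights via --checkpoint. Below is a minimal sketch for sanity-checking a checkpoint before fine-tuning; the filename is hypothetical, and the "model" key is an assumption based on the ALBEF-style saving convention this code base follows:

<pre>
import torch

# Hypothetical path: substitute the checkpoint produced by Pretrain.py.
ckpt = torch.load("output/Pretrain/checkpoint_last.pth", map_location="cpu")

# ALBEF-style checkpoints usually keep the weights under the "model" key
# (an assumption here, not a documented guarantee of this repo).
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
print(len(state_dict), "parameter tensors loaded")
</pre>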

If you have any questions or problems running this code, please email wotjr3868@snu.ac.kr or gxq9106@gmail.com. Thank you!

Acknowledgement:

Our implementation is largely based on ALBEF, on which our method builds. We thank the original authors for sharing their code.