Grounding Visual Representations with Texts (GVRT)

Grounding Visual Representations with Texts for Domain Generalization
Seonwoo Min, Nokyung Park, Siwon Kim, Seunghyun Park, Jinkyu Kim
ECCV 2022 | Official PyTorch implementation

We advocate leveraging vision-and-language cross-modality supervision for the domain generalization (DG) task.

Installation

We recommend creating a conda environment and installing the necessary Python packages as follows:

git clone https://github.com/mswzeus/GVRT.git
cd GVRT
ln -s ../src DomainBed_GVRT/src
conda create -n GVRT python=3.8
conda activate GVRT
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt

CUB-DG Benchmark Dataset

We created CUB-DG to investigate cross-modality supervision in the DG task (<a href="https://drive.google.com/file/d/1BU8Jy0a1mdNCbIpUUBrQPqQfNXGXfm1f/view?usp=sharing">Download Link</a>).
CUB is an image dataset with photos of 200 bird species. For more information, please see the <a href="http://www.vision.caltech.edu/visipedia/CUB-200.html">original dataset page</a>.
We used pre-trained style transfer models to obtain images from three other domains, i.e., Art, Paint, and Cartoon.

Pre-trained Models

We provide pre-trained models from three independent runs (<a href="https://drive.google.com/file/d/11CbVRWlSHWd2HPkBkp2ZanUVBFFau8Dx/view?usp=sharing">Download Link</a>).

How to Run

Training a GVRT model

You can use the <code>train_model.py</code> script with the necessary configurations as follows:

CUDA_VISIBLE_DEVICES=0 python train_model.py --algorithm GVRT --test-env 0 --seed 0 --output-path results/PTE_test0_seed0 
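
To cover every held-out domain and seed, the command above can be repeated in a loop. The following is a minimal sketch, not part of the repository: it assumes test environments are indexed 0 to 3 (one per CUB-DG domain) and seeds 0 to 2 (for three independent runs), and it introduces a hypothetical <code>DRY_RUN</code> variable that only prints the commands by default.

```shell
#!/usr/bin/env bash
# Sketch: sweep train_model.py over held-out domains and seeds.
# Assumptions (not confirmed by the repository): test envs are 0-3, seeds are 0-2.
# DRY_RUN is a hypothetical helper variable, not an option of train_model.py.
DRY_RUN="${DRY_RUN:-1}"

train_sweep() {
  for test_env in 0 1 2 3; do
    for seed in 0 1 2; do
      cmd="python train_model.py --algorithm GVRT --test-env ${test_env} --seed ${seed} --output-path results/PTE_test${test_env}_seed${seed}"
      if [ "${DRY_RUN}" = "1" ]; then
        echo "${cmd}"                    # only print the command
      else
        CUDA_VISIBLE_DEVICES=0 ${cmd}    # run training on GPU 0
      fi
    done
  done
}

train_sweep
```

Setting <code>DRY_RUN=0</code> launches the runs sequentially on GPU 0; adjust the index ranges if your setup differs.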

Evaluating a GVRT model

You can use the <code>evaluate_model.py</code> script with the necessary configurations as follows:

CUDA_VISIBLE_DEVICES=0 python evaluate_model.py --algorithm GVRT --test-env 0 --seed 0 --output-path results/PTE_test0_seed0 --checkpoint pretrained_models/PTE_test0_seed0.pt
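
To evaluate several checkpoints for one held-out domain, the command above can be wrapped in a small helper. This is a sketch under assumptions: <code>eval_cmd</code> is a hypothetical helper, seeds 0 to 2 are assumed, and the checkpoint names are assumed to follow the <code>PTE_test&lt;env&gt;_seed&lt;seed&gt;.pt</code> pattern of the example above. Checkpoints that are not found are reported and skipped.

```shell
#!/usr/bin/env bash
# Sketch: evaluate one held-out domain across several seeds.
# Assumption: checkpoints are named pretrained_models/PTE_test<env>_seed<seed>.pt.

# eval_cmd is a hypothetical helper: $1 = test env index, $2 = seed.
eval_cmd() {
  echo "python evaluate_model.py --algorithm GVRT --test-env ${1} --seed ${2} --output-path results/PTE_test${1}_seed${2} --checkpoint pretrained_models/PTE_test${1}_seed${2}.pt"
}

test_env=0
for seed in 0 1 2; do
  ckpt="pretrained_models/PTE_test${test_env}_seed${seed}.pt"
  if [ -f "${ckpt}" ]; then
    CUDA_VISIBLE_DEVICES=0 $(eval_cmd "${test_env}" "${seed}")   # run evaluation on GPU 0
  else
    echo "missing checkpoint: ${ckpt}" >&2                       # skip and report
  fi
done
```

Change <code>test_env</code> to evaluate a different held-out domain.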

Experimental Results on CUB-DG

We report averaged results across three independent runs.

<img src="./docs/main_results.png" width="60%">

Citation

If you find our work useful, please cite this paper:

@article{min2022grounding,
  author    = {Seonwoo Min and Nokyung Park and Siwon Kim and Seunghyun Park and Jinkyu Kim},
  title     = {Grounding Visual Representations with Texts for Domain Generalization},
  journal   = {arXiv},
  volume    = {abs/2207.10285},
  year      = {2022},
  url       = {https://arxiv.org/abs/2207.10285}
}