Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers
This repository is the official implementation of the BMVC 2024 paper Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers by Jochem Loedeman, Maarten Stol, Tengda Han and Yuki M Asano.
<img src="figure/arch.png" alt="drawing" width="1000"/>
Requirements
To install the Python dependencies, make sure that Poetry is installed and run the following in the project root directory:
poetry install
Data
See data/README.md
DINO
Download the full checkpoint for DINO ViT-S/16 from here and place it at pgn/pgn_models/dino/dino_deitsmall16_pretrain_full_checkpoint.pth.
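Before training with the DINO backbone, it can help to confirm that the checkpoint sits where the path above says it should. The following is a minimal sketch, not part of the repository: the helper name find_dino_checkpoint and the project_root parameter are assumptions made for illustration, and only the relative path comes from the instructions above.

```python
from pathlib import Path

# Relative location taken from the DINO instructions above.
DINO_CHECKPOINT = Path(
    "pgn/pgn_models/dino/dino_deitsmall16_pretrain_full_checkpoint.pth"
)


def find_dino_checkpoint(project_root="."):
    """Return the checkpoint path, or raise if the file is not in place.

    Hypothetical helper: it only checks that the file exists and does not
    try to load or validate its contents.
    """
    path = Path(project_root) / DINO_CHECKPOINT
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected the DINO ViT-S/16 checkpoint at {path}"
        )
    return path
```

Calling find_dino_checkpoint() from the project root either returns the path or fails early with a message naming the expected location.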
Training
To train/test with the CLIP backbone, run
poetry run train_clip
poetry run test_clip
To train/test with either DINO or supervised ViT, specify the backbone with --vision_model_type and run
poetry run train_visionmodel
poetry run test_visionmodel
For all available command line arguments, see pgn/scripts.
Pretrained PGNs
Pretrained PGNs are supplied in pretrained_pgns/. To use them in the context of this repository, specify the desired model by setting the --pgn_path argument in the test scripts.
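As an illustration of how a pretrained PGN could be selected, here is a small sketch that lists the files under pretrained_pgns/ and builds the command line for a test script with --pgn_path set. The helper names list_pretrained_pgns and build_test_command are hypothetical and not part of the repository. The sketch assumes that arguments placed after the script name are forwarded by poetry run, and it makes no assumption about the file names or extensions inside pretrained_pgns/.

```python
from pathlib import Path


def list_pretrained_pgns(directory="pretrained_pgns"):
    """Return all files below the pretrained PGN directory, sorted by path."""
    return sorted(p for p in Path(directory).rglob("*") if p.is_file())


def build_test_command(script, pgn_path):
    """Build the argument list for a test script with --pgn_path set.

    `script` is one of the test entry points named above,
    i.e. "test_clip" or "test_visionmodel".
    """
    return ["poetry", "run", script, "--pgn_path", str(pgn_path)]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the project root, or joined with spaces and run by hand. For test_visionmodel, the --vision_model_type argument would still need to be added.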
Reference
If you find this repository useful for your project, please consider citing our paper:
@article{Loedeman2022prompt,
author = "Jochem Loedeman and Maarten Stol and Tengda Han and Yuki M Asano",
title = "Prompt Generation Networks for Input-based Adaptation of Frozen Vision Transformers",
journal = "arXiv preprint arXiv:2210.06466",
year = "2022",
}