CPT
This is the source code for the paper "CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models".
Recent Updates
- 2022.05.06 Initialized CPT code for grounding, VRD, GQA, and VCR.
- 2022.05.15 Tested CPT code for the grounding, GQA, and VCR tasks.
- 2022.05.19 Tested CPT code for VRD.
Overview
The code is based on two sub-repos: prompt_feat extracts visual features with a pre-trained object detector, and Oscar is the pre-trained vision-language model used for inference.
Install
We wrap all the commands in install.sh; you can run bash install.sh directly, or execute the following commands step by step:
# create a new environment
conda create --name cpt python=3.7
conda activate cpt
# install pytorch1.6
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
export INSTALL_DIR=$PWD
# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
cd ..
# install requirements
pip install -r requirements.txt
# install prompt_feat
cd prompt_feat
python setup.py build develop
cd ..
# install oscar
cd Oscar
# install transformers
git clone https://github.com/huggingface/transformers.git
cd transformers
git reset --hard 067923d3267325f525f4e46f357360c191ba562e
cd ..
# install coco_caption
git clone https://github.com/LuoweiZhou/coco-caption.git
cd coco-caption
git reset --hard de6f385503ac9a4305a1dcdc39c02312f9fa13fc
# ./get_stanford_models.sh
cd ..
python setup.py build develop
unset INSTALL_DIR
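Optionally, you can sanity-check the build afterwards. This is a small check of my own, not part of install.sh:
# run inside the activated cpt environment; both imports should
# succeed and print the torch version (1.6.0)
python -c "import torch, apex; print(torch.__version__)"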
Preparation
Before running the code, please first download the pre-trained feature extractor and Oscar models.
bash cmds/prepare_data/download_checkpoints.sh
After downloading, there should be:
Oscar/pretrained_models/image_captioning/pretrained_base/pytorch_model.bin
prompt_feat/models/vinvl/vinvl_vg_x152c4.pth
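A quick way to confirm both checkpoints landed in the right place (my own sketch, not a script shipped with the repo):
# check that both downloaded checkpoints exist
for f in Oscar/pretrained_models/image_captioning/pretrained_base/pytorch_model.bin \
         prompt_feat/models/vinvl/vinvl_vg_x152c4.pth; do
  [ -f "$f" ] && echo "OK: $f" || echo "MISSING: $f"
done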
Tasks
1. Visual Grounding
The visual grounding task is to find the visual region corresponding to a query sentence, e.g., "the black horse".
Data
Note: all the data will be downloaded to the data directory. If you want to store it somewhere else, create a soft link:
ln -s your_data_path data
Please download the data first.
bash cmds/prepare_data/download_refcoco.sh
Feature Extraction
To extract features:
cd prompt_feat
bash cmds/refcoco/prepare.sh # make sure you have at least 4 GPUs
# actually 1 GPU is also OK, don't panic.
To run with a single GPU or a different number of GPUs, go to prompt_feat/cmds/refcoco/cpt and modify CUDA_VISIBLE_DEVICES, --nproc_per_node, and TEST.IMS_PER_BATCH accordingly, as sketched below.
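For illustration, a single-GPU variant would look roughly like this; the entry point and the --config-file value are placeholders, and the real script may carry extra flags. Only the three settings named above are the point:
# hypothetical single-GPU edit (script details may differ)
export CUDA_VISIBLE_DEVICES=0        # expose one GPU instead of four
python -m torch.distributed.launch --nproc_per_node=1 \
    tools/test_sg_net.py --config-file "$CONFIG" \
    TEST.IMS_PER_BATCH 1
# --nproc_per_node matches the GPU count, and TEST.IMS_PER_BATCH is
# scaled down proportionally so per-GPU memory use stays the same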
CPT Inference
To run inference:
cd Oscar
bash cmds/refcoco/cpt_run_all.sh
We use GPU 0 by default. If you want a different GPU, go to cmds/refcoco/cpt_run_all.sh and change GPU=0 to the GPU id you want.
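If you prefer a one-liner to opening an editor, something like this should work, assuming the GPU=0 assignment starts a line in the script:
# switch the script to GPU 2 in place
sed -i 's/^GPU=0/GPU=2/' cmds/refcoco/cpt_run_all.sh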
Evaluation
To evaluate, please run:
cd Oscar
python eval/refcoco/fewshot_eval.py results/refcoco/fsl/
2. GQA
GQA is a visual question answering dataset that requires reasoning ability.
Data
Please download the data first.
bash cmds/prepare_data/download_gqa.sh
Feature Extraction
To extract features:
cd prompt_feat
bash cmds/gqa/prepare.sh # make sure you have at least 4 GPUs
# actually 1 GPU is also OK, don't panic.
To run with a single GPU or a different number of GPUs, go to prompt_feat/cmds/gqa/*.sh and modify CUDA_VISIBLE_DEVICES, --nproc_per_node, and TEST.IMS_PER_BATCH accordingly (see the single-GPU sketch in the visual grounding section).
CPT Inference
To run inference:
cd Oscar
bash cmds/gqa/cpt_fsl.sh
bash cmds/gqa/pt_fsl.sh
We use GPUs 0,1,2,3 by default. If you want different GPU ids, edit cmds/gqa/cpt_fsl.sh and pt_fsl.sh. You can also run on a single GPU without modifying the batch size; the result should be similar because the gradient accumulation step is set to the dataset size.
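To see why, note that the effective batch size is per_gpu_batch * num_gpus * accumulation_steps, so scaling the accumulation step with the GPU count keeps the product fixed. A toy check with placeholder numbers, not the scripts' actual values:
# placeholder values, for illustration only
B=8; K=16
echo "4 GPUs: $((B * 4 * K))"      # 512
echo "1 GPU:  $((B * 1 * 4 * K))"  # 512 -- same effective batch size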
Evaluation
To evaluate, please run:
cd Oscar
bash eval/gqa/show.sh
3. VCR (Visual Commonsense Reasoning)
VCR is a multiple-choice QA dataset, including question->answer, question+answer->rationale, and question->answer+rationale tasks.
Data
Please download the data first.
bash cmds/prepare_data/download_vcr.sh
Feature Extraction
To extract features:
cd prompt_feat
bash cmds/vcr/prepare.sh # make sure you have at least 4 GPUs
# actually 1 GPU is also OK, don't panic.
To run with a single GPU or a different number of GPUs, go to prompt_feat/cmds/vcr/pt_vcr_val_seg and cpt_vcr_val_seg and modify CUDA_VISIBLE_DEVICES, --nproc_per_node, and TEST.IMS_PER_BATCH accordingly.
CPT Inference
To run inference:
export GPUID=0
# vcr_q_a
bash cmds/vcr/cpt_fsl.sh $GPUID vcr_q_a cpt
bash cmds/vcr/pt_fsl.sh $GPUID vcr_q_a pt
# vcr_qa_r
bash cmds/vcr/cpt_fsl.sh $GPUID vcr_qa_r cpt
bash cmds/vcr/pt_fsl.sh $GPUID vcr_qa_r pt
# vcr_qar
bash cmds/vcr/qar_cpt_fsl.sh $GPUID vcr_qar cpt
bash cmds/vcr/qar_pt_fsl.sh $GPUID vcr_qar pt
We use GPU 0 by default. If you want a different GPU, change GPUID=0 to the GPU id you want. Our implementation also allows running all tasks simultaneously by assigning different GPUIDs to different tasks, as shown below.
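For example, with at least three free GPUs, the commands above can run concurrently:
# run the three VCR sub-tasks in parallel on GPUs 0-2
bash cmds/vcr/cpt_fsl.sh 0 vcr_q_a cpt &
bash cmds/vcr/cpt_fsl.sh 1 vcr_qa_r cpt &
bash cmds/vcr/qar_cpt_fsl.sh 2 vcr_qar cpt &
wait    # block until all three finish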
Evaluation
To evaluate, please run:
cd Oscar
bash eval/vcr/show.sh
4. VG (Visual Genome)
VG is a visual relation detection dataset; the model should detect relational triplets in images.
Data
Please download the data first.
bash cmds/prepare_data/download_vg.sh
Feature Extraction
To extract features:
cd prompt_feat
bash cmds/vg/prepare.sh # make sure you have at least 4 GPUs
# actually 1 GPU is also OK, don't panic.
To run with a single GPU or a different number of GPUs, go to prompt_feat/cmds/vg/_vg_val.sh and _vg_test.sh and modify CUDA_VISIBLE_DEVICES, --nproc_per_node, and TEST.IMS_PER_BATCH accordingly.
CPT Inference
To run inference:
cd Oscar
bash cmds/vg/cpt_run_all.sh
We use GPUs 0,1,2,3 by default. If you want different GPU ids, go to Oscar/cmds/vg/_fsl.sh and modify CUDA_VISIBLE_DEVICES and --nproc_per_node. Note that --per_gpu_train_batch_size multiplied by the number of GPUs should equal 40; otherwise the results will differ.
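One way to keep the global batch size at 40 when changing the GPU count (my own sketch; it assumes the GPU count divides 40 evenly):
# keep per-GPU batch * number of GPUs == 40
NUM_GPUS=2                        # e.g. CUDA_VISIBLE_DEVICES=0,1
PER_GPU_BS=$((40 / NUM_GPUS))     # -> 20 here
echo "use --per_gpu_train_batch_size $PER_GPU_BS"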
Evaluation
To evaluate, please run:
cd Oscar
python eval/vg/eval_vg.py results/vg/cpt/
Bugs or questions?
If you have any questions related to the code or the paper, feel free to email Ao Zhang (zhanga6@outlook.com). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and faster!
Acknowledgement
The code is built on scene_graph_benchmark and Oscar. Thanks for their excellent code!