CLVQA

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (AAAI2023)

[arXiv | Data & annotation (json/npy)]

<img src="./figures/gh_teaser.png" alt="CLVQA" style="zoom:67%;" />

Preparation

Installation

conda create -n mmclvqa python=3.8
conda activate mmclvqa

git clone https://github.com/showlab/CLVQA.git
cd CLVQA
cd mmclvqa
pip install --editable .

cd ..
pip install -r extra_requirements.txt

CLOVE Dataset and Annotation

We release the datasets and annotations in both json format (link) and npy format (link). To use our code for training, please download the npy files.
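As a quick sanity check after downloading, the npy files can be inspected with NumPy. The exact record fields are defined by the released files, so the keys below are purely illustrative; the only important detail is that object arrays require `allow_pickle=True`:

```python
import numpy as np

# Toy stand-in for annotation records; treat the keys as illustrative,
# the real field names come from the released npy files.
toy_annotations = [
    {"question": "What color is the car?", "answer": "red", "image_id": "img_0001"},
    {"question": "Is there a dog?", "answer": "yes", "image_id": "img_0002"},
]
np.save("toy_annotations.npy", np.array(toy_annotations, dtype=object))

# Object arrays are pickled, so loading needs allow_pickle=True.
annotations = np.load("toy_annotations.npy", allow_pickle=True)
print(len(annotations), annotations[0]["question"])
```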


Training

Symbolic Replay Model (SRM)

The implementation of the Symbolic Replay Model can be found in SRM/. We provide training scripts for SRM here. Specifically,

cd SRM/
# training SRM under scene-incremental setting, with task order a->b->c->d->e->f, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
 --cl_setting scene \
 --task_seq abcdef \
 --model_name distilgpt2 \
 --model_dir_root  /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_scene_task_token  \
 --add_task_tokens \
 --n_train_epochs 15

# training SRM under function-incremental setting, with task order o->a->r->l->k->s, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
 --cl_setting functional \
 --task_seq oarlks \
 --model_name distilgpt2 \
 --model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token \
 --add_task_tokens \
 --n_train_epochs 15
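For reference, each letter in `--task_seq` indexes a functional task by its first letter, matching the `cl_<task>_unicl_standalone.yaml` config names used below. A small helper (illustrative only; it is not part of the repo) to expand an order string:

```python
# Functional tasks keyed by the single letters used in --task_seq;
# the names match the cl_<task>_unicl_standalone.yaml configs.
FUNCTIONAL_TASKS = {
    "o": "object",
    "a": "attribute",
    "r": "relation",
    "l": "logical",
    "k": "knowledge",
    "s": "scenetext",
}

def expand_task_seq(abbr_seq: str) -> list:
    """Expand an order string like 'oarlks' into ordered task names."""
    return [FUNCTIONAL_TASKS[c] for c in abbr_seq]

print(expand_task_seq("oarlks"))
# ['object', 'attribute', 'relation', 'logical', 'knowledge', 'scenetext']
```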

UniVQA

Refer to the scripts in this folder for one-stop training-and-testing (generated by generate_run_scripts.py). Specifically, the following trains with replayed samples from SRM, with #replayed_samples : #current_task_samples = $1.5:1$, under task order $o \rightarrow a \rightarrow r \rightarrow l \rightarrow k \rightarrow s$:

ROOT=/Users/stan
DEVICE=0
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_attribute_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[a].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/stand_alone/functional/unicl_object/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_relation_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[r].npy,oarlks_REPLAY[a]_AT[r].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_logical_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[l].npy,oarlks_REPLAY[a]_AT[l].npy,oarlks_REPLAY[r]_AT[l].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_knowledge_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[k].npy,oarlks_REPLAY[a]_AT[k].npy,oarlks_REPLAY[r]_AT[k].npy,oarlks_REPLAY[l]_AT[k].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_scenetext/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_scenetext_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[s].npy,oarlks_REPLAY[a]_AT[s].npy,oarlks_REPLAY[r]_AT[s].npy,oarlks_REPLAY[l]_AT[s].npy,oarlks_REPLAY[k]_AT[s].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_scenetext \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 
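The stages above follow a fixed pattern: the stage for each task resumes from the previous stage's unicl_final.pth, and its training.CL.restore_paths lists one replay file per already-seen task, named `<seq>_REPLAY[<prev>]_AT[<cur>].npy`. A sketch of that naming rule (illustrative only; generate_run_scripts.py is the authoritative generator):

```python
def replay_paths(abbr_seq: str, cur: str) -> str:
    """Comma-separated training.CL.restore_paths value for the stage that
    trains task `cur`: one replay file per previously-seen task."""
    idx = abbr_seq.index(cur)
    return ",".join(
        f"{abbr_seq}_REPLAY[{prev}]_AT[{cur}].npy" for prev in abbr_seq[:idx]
    )

print(replay_paths("oarlks", "r"))
# oarlks_REPLAY[o]_AT[r].npy,oarlks_REPLAY[a]_AT[r].npy
```

This reproduces the restore_paths values in the scripts above, e.g. two files for the relation stage and five for the final scenetext stage.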

Testing

One can follow generate_run_scripts.py to generate one-stop training-and-testing scripts. For testing only, please refer to eval_os.py. A testing example for the functional setting, with 1.5x SRM replayed samples and task order oarlks:

python
>>> from eval_os import *
>>> stage_sweep(cl_setting='functional', setting_idx=1, abbr_seq='oarlks', device=0, model_name='unicl', save_dir='/Users/stan/exp/clvqa', val_exp='distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5', test_stand_alone=False, test_reg=False, print_acc=False)
{'textvqa_accuracy': {'a2a': 0.4177,
                      'a2k': 0.1967,
                      'a2l': 0.0563,
                      'a2o': 0.4037,
                      'a2r': 0.121,
                      'a2s': 0.1453,
                      'k2a': 0.3263,
                      'k2k': 0.6813,
                      'k2l': 0.6807,
                      'k2o': 0.295,
                      'k2r': 0.3167,
                      'k2s': 0.1501,
                      'l2a': 0.272,
                      'l2k': 0.1943,
                      'l2l': 0.7153,
                      'l2o': 0.2653,
                      'l2r': 0.307,
                      'l2s': 0.1408,
                      'o2a': 0.1013,
                      'o2k': 0.1063,
                      'o2l': 0.0197,
                      'o2o': 0.5997,
                      'o2r': 0.0823,
                      'o2s': 0.0962,
                      'r2a': 0.3713,
                      'r2k': 0.2073,
                      'r2l': 0.121,
                      'r2o': 0.4023,
                      'r2r': 0.3943,
                      'r2s': 0.1555,
                      's2a': 0.3083,
                      's2k': 0.6253,
                      's2l': 0.6733,
                      's2o': 0.2963,
                      's2r': 0.3037,
                      's2s': 0.5511}}
{'textvqa_accuracy': [('o2o', 0.5997),
                      ('o2a', 0.1013),
                      ('o2r', 0.0823),
                      ('o2l', 0.0197),
                      ('o2k', 0.1063),
                      ('o2s', 0.0962),
                      ('a2o', 0.4037),
                      ('a2a', 0.4177),
                      ('a2r', 0.121),
                      ('a2l', 0.0563),
                      ('a2k', 0.1967),
                      ('a2s', 0.1453),
                      ('r2o', 0.4023),
                      ('r2a', 0.3713),
                      ('r2r', 0.3943),
                      ('r2l', 0.121),
                      ('r2k', 0.2073),
                      ('r2s', 0.1555),
                      ('l2o', 0.2653),
                      ('l2a', 0.272),
                      ('l2r', 0.307),
                      ('l2l', 0.7153),
                      ('l2k', 0.1943),
                      ('l2s', 0.1408),
                      ('k2o', 0.295),
                      ('k2a', 0.3263),
                      ('k2r', 0.3167),
                      ('k2l', 0.6807),
                      ('k2k', 0.6813),
                      ('k2s', 0.1501),
                      ('s2o', 0.2963),
                      ('s2a', 0.3083),
                      ('s2r', 0.3037),
                      ('s2l', 0.6733),
                      ('s2k', 0.6253),
                      ('s2s', 0.5511)]}
==> textvqa_accuracy | Final acc: [0.2963, 0.3083, 0.3037, 0.6733, 0.6253, 0.5511], weight avg acc: 0.45966666666666667. 
==> textvqa_accuracy | Backward transfer: [-0.3034, -0.1094, -0.09059999999999996, -0.04200000000000004, -0.05600000000000005], weighted bwt: -0.12028000000000004 
==> textvqa_accuracy | Forgetting: [0.41900000000000004, 0.40700000000000003, 0.4116, 0.04200000000000004, 0.09000000000000008], weighted forgetting: 0.27392.
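The summary numbers are derived from the `x2y` entries above, where `x2y` is the accuracy on task y after finishing training stage x. A minimal sketch of the final average accuracy and backward transfer, assuming BWT for a task is its final-stage accuracy minus its accuracy right after the task was learned (the forgetting metric follows eval_os.py and is not reproduced here):

```python
# x2y = accuracy on task y after finishing training stage x
# (subset of the stage_sweep output above).
acc = {
    "o2o": 0.5997, "a2a": 0.4177, "r2r": 0.3943,
    "l2l": 0.7153, "k2k": 0.6813, "s2s": 0.5511,
    "s2o": 0.2963, "s2a": 0.3083, "s2r": 0.3037,
    "s2l": 0.6733, "s2k": 0.6253,
}
order = "oarlks"
final_stage = order[-1]

# Final accuracy on each task after the last stage, and its mean.
final_acc = [acc[f"{final_stage}2{t}"] for t in order]
avg_acc = sum(final_acc) / len(final_acc)

# Backward transfer: final accuracy minus accuracy right after learning.
bwt = [acc[f"{final_stage}2{t}"] - acc[f"{t}2{t}"] for t in order[:-1]]
weighted_bwt = sum(bwt) / len(bwt)

print(round(avg_acc, 4), round(weighted_bwt, 5))
# 0.4597 -0.12028
```

These match the weighted average accuracy (0.4597) and weighted BWT (-0.12028) printed by stage_sweep above.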

Misc

config


Cite Our Work

If you find our work helpful, please cite our paper.

@article{Lei_symbolic_2023, 
title={Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task}, 
volume={37}, 
url={https://ojs.aaai.org/index.php/AAAI/article/view/25208}, 
DOI={10.1609/aaai.v37i1.25208}, 
number={1}, 
journal={Proceedings of the AAAI Conference on Artificial Intelligence}, 
author={Lei, Stan Weixian and Gao, Difei and Wu, Jay Zhangjie and Wang, Yuxuan and Liu, Wei and Zhang, Mengmi and Shou, Mike Zheng}, 
year={2023}, 
month={Jun.}, 
pages={1250-1259} 
}

Contact

For any questions, feel free to open an issue or email Stan (leiwx52@gmail.com).


Acknowledgement