
Localized Symbolic Knowledge Distillation for Visual Commonsense Models [NeurIPS 2023]

Repo for LSKD: distilling localized (e.g., bounding-box grounded) visual commonsense knowledge into Vision-Language Models using ChatGPT-generated data and critic-based filtering.

[paper] [dataset]

[Figure: lskd_example]

The Localized Commonsense Knowledge (LCK) Dataset

The dataset with localized reasoning is provided here.
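A minimal loading sketch, assuming the release is a JSON Lines file (the file name lck_dataset.jsonl below is a placeholder; the actual name and format may differ):

from pprint import pprint
import pandas as pd

# Load the LCK annotations into a DataFrame for inspection.
# "lck_dataset.jsonl" is a placeholder for the released dataset file.
df = pd.read_json("lck_dataset.jsonl", lines=True)

Inspecting a single record shows the region-grounded question/answer/rationale fields and the score stored in prediction: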

>>> pprint(df.iloc[1])
image                                       VG_100K/2348412.jpg
source                             chatgpt_region_any_v4_people
split                                                     train
index                       chatgpt_region_any_v4_people-858297
region                                                      [4]
region_all                                            [0, 2, 4]
references    [{'name': '4', 'boxes': [[379.6391601562, 152....
question      [What, is, the, significance, of, the, gold, l...
answer        [The, gold, lion, on, the, woman's, shirt, in,...
rationale     [Lions, are, often, used, as, symbols, of, str...
prediction                                             0.940065
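The question, answer, and rationale fields are stored as token lists, and prediction appears to be the critic score used for filtering. A hedged sketch of detokenizing and filtering on that score (the 0.8 threshold is illustrative, not the value used in the paper):

# Join token lists back into plain strings.
for col in ["question", "answer", "rationale"]:
    df[col + "_text"] = df[col].apply(" ".join)

# Keep only samples the critic scores above an illustrative threshold.
filtered = df[df["prediction"] > 0.8]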

Distillation Results

[Figure: lskd_example]

Model Training and Evaluation

We use the Salesforce LAVIS repo to train and evaluate the knowledge distillation pipeline.

Installation

pip install -e .

Downstream Task Evaluation

You can download the BLIP-2 + LSKD model [here].
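A minimal sketch of loading the checkpoint with LAVIS for standalone inference; the name, model_type, and checkpoint file name below are assumptions, and the evaluation script below remains the authoritative path:

import torch
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Instantiate a BLIP-2 model through LAVIS; name/model_type are assumptions
# and may not match the exact config used for LSKD.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=True, device=device
)

# Load the downloaded BLIP-2 + LSKD weights ("blip2_lskd.pth" is a placeholder).
model.load_checkpoint("blip2_lskd.pth")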

To run the evaluation on localized datasets, adjust $CHECKPOINT_DIR and run the script:

bash run_scripts/blip2/eval/eval_unified_common_sense.sh

Critic Model for Data Filtering

We also release the critic model used to filter out irrelevant generated data. You can download the fine-tuned critic model [here].

Run the following command to run the fine-tuned critic model in a distributed setting. The output JSON file is saved to run.output_dir:

torchrun --nproc_per_node=4 evaluate.py --cfg-path lavis/projects/blip2/eval/laion/laion_sample_critic_ft_filtering.yaml \
  --options run.output_dir=output/BLIP2/laion_samples/filtering/critic_ft/
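A hedged sketch of consuming the saved output (the file name, JSON structure, and score key are assumptions; inspect the file actually written to run.output_dir for the real schema):

import json
from pathlib import Path

output_dir = Path("output/BLIP2/laion_samples/filtering/critic_ft/")

# "result.json" and the "score" key are placeholders for whatever the
# evaluation run actually writes to run.output_dir.
with open(output_dir / "result.json") as f:
    records = json.load(f)

# Keep samples the critic scores above an illustrative threshold.
kept = [r for r in records if r.get("score", 0.0) > 0.8]
print(f"kept {len(kept)} of {len(records)} generated samples")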

References

@inproceedings{Park2023LocalizedSK,
  title={Localized Symbolic Knowledge Distillation for Visual Commonsense Models},
  author={Jae Sung Park and Jack Hessel and Khyathi Raghavi Chandu and Paul Pu Liang and Ximing Lu and Peter West and Youngjae Yu and Qiuyuan Huang and Jianfeng Gao and Ali Farhadi and Yejin Choi},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023},
  url={https://api.semanticscholar.org/CorpusID:266149843}
}