KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation (ACL 2021)

Yiran Xing*, Zai Shi*, Zhao Meng*, Gerhard Lakemeyer, Yunpu Ma, Roger Wattenhofer

*The first three authors contributed equally to this work

[Paper] [Supplementary]

How to Cite Our Work

@inproceedings{KM-BART,
    title = "{KM}-{BART}: Knowledge Enhanced Multimodal {BART} for Visual Commonsense Generation",
    author = "Xing, Yiran  and
      Shi, Zai  and
      Meng, Zhao  and
      Lakemeyer, Gerhard  and
      Ma, Yunpu  and
      Wattenhofer, Roger",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "525--535"
}

Installation

  1. Clone the repository recursively

    git clone --recursive https://github.com/FomalhautB/KM-BART-ACL.git
    
  2. Create conda environment

    conda env create -f environment.yaml
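    # then activate it; the environment name is defined in environment.yaml
    # ("kmbart" below is a hypothetical name -- check the file for the actual one)
    conda activate kmbart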
    

The following steps are only required for feature extraction.

  1. Install bottom-up-attention.pytorch. Please refer to the bottom-up-attention.pytorch repository for more details.

    cd bottom-up-attention.pytorch
    # install detectron2
    cd detectron2
    pip install -e .
    cd ..
    # install the rest modules
    python setup.py build develop
    cd ..
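    # optional sanity check (not part of the original instructions):
    # verify that detectron2 imports correctly
    python -c "import detectron2; print(detectron2.__version__)"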
    
  2. Install comet-commonsense. Please refer to comet-commonsense for more details.

    cd comet-commonsense
    # download data
    bash scripts/setup/get_atomic_data.sh
    bash scripts/setup/get_model_files.sh
    # install dependencies
    pip install tensorflow
    pip install ftfy==5.1
    conda install -c conda-forge spacy
    python -m spacy download en
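    # note: on spaCy v3 and later the "en" shortcut model was removed; if the
    # line above fails, use "python -m spacy download en_core_web_sm" instead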
    pip install tensorboardX
    pip install tqdm
    pip install pandas
    pip install ipython
    

Data Preparation
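
Throughout this section, variables such as $VCR_DATASET, $VCG_ANNOTATION, or $COCO_TRAIN are placeholders for directories you choose yourself; they are not fixed paths. A minimal sketch (the paths below are purely illustrative):

    export VCR_DATASET=/data/vcr/images          # raw VCR images
    export VCG_ANNOTATION=/data/vcg/annotations  # VCG annotation files
    export VCG_DATA=/data/vcg/features           # output directory for extracted features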

VCG

  1. Download the images from here and decompress the images into $VCR_DATASET
  2. Download the annotations from here and decompress the annotations into $VCG_ANNOTATION
  3. Extract features and save the features in $VCG_DATA:
    python -m scripts.prepare_vcg \
        --data_dir $VCR_DATASET \
        --output_dir $VCG_DATA \
        --annot_dir $VCG_ANNOTATION \
        --gpu_num 4
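    # --gpu_num above sets how many GPUs run feature extraction in parallel;
    # adjust it to your hardware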
    

COCO

  1. Download the train images from here and decompress the images into $COCO_TRAIN
  2. Download the validation images from here and decompress the images into $COCO_VAL
  3. Download the annotations from here and decompress the annotations into $COCO_ANNOTATION
  4. Extract features and save the features in $COCO_DATA:
    python -m scripts.prepare_coco \
        --train_dir $COCO_TRAIN \
        --val_dir $COCO_VAL \
        --annot_dir $COCO_ANNOTATION  \
        --output_dir $COCO_DATA \
        --gpu_num 4
    

SBU and CC

  1. Download the json files for image urls and captions from here and decompress the two files into $SBU_ANNOTATION
  2. Extract the features, bounding boxes, and labels, build the image annotations, and save them into $OUTPUT_DATA (this will first download the images and save them in $SBU_DATA):
    python -m scripts.prepare_sbu \
        --download \
        --data_dir $SBU_DATA \
        --output_dir $OUTPUT_DATA \
        --annot_dir $SBU_ANNOTATION \
        --gpu_num 4 \
        --n_jobs 8
    

VG

  1. Download the objects, relationships, region descriptions, attributes, and image metadata from here and decompress them into $VG_ANNOTATION
  2. Download the images from the same link above and decompress them into $VG_IMAGES
  3. Extract features and save the features in $VG_DATA:
    python -m scripts.prepare_vg \
        --annot_dir $VG_ANNOTATION \
        --output_dir $VG_DATA \
        --data_dir $VG_IMAGES \
        --gpu_num 4
    

Reasoning (SBU and COCO)

  1. Download the pretrained COMET weights atomic_pretrained_model.pickle from comet-commonsense

    • Save it to $LOAD_PATH.
    • Follow the instructions in comet-commonsense to build the COMET data loader.
  2. Download the json files for image urls and captions from here and decompress the two files into $SBU_ANNOTATION.

  3. Download the SBU dataset and save the images in $SBU_DATA, then decompress the extracted features, bounding boxes, and labels of the images into $SBU_DATA.

  4. Generate inferences and save them in $REASON_DATA:

    python -m scripts.prepare_sbu_reason \
         --output_dir $REASON_DATA \
         --annot_dir  $SBU_ANNOTATION \
         --model_file $LOAD_PATH/COMET \
         --gpu_num 2 \
         --sampling_algorithm topk-3
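         # "topk-3" samples each inference token from COMET's top 3 candidates;
         # comet-commonsense also supports e.g. "greedy" and "beam-N" sampling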
    
    # rename the output file
    mv $REASON_DATA/train.json $SBU_DATA/reason_train.json
    
  5. Filter the newly generated inferences with a KM-BART model pretrained on VCG (also stored in $LOAD_PATH) and save the final results in $OUTPUT_DATA:

    python -m scripts.filter_reason  \
         --data_dir $SBU_DATA \
         --output_dir $OUTPUT_DATA \
         --checkpoint $LOAD_PATH/KM-BART
    

Training

Pretrain from scratch

Pretrain from facebook bart-base

Continue pretraining

Train VCG

Generate and evaluate VCG

Pretrained Weights