KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation (ACL 2021)

Yiran Xing*, Zai Shi*, Zhao Meng*, Gerhard Lakemeyer, Yunpu Ma, Roger Wattenhofer

*The first three authors contributed equally to this work

[Paper] [Supplementary]

How to Cite Our Work

@inproceedings{KM-BART,
    title = "{KM}-{BART}: Knowledge Enhanced Multimodal {BART} for Visual Commonsense Generation",
    author = "Xing, Yiran  and
      Shi, Zai  and
      Meng, Zhao  and
      Lakemeyer, Gerhard  and
      Ma, Yunpu  and
      Wattenhofer, Roger",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "525--535"
}

Installation

  1. Clone the repository recursively

    git clone --recursive https://github.com/FomalhautB/KM-BART-ACL.git
    
  2. Create conda environment

    conda env create -f environment.yaml
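    # then activate it; the environment name is defined in environment.yaml
    # ("kmbart" below is a hypothetical name -- check the file for the actual one)
    conda activate kmbart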
    

The following steps are only required for feature extraction.

  1. Install bottom-up-attention.pytorch. Please refer to the bottom-up-attention.pytorch repository for more details.

    cd bottom-up-attention.pytorch
    # install detectron2
    cd detectron2
    pip install -e .
    cd ..
    # install the rest modules
    python setup.py build develop
    cd ..
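    # optional sanity check (not part of the original instructions):
    # verify that detectron2 imports correctly
    python -c "import detectron2; print(detectron2.__version__)"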
    
  2. Install comet-commonsense. Please refer to comet-commonsense for more details.

    cd comet-commonsense
    # download data
    bash scripts/setup/get_atomic_data.sh
    bash scripts/setup/get_model_files.sh
    # install dependencies
    pip install tensorflow
    pip install ftfy==5.1
    conda install -c conda-forge spacy
    python -m spacy download en
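    # note: on spaCy v3 and later the "en" shortcut model was removed; if the
    # line above fails, use "python -m spacy download en_core_web_sm" instead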
    pip install tensorboardX
    pip install tqdm
    pip install pandas
    pip install ipython
    

Data Preparation
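
Throughout this section, variables such as $VCR_DATASET, $VCG_ANNOTATION, or $COCO_TRAIN are placeholders for directories you choose yourself; they are not fixed paths. A minimal sketch (the paths below are purely illustrative):

    export VCR_DATASET=/data/vcr/images          # raw VCR images
    export VCG_ANNOTATION=/data/vcg/annotations  # VCG annotation files
    export VCG_DATA=/data/vcg/features           # output directory for extracted features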

VCG

  1. Download the images from here and decompress the images into $VCR_DATASET
  2. Download the annotations from here and decompress the annotations into $VCG_ANNOTATION
  3. Extract features and save the features in $VCG_DATA:
    python -m scripts.prepare_vcg \
        --data_dir $VCR_DATASET \
        --output_dir $VCG_DATA \
        --annot_dir $VCG_ANNOTATION \
        --gpu_num 4
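    # --gpu_num above sets how many GPUs run feature extraction in parallel;
    # adjust it to your hardware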
    

COCO

  1. Download the train images from here and decompress the images into $COCO_TRAIN
  2. Download the validation images from here and decompress the images into $COCO_VAL
  3. Download the annotations from here and decompress the annotations into $COCO_ANNOTATION
  4. Extract features and save the features in $COCO_DATA:
    python -m scripts.prepare_coco \
        --train_dir $COCO_TRAIN \
        --val_dir $COCO_VAL \
        --annot_dir $COCO_ANNOTATION  \
        --output_dir $COCO_DATA \
        --gpu_num 4
    

SBU and CC

  1. Download the json files for image urls and captions from here and decompress the two files into $SBU_ANNOTATION
  2. Extract the features, bounding boxes, and labels, build the image annotations, and save them into $OUTPUT_DATA (this will first download the images and save them in $SBU_DATA):
    python -m scripts.prepare_sbu \
        --download \
        --data_dir $SBU_DATA \
        --output_dir $OUTPUT_DATA \
        --annot_dir $SBU_ANNOTATION \
        --gpu_num 4 \
        --n_jobs 8
    

VG

  1. Download the objects, relationships, region descriptions, attributes, and image metadata from here and decompress them into $VG_ANNOTATION
  2. Download the images from the same link above and decompress them into $VG_IMAGES
  3. Extract features and save the features in $VG_DATA:
    python -m scripts.prepare_vg \
        --annot_dir $VG_ANNOTATION \
        --output_dir $VG_DATA \
        --data_dir $VG_IMAGES \
        --gpu_num 4
    

Reasoning (SBU and COCO)

  1. Download the pretrained COMET weights atomic_pretrained_model.pickle from comet-commonsense

    • Save it to $LOAD_PATH.
    • Follow the instructions in comet-commonsense to build the COMET data loader.
  2. Download the json files for image urls and captions from here and decompress the two files into $SBU_ANNOTATION.

  3. Download the SBU dataset and save the images in $SBU_DATA, then decompress the extracted features, bounding boxes, and labels of the images into $SBU_DATA.

  4. Generate inferences and save them in $REASON_DATA:

    python -m scripts.prepare_sbu_reason \
         --output_dir $REASON_DATA \
         --annot_dir  $SBU_ANNOTATION \
         --model_file $LOAD_PATH/COMET \
         --gpu_num 2 \
         --sampling_algorithm topk-3
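         # "topk-3" samples each inference token from COMET's top 3 candidates;
         # comet-commonsense also supports e.g. "greedy" and "beam-N" sampling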
    
    # rename the output file
    mv $REASON_DATA/train.json $SBU_DATA/reason_train.json
    
  5. Filter the newly generated inferences with a KM-BART model pretrained on VCG (also stored in $LOAD_PATH) and save the final results in $OUTPUT_DATA:

    python -m scripts.filter_reason  \
         --data_dir $SBU_DATA \
         --output_dir $OUTPUT_DATA \
         --checkpoint $LOAD_PATH/KM-BART
    

Training

Pretrain from scratch

Pretrain from facebook bart-base

Continue pretraining

Train VCG

Generate and evaluate VCG

Pretrained Weights