CIIC
Implementation of "Show, Deconfound and Tell: Image Captioning with Causal Inference" (work in progress)
Requirements (our main environment)
- Python 3.7.4
- PyTorch 1.5.1
- TorchVision 0.6.0
- coco-caption
- numpy
- tqdm
- yacs
- lmdbdict
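The versions above are the ones we used in our experiments; nearby versions may also work. A minimal sketch to confirm your environment matches:

```python
# Quick sanity check of the environment against the versions listed above.
# (A sketch only; exact patch versions need not match.)
import sys
import torch
import torchvision

print(f"Python      {sys.version.split()[0]}")   # expect 3.7.x
print(f"PyTorch     {torch.__version__}")        # expect 1.5.1
print(f"TorchVision {torchvision.__version__}")  # expect 0.6.0
print(f"CUDA available: {torch.cuda.is_available()}")
```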
Preparation
1. Download the Bottom-Up features and prepare the training dataset as described in https://github.com/ruotianluo/self-critical.pytorch.
2. Download our features from https://pan.baidu.com/s/17HtCIGWBKlzjMlIY-x1tVQ (extraction key: 687n). A quick sanity check for the downloaded files is sketched after this list.
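Once the files are in place, you can verify that they load. This is only a sketch: the paths are taken from the training command below, and `allow_pickle=True` is an assumption in case the .npy files store Python dicts rather than raw arrays.

```python
# Sanity-check the downloaded feature files (sketch; paths taken from
# the training command below, shapes depend on the released files).
import numpy as np

for path in ["data/Glove_embedding.npy", "data/vis.npy", "data/lin.npy"]:
    # allow_pickle=True is an assumption: only needed if a file stores a dict
    obj = np.load(path, allow_pickle=True)
    print(path, getattr(obj, "shape", type(obj)))
```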
Training
*Note: our repository is mainly based on https://github.com/ruotianluo/self-critical.pytorch.*
1. Train the model:
# for training
python train.py --id exp --caption_model CIIC --input_json data/cocotalk.json --input_label_h5 data/cocotalk_label.h5 --input_att_dir data/cocobu_att --input_att_dir_iod data/IOD --glove_embedding_dict data/Glove_embedding.npy --visual_dict data/vis.npy --lin_dict data/lin.npy --batch_size 10 --N_enc 6 --N_dec 6 --d_model 512 --d_ff 2048 --num_att_heads 8 --dropout 0.1 --learning_rate 0.0003 --learning_rate_decay_start 3 --learning_rate_decay_rate 0.5 --noamopt_warmup 20000 --self_critical_after 30
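For reference, `--d_model 512` and `--noamopt_warmup 20000` feed a Noam learning-rate schedule, as in "Attention Is All You Need". The sketch below shows that schedule in its textbook form; the repo's actual optimizer wrapper may apply an extra scaling factor.

```python
# Noam schedule: lr(step) = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5).
# Reference sketch only; the repo's implementation may differ in details.
def noam_lr(step: int, d_model: int = 512, warmup: int = 20000) -> float:
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# The rate rises linearly for `warmup` steps, then decays as step^-0.5.
for s in (1, 5000, 20000, 100000):
    print(s, f"{noam_lr(s):.2e}")
```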
2. Evaluate the model:
# for evaluating
python eval.py --model checkpoint_path/model-best.pth --infos_path checkpoint_path/infos-best.pkl
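Evaluation relies on the coco-caption toolkit listed under Requirements. If you want to score a generated results file yourself, the standard coco-caption API looks roughly like the sketch below; the annotation and results paths are placeholders, not files shipped with this repo.

```python
# Score generated captions with coco-caption (pycocoevalcap).
# Sketch: both JSON paths below are placeholder assumptions.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO("annotations/captions_val2014.json")     # ground-truth captions
coco_res = coco.loadRes("results.json")              # [{"image_id": ..., "caption": ...}, ...]

coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params["image_id"] = coco_res.getImgIds()  # score only the captioned images
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():         # BLEU, METEOR, ROUGE_L, CIDEr, ...
    print(f"{metric}: {score:.3f}")
```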
Acknowledgements
This code builds on Ruotian Luo's implementation of image captioning: https://github.com/ruotianluo/self-critical.pytorch.