Ctrl-CIC: Directing the Visual Narrative through User-Defined Highlights

Repository for "Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights", published at ECCV 2024.

Setup

Finetune

Inference

Evaluation

The Ctrl-CIC captions can be evaluated as follows:

Pretrained Weights

The pretrained weights are available on Hugging Face.

Demo

For an interactive Ctrl-CIC demo, run python scripts/rctrl_inference.py, which allows flexible selection of the highlights and image. A similar program is provided for p-ctrl, but its output is shown on the command line.

Acknowledgement

The dataset and data-loading implementation are based on the code provided in WikiWeb2M.

Citation

@InProceedings{Mao_2024_ECCV,
    author    = {Mao, Shunqi and Zhang, Chaoyi and Su, Hang and Song, Hwanjun and Shalyminov, Igor and Cai, Weidong},
    title     = {Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights},
    booktitle = {Proceedings of the 18th European Conference on Computer Vision (ECCV)},
    year      = {2024}
}