# Image Captioning IPR Protection
Official implementation of the paper: "Protect, Show, Attend and Tell: Empowering Image Captioning Models with Ownership Protection"
Published in Pattern Recognition, Elsevier

(Released in August 2021)
Updated on 04 October 2022
## Updates
- Fixed bugs.
- Our framework on GAN IP protection was accepted at CVPR 2021; see here.
- Our framework on DNN IP protection was accepted in TPAMI 2022; see here.
- Our framework on RNN IP protection was accepted at AACL-IJCNLP 2022; see here.
## Description
<p align="justify"> By and large, existing Intellectual Property (IP) protection for deep neural networks typically i) focuses on the image classification task only, and ii) follows a standard digital watermarking framework that was conventionally used to protect the ownership of multimedia and video content. This paper demonstrates that the current digital watermarking framework is insufficient to protect image captioning, often regarded as one of the frontier AI problems. As a remedy, this paper studies and proposes two different embedding schemes in the hidden memory state of a recurrent neural network to protect the image captioning model. Empirically, we show that a forged key yields an unusable image captioning model, defeating the purpose of infringement. To the best of our knowledge, this work is the first to propose ownership protection for the image captioning task. Extensive experiments also show that the proposed method does not compromise the original image captioning performance on any of the common captioning metrics on the Flickr30k and MS-COCO datasets, while at the same time withstanding both removal and ambiguity attacks.</p>

<p align="center"> <img src="pr2021a.png" width="50%"> </p>

<p align="center"> Figure 1: An overview of our approach. (a) The original LSTM cell and (b) the LSTM cell with the key embedding operation.</p>
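For intuition, here is a minimal NumPy sketch of the two embedding schemes (corresponding to the `addition_bi` and `multiplication_bi` folders described below). This is not the repository code, which is implemented in TensorFlow 1.13; the hidden-state size and the binary ±1 key are illustrative assumptions:

```python
import numpy as np

def embed_key(h, key, scheme="multiplication"):
    """Element-wise embedding of a secret key into an LSTM hidden state.

    h      : hidden state vector at the current decoding step
    key    : secret key vector with the same dimensionality as h
    scheme : "addition" (addition_bi) or "multiplication" (multiplication_bi)
    """
    if scheme == "addition":
        return h + key   # element-wise addition scheme
    return h * key       # element-wise multiplication scheme

# A forged key perturbs every hidden state the decoder was trained on
# with the genuine key, so the captions it produces become unusable.
rng = np.random.default_rng(0)
h = rng.standard_normal(512)                   # illustrative hidden-state size
true_key = np.sign(rng.standard_normal(512))   # illustrative binary +/-1 key
forged_key = np.sign(rng.standard_normal(512))
print(np.allclose(embed_key(h, true_key), embed_key(h, forged_key)))  # False
```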
## Preparation

### Dataset
- MSCOCO: Download the COCO train2014 and val2014 data from here
- Flickr30k: Download from here
- Flickr8k: Download from here
### Pretrained ResNet50
## How to run
For compatibility, a Docker image has been created and pushed to Docker Hub. Run the following command to download it:

```bash
docker pull limjianhan/tensorflow:1.13.1-gpu
```
This repo contains two folders:

- `addition_bi`: implementation of the element-wise addition model
- `multiplication_bi`: implementation of the element-wise multiplication model
### To start the docker
Refer to `scripts/run_docker.sh` in the respective folder. Set the absolute paths for `CODE_DIR`, `CNN_DIR`, `DATA_DIR`, and `COCO_EVAL_DIR`, then copy the command and paste it into a terminal to start the Docker container.
### To train the model
Refer to `scripts/training_steps.txt` in the respective folder. Copy the command and paste it into a terminal to train the model on the MSCOCO, Flickr30k, or Flickr8k dataset. Evaluation runs automatically after training completes, and the results are saved to the `tmp` folder.
### To attack the model
Refer to `scripts/attack_key_steps.txt` to attack the model with a forged key.

Refer to `scripts/attack_sign_steps.txt` to attack the model with a fake signature (a toy sketch of signature checking follows below).
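As a rough illustration of why a fake signature fails, here is a self-contained toy verification in NumPy. It is an assumption for intuition only, not the repository's verification code; the bit-matching rule, shapes, and names are all hypothetical:

```python
import numpy as np

def signature_match(hidden_states, signature):
    """Fraction of ownership bits matched by the signs of the
    time-averaged hidden state (1.0 = perfect match).

    hidden_states : (T, D) hidden states collected while captioning
    signature     : (D,) vector of +/-1 ownership bits
    """
    signs = np.sign(hidden_states.mean(axis=0))
    return float((signs == signature).mean())

rng = np.random.default_rng(1)
signature = np.sign(rng.standard_normal(512))
# Hidden states whose signs follow the genuine signature verify fully,
# while a fake signature agrees only by chance (about half the bits).
genuine_states = np.abs(rng.standard_normal((20, 512))) * signature
print(signature_match(genuine_states, signature))                           # 1.0
print(signature_match(genuine_states, np.sign(rng.standard_normal(512))))  # ~0.5
```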
## Citation

If you find this work useful for your research, please cite:
```bibtex
@article{IcIPR,
  author  = {Jian Han Lim and
             Chee Seng Chan and
             Kam Woh Ng and
             Lixin Fan and
             Qiang Yang},
  title   = {Protect, show, attend and tell: Empowering image captioning models with ownership protection},
  journal = {Pattern Recognit.},
  year    = {2021},
  url     = {https://doi.org/10.1016/j.patcog.2021.108285},
  doi     = {10.1016/j.patcog.2021.108285},
}
```
## Feedback
Suggestions and opinions on this work (both positive and negative) are greatly welcomed. Please contact the authors by sending an email to `jianhanl98 at gmail.com` or `cs.chan at um.edu.my`.
## References
- The baseline implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" by Xu et al. (ICML 2015) was based on this repo.
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. ICML 2015.
- Microsoft COCO dataset
- Flickr30k dataset
- Flickr8k dataset
## License and Copyright
The project is open source under the BSD-3 license (see the `LICENSE` file).
©2021 Universiti Malaya and WeBank.