Image Captioning IPR Protection

Pattern Recognition | ArXiv

Official implementation of the paper: "Protect, Show, Attend and Tell: Empowering Image Captioning Models with Ownership Protection"

Published in Pattern Recognition (Elsevier)

(Released in August 2021)

Updated on 04 October 2022

Updates

  1. Bug fixes
  2. Our framework for GAN IP protection was accepted at CVPR 2021; see here.
  3. Our framework for DNN IP protection was accepted in TPAMI 2022; see here.
  4. Our framework for RNN IP protection was accepted at AACL-IJCNLP 2022; see here.

Description

<p align="justify"> By and large, existing Intellectual Property (IP) protection for deep neural networks typically i) focuses on the image classification task only, and ii) follows a standard digital watermarking framework conventionally used to protect the ownership of multimedia and video content. This paper demonstrates that the current digital watermarking framework is insufficient to protect image captioning tasks, which are often regarded as one of the frontier AI problems. As a remedy, this paper studies and proposes two different embedding schemes in the hidden memory state of a recurrent neural network to protect the image captioning model. Empirically, we show that a forged key yields an unusable image captioning model, defeating the purpose of infringement. To the best of our knowledge, this work is the first to propose ownership protection for the image captioning task. Extensive experiments show that the proposed method does not compromise the original image captioning performance on any of the common captioning metrics on the Flickr30k and MS-COCO datasets, while at the same time withstanding both removal and ambiguity attacks.</p> <p align="center"> <img src="pr2021a.png" width="50%"> </p> <p align="center"> Figure 1: An overview of our approach. (a) The original LSTM cell and (b) the LSTM cell with the key embedding operation.</p>
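The sketch below illustrates the key embedding operation of Figure 1(b) in NumPy. It is a conceptual illustration only, not the repository's TensorFlow code: the function and variable names are ours, and the gate layout is a generic LSTM. The owner key is combined element-wise with the hidden state at every step, so decoding with a forged key corrupts every subsequent state and the generated caption.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_with_key(x, h_prev, c_prev, W, b, key, mode="mult"):
    # One LSTM step with the owner key embedded element-wise into the
    # hidden state (Figure 1b). mode="add" mirrors addition_bi;
    # mode="mult" mirrors multiplication_bi.
    z = np.concatenate([x, h_prev]) @ W + b   # all four gates in one affine map
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    h = h + key if mode == "add" else h * key  # key embedding operation
    return h, c

# Toy check: a forged key yields a different hidden state, which cascades
# through decoding and degrades the captions.
dim, xdim = 8, 6
rng = np.random.default_rng(0)
W = rng.normal(size=(xdim + dim, 4 * dim))
b = np.zeros(4 * dim)
x, h0, c0 = rng.normal(size=xdim), np.zeros(dim), np.zeros(dim)
h_true, _ = lstm_step_with_key(x, h0, c0, W, b, rng.normal(size=dim))
h_forged, _ = lstm_step_with_key(x, h0, c0, W, b, rng.normal(size=dim))
print(np.abs(h_true - h_forged).mean())  # nonzero: the states diverge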

https://user-images.githubusercontent.com/23725126/137418947-db800aef-8565-4f9f-bac8-9ffa0bc42d8f.mp4

Preparation

Dataset

Pretrained ResNet50

How to run

For compatibility, a Docker image has been created and pushed to Docker Hub. Run the following command to download it:

docker pull limjianhan/tensorflow:1.13.1-gpu

This repo contains two folders:

addition_bi is the implementation of the element-wise addition model

multiplication_bi is the implementation of the element-wise multiplication model

To start the Docker container

Refer to scripts/run_docker.sh in the respective folders. Set the absolute paths for CODE_DIR, CNN_DIR, DATA_DIR and COCO_EVAL_DIR, then copy the command and paste it into a terminal to start the Docker container; an example is shown below.
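For example, the variables at the top of the script might be set like this (the paths below are placeholders; substitute your own):

CODE_DIR=/absolute/path/to/this-repo/addition_bi
CNN_DIR=/absolute/path/to/pretrained/resnet50
DATA_DIR=/absolute/path/to/datasets/mscoco
COCO_EVAL_DIR=/absolute/path/to/coco-caption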

To train the model

Refer to scripts/training_steps.txt in the respective folders. Copy the command and paste it into a terminal to train the model on the MSCOCO, Flickr30k or Flickr8k dataset. Evaluation runs automatically after training completes, and the results are saved to the tmp folder.

To attack the model

Refer to scripts/attack_key_steps.txt to attack the model with a forged key.

Refer to scripts/attack_sign_steps.txt to attack the model with a fake signature.
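Conceptually, the signature check matches an owner-chosen bit string against the sign pattern of the embedded key, so a fake signature agrees on only about half of the bits. The snippet below is a hedged NumPy sketch of such a sign-based check; the paper's exact verification protocol and the repository's API may differ, and all names here are ours.

import numpy as np

def signature_match(key, signature_bits):
    # Fraction of positions where the key's sign encodes the claimed bit.
    signs = (key > 0).astype(int)
    return float(np.mean(signs == signature_bits))

rng = np.random.default_rng(1)
owner_bits = rng.integers(0, 2, size=256)
# A key whose signs encode the owner's signature.
key = np.where(owner_bits == 1, 1.0, -1.0) * np.abs(rng.normal(size=256))
print(signature_match(key, owner_bits))                    # 1.0 for the true owner
print(signature_match(key, rng.integers(0, 2, size=256)))  # ~0.5 for a fake signature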

Citation

If you find this work useful for your research, please cite

@article{IcIPR,
  author    = {Jian Han Lim and
               Chee Seng Chan and
               Kam Woh Ng and
               Lixin Fan and
               Qiang Yang},
  title     = {Protect, show, attend and tell: Empowering image captioning models with ownership protection},
  journal   = {Pattern Recognit.},
  year      = {2021},
  url       = {https://doi.org/10.1016/j.patcog.2021.108285},
  doi       = {10.1016/j.patcog.2021.108285},
}

Feedback

Suggestions and opinions on this work (both positive and negative) are welcome. Please contact the authors by sending an email to jianhanl98 at gmail.com or cs.chan at um.edu.my.

License and Copyright

This project is open-sourced under the BSD-3 license (see the LICENSE file).

©2021 Universiti Malaya and WeBank.