CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

<a href='https://cococozibojia.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/pdf/2403.12035'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>

Bojia Zi<sup>1</sup>, Shihao Zhao<sup>2</sup>, Xianbiao Qi<sup>*5</sup>, Jianan Wang<sup>4</sup>, Yukai Shi<sup>3</sup>, Qianyu Chen<sup>1</sup>, Bin Liang<sup>1</sup>, Rong Xiao<sup>5</sup>, Kam-Fai Wong<sup>1</sup>, Lei Zhang<sup>4</sup>

* denotes the corresponding author.

This is the inference code for our paper CoCoCo.

<p align="center"> <img src="https://github.com/zibojia/COCOCO/blob/main/__asset__/COCOCO.PNG" alt="COCOCO" style="width: 100%;"/> </p> <table> <tr> <td><img src="__asset__/sea_org.gif"></td> <td><img src="__asset__/sea1.gif"></td> <td><img src="__asset__/sea2.gif"></td> </tr> <tr> <td> Orginal </td> <td> The ocean, the waves ... </td> <td> The ocean, the waves ... </td> </tr> </table> <table> <tr> <td><img src="__asset__/river_org.gif"></td> <td><img src="__asset__/river1.gif"></td> <td><img src="__asset__/river2.gif"></td> </tr> <tr> <td> Orginal </td> <td> The river with ice ... </td> <td> The river with ice ... </td> </tr> </table> <table> <tr> <td><img src="__asset__/sky_org.gif"></td> <td><img src="__asset__/sky1.gif"></td> <td><img src="__asset__/sky2.gif"></td> </tr> <tr> <td> Orginal </td> <td> Meteor streaking in the sky ... </td> <td> Meteor streaking in the sky ... </td> </tr> </table>

Table of Contents <!-- omit in toc -->

Features

Installation

Step 1. Installation Checklist

Before installing the dependencies, check the following requirements to avoid installation failures.

Step 2. Install the requirements

Once your environment meets the requirements above, install the dependencies with pip.

# Install the CoCoCo dependencies
pip3 install -r requirements.txt
# Compile the SAM2
pip3 install -e .
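
After installation, you can run a quick, optional sanity check; this sketch assumes the editable install above exposes SAM2 under the module name `sam2`.

```python
# Optional post-install sanity check.
# Assumes the editable install above makes SAM2 importable as `sam2`.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import sam2  # built by `pip3 install -e .`
    print("SAM2 import OK")
except ImportError as err:
    print("SAM2 not importable:", err)
```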

If everything installs without errors, you can move on to the next steps.

Usage

1. Download pretrained models.

Note that our method requires both the Stable Diffusion 1.5 inpainting weights and the CoCoCo weights.
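
If you do not already have the weights locally, a minimal sketch using `huggingface_hub` is shown below; the repository IDs are assumptions, so substitute the actual sources of the SD 1.5 inpainting model and the released CoCoCo checkpoint.

```python
# Hedged example: fetch both sets of weights with huggingface_hub.
# The repo IDs are assumptions -- replace them with wherever the SD 1.5
# inpainting model and the CoCoCo checkpoint are actually hosted.
from huggingface_hub import snapshot_download

sd_folder = snapshot_download(
    repo_id="runwayml/stable-diffusion-inpainting",  # assumed SD 1.5 inpainting repo; adjust if it has moved
    local_dir="./stable-diffusion-v1-5-inpainting",
)
cococo_folder = snapshot_download(
    repo_id="<cococo-weights-repo>",  # hypothetical placeholder for the CoCoCo weights
    local_dir="./cococo_weights",
)
print(sd_folder, cococo_folder)
```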

2. Prepare the mask

You can obtain masks with GroundingDINO or Track-Anything, or draw them yourself.

We release a Gradio demo that uses SAM2 to implement Video Inpainting Anything. Try our demo!

<p align="center"> <img src="https://github.com/zibojia/COCOCO/blob/main/__asset__/DEMO.PNG" alt="DEMO" style="width: 95%;"/> </p>

3. Run our validation script.

Running this script produces the video inpainting results.

# --guidance_scale: CFG scale; higher values give stronger text controllability
# --video_path: folder holding the video and masks as images.npy and masks.npy
# --model_path: path to the CoCoCo weights, e.g. ./cococo_weights
# --pretrain_model_path: path to the pretrained Stable Diffusion inpainting model, e.g. ./stable-diffusion-v1-5-inpainting
# --sub_folder: subfolder of the pretrained inpainting model that holds the UNet checkpoint
python3 valid_code_release.py --config ./configs/code_release.yaml \
  --prompt "Trees. Snow mountains. best quality." \
  --negative_prompt "worst quality. bad quality." \
  --guidance_scale 10 \
  --video_path ./images/ \
  --model_path [cococo_folder_name] \
  --pretrain_model_path [sd_folder_name] \
  --sub_folder unet

4. Personalized Video Inpainting (Optional)

We provide a method that lets users compose their own personalized video inpainting model from personalized T2I models WITHOUT TRAINING. The steps are listed below (see the sketch after the list):

<table> <tr> <td><img src="__asset__/gibuli_lora_org.gif"></td> <td><img src="__asset__/gibuli_merged1.gif"></td> <td><img src="__asset__/gibuli_merged2.gif"></td> </tr> </table> <table> <tr> <td><img src="__asset__/unmbrella_org.gif"></td> <td><img src="__asset__/unmbrella1.gif"></td> <td><img src="__asset__/unmbrella2.gif"></td> </tr> </table> <table> <tr> <td><img src="__asset__/gibuli.gif"></td> <td><img src="__asset__/bocchi1.gif"></td> <td><img src="__asset__/bocchi2.gif"></td> </tr> </table>

Convert safetensors to PyTorch weights

Add the PyTorch weights onto CoCoCo to create a personalized video inpainting model
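
A rough, illustrative sketch of these steps is shown below. It assumes a full (non-LoRA) personalized T2I UNet stored as safetensors and a simple weight-delta merge; the file names and the merge rule are assumptions, not the repository's actual conversion scripts.

```python
# Hedged illustration of the idea (not the repository's conversion scripts):
# 1) load a personalized T2I checkpoint stored as safetensors into PyTorch tensors,
# 2) add the difference between the personalized and base UNet weights onto the
#    matching CoCoCo UNet weights.
import torch
from safetensors.torch import load_file

personalized = load_file("personalized_t2i.safetensors")    # hypothetical personalized T2I weights
base = torch.load("sd15_unet.pth", map_location="cpu")      # hypothetical base SD 1.5 UNet state dict
cococo = torch.load("cococo_unet.pth", map_location="cpu")  # hypothetical CoCoCo UNet state dict

merged = dict(cococo)
for name, w in cococo.items():
    if name in personalized and name in base and personalized[name].shape == w.shape:
        merged[name] = w + (personalized[name] - base[name])  # assumed delta-merge rule

torch.save(merged, "cococo_personalized_unet.pth")
```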

5. CoCoCo Inference with SAM2

TO DO


[1]. We will use a larger dataset with high-quality videos to produce a more powerful video inpainting model soon.

[2]. The training code is under preparation.

Citation


@article{Zi2024CoCoCo,
  title={CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility},
  author={Bojia Zi and Shihao Zhao and Xianbiao Qi and Jianan Wang and Yukai Shi and Qianyu Chen and Bin Liang and Kam-Fai Wong and Lei Zhang},
  journal={ArXiv},
  year={2024},
  volume={abs/2403.12035},
  url={https://arxiv.org/abs/2403.12035}
}

Acknowledgement

This code is based on AnimateDiff, Segment-Anything-2, and ProPainter.