Person Remover v2

Demo and Docker image on Replicate: https://replicate.com/javirk/object-removal-partial-convolutions

Would you like to travel to a tourist spot and yet appear alone in the photos?

Person Remover is a project that uses partial convolutions to remove people or other objects from photos. The partial convolution code is adapted from naoto0804, while segmentation relies either on Torch Hub models or on the code by MIT CSAIL Computer Vision.

The project can remove objects from both images and videos.

Python 3.7 and PyTorch 1.7.0 have been used in this project.

How does it work?

A model with partial convolutions has been trained to fill holes in images. These instructions will let you train a model on your local machine; however, the training dataset used for the released model is not publicly available. It consists of 14,900 images of size 256x256x3. The code creates a hole in the center of each image, and the network learns to fill it using the surrounding data.
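
For background, a partial convolution computes the convolution only over valid (non-hole) pixels, renormalizes by the number of valid pixels under each kernel window, and propagates an updated mask to the next layer. A minimal PyTorch sketch of one such layer (an illustration of the technique, not the code of this repository, which adapts naoto0804's implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Minimal partial convolution (Liu et al., 2018): convolve only over
    valid (mask == 1) pixels and renormalize by the fraction of valid
    pixels under each kernel window."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=True)
        # Fixed all-ones kernel used to count valid pixels per window.
        self.register_buffer("weight_mask",
                             torch.ones(1, 1, kernel_size, kernel_size))
        self.window_size = kernel_size * kernel_size

    def forward(self, x, mask):
        # mask: (N, 1, H, W) with 1 = valid pixel, 0 = hole.
        with torch.no_grad():
            valid = F.conv2d(mask, self.weight_mask,
                             stride=self.conv.stride, padding=self.conv.padding)
            keep = (valid > 0).float()  # windows with any valid pixel
        out = self.conv(x * mask)
        bias = self.conv.bias.view(1, -1, 1, 1)
        # Renormalize by window size over valid count; pure holes become 0.
        scale = self.window_size / valid.clamp(min=1e-8)
        out = (out - bias) * scale * keep + bias * keep
        # The updated mask marks any window with a valid pixel as valid.
        return out, keep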

Requisites

To use the program, Python 3.7 and the libraries specified in requirements.txt must be installed.
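
Assuming a standard pip setup, the dependencies can be installed with:

pip install -r requirements.txt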

Installation

Clone the repository

git clone https://github.com/javirk/Person-remover-partial-convolutions.git
Using MIT CSAIL models for segmentation

By default, the segmentation model is DeepLab, which ships with PyTorch through Torch Hub. However, if you want to use the MIT CSAIL models, a few extra steps are needed. First, install the library with pip:

pip install git+https://github.com/CSAILVision/semantic-segmentation-pytorch.git@master

Then download and save the appropriate weights. Pretrained weights are available here. They should be renamed to "encoder_" or "decoder_" followed by the model name, depending on whether the weights belong to the encoder or the decoder. For example, to use ppm_deepsup as decoder and resnet50dilated as encoder, the following files should be present in the ./detector/weights folder:

encoder_resnet50dilated.pth
decoder_ppm_deepsup.pth
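
For reference, loading these renamed weights through the installed mit_semseg package typically looks like the sketch below (based on the library's public ModelBuilder/SegmentationModule API; the fc_dim value and paths are assumptions for this encoder/decoder pair):

import torch.nn as nn
from mit_semseg.models import ModelBuilder, SegmentationModule

# Build the encoder/decoder pair from the renamed weight files.
net_encoder = ModelBuilder.build_encoder(
    arch="resnet50dilated", fc_dim=2048,
    weights="detector/weights/encoder_resnet50dilated.pth")
net_decoder = ModelBuilder.build_decoder(
    arch="ppm_deepsup", fc_dim=2048, num_class=150,
    weights="detector/weights/decoder_ppm_deepsup.pth",
    use_softmax=True)  # inference mode
segmentation_module = SegmentationModule(
    net_encoder, net_decoder, nn.NLLLoss(ignore_index=-1))
segmentation_module.eval()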

Download the weights for partial convolutions from Google Drive and put them in ./inpainter/weights/.

To process a folder of images, run person_remover.py:

python person_remover.py -i /dir/of/input/images

This uses the DeepLab model by default. The segmentation model can be changed with:

python person_remover.py -i /dir/of/input/images -dm mitcsail -e resnet50dilated -d ppm_deepsup

It is also possible to specify the types of objects to remove (by default only people are removed):

python person_remover.py -i /dir/to/input/images -ob person bicycle car

This will remove people, bicycles and cars. Note that the available objects depend on the segmentation model and are defined in the respective .names files in the ./detector/ folder.
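
Conceptually, the chosen object names are mapped to class indices from the model's .names file and turned into the binary hole mask that the inpainter fills. A hypothetical sketch (build_removal_mask is illustrative and not a function of this repository):

import torch

def build_removal_mask(class_map, names_file, objects):
    """Build a binary mask (1 = keep, 0 = remove) from a per-pixel
    class-index map and the class list in a .names file.
    class_map: (H, W) tensor of class indices from the segmentation model.
    """
    with open(names_file) as f:
        names = [line.strip() for line in f]
    target_ids = [names.index(obj) for obj in objects]
    remove = torch.zeros_like(class_map, dtype=torch.bool)
    for idx in target_ids:
        remove |= class_map == idx
    # Partial convolutions expect 0 in the hole, 1 elsewhere.
    return (~remove).float()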

Training

Segmentation models are used pretrained. For the partial convolution network, training spanned 47 epochs of fine-tuning, with the batch normalization layers in the decoder set to non-trainable, on a dataset of 14,900 training and 100 test images using the default parameters. It is worth noting that the training process is extremely sensitive, so the best results might not come in the first run.

Training with the default parameters (check image_inpainting.py for reference) is performed as follows:

python image_inpainting.py -train /dir/of/training/images -test /dir/of/test/images -mode train

This resumes training from the last saved model in the inpainter/weights folder. To train from scratch, use:

python image_inpainting.py -train /dir/of/training/images -test /dir/of/test/images -mode train -r False
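
As described above, training punches a square hole in the center of each 256x256 image for the network to fill. A minimal sketch of such a training mask (the hole size is an assumption; the repository's exact geometry may differ):

import torch

def center_hole_mask(size=256, hole=128):
    """Training mask: a square hole in the center of the image
    (0 = hole, 1 = valid)."""
    mask = torch.ones(1, size, size)
    start = (size - hole) // 2
    mask[:, start:start + hole, start:start + hole] = 0
    return mask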

Image removal

Example results (images p2p_fill_3 through p2p_fill_9): photos with the selected objects removed and the holes inpainted.

Next steps

The quality of the filling relies heavily on the segmentation model: a larger mask that covers the whole person works better than one that leaves out small parts such as a hand or a foot, because the network tends to use those leftover pixels to inpaint the rest. Expanding the mask by pixel offsetting would very likely lead to better results.
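
A morphological dilation is one simple way to expand the mask; the sketch below uses max-pooling in PyTorch (an illustration of the idea, not code from this project):

import torch.nn.functional as F

def expand_mask(mask, pixels=8):
    """Grow the hole (0-region) of a (N, 1, H, W) mask by roughly
    `pixels` in every direction, using max-pooling as a cheap
    morphological dilation."""
    k = 2 * pixels + 1
    hole = 1.0 - mask                                  # 1 where the object is
    hole = F.max_pool2d(hole, k, stride=1, padding=pixels)
    return 1.0 - hole                                  # back to 0 = hole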

Partial convolutions sometimes alter the lighting conditions of the image, which results in visible borders around the filled areas. A smoother blend between the inpainted and original images would be a significant step towards visual realism.
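
A straightforward mitigation would be to feather the mask edge before compositing the inpainted output back into the original, as in this sketch (illustrative; the blur kernel size is an assumption):

import torch
import torch.nn.functional as F

def feathered_blend(original, inpainted, mask, blur=15):
    """Blend the inpainted result back into the original with a softened
    mask edge to reduce visible borders. mask: (N, 1, H, W), 0 = hole."""
    # Soften the binary mask with a box blur.
    kernel = torch.ones(1, 1, blur, blur, device=mask.device) / (blur * blur)
    soft = F.conv2d(mask, kernel, padding=blur // 2)
    return soft * original + (1.0 - soft) * inpainted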

With the available weights, large artifacts appear in some cases due to the training methodology. A retraining will be carried out to tackle this issue.

Author

License

This project is licensed under the Apache License. See LICENSE.md for details.

Acknowledgments