PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks
By Xiaoxiong Du, Jun Peng, Yiyi Zhou, Jinlu Zhang, Siting Chen, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji.
MM '23: Proceedings of the 31st ACM International Conference on Multimedia
DEMO VIDEO
Introduction
This repository is the PyTorch implementation of PixelFace+. PixelFace+ uses both mask and text features for highly controllable face generation and manipulation. We propose the GCMF module to achieve better decoupling between the two modalities. Additionally, to enhance the alignment between generated images and text, we introduce a regularization loss based on CLIP. The framework diagram of PixelFace+ is shown below:
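For intuition, here is a minimal, hypothetical sketch of such a CLIP-based image-text alignment regularizer. It uses the public openai/clip package; it is not the exact loss from the paper, and the input shape assumption is noted in the comments.

import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def clip_alignment_loss(fake_images, captions):
    # fake_images: (B, 3, 224, 224), assumed already resized/normalized for CLIP.
    tokens = clip.tokenize(captions).to(device)
    img_feat = clip_model.encode_image(fake_images)
    txt_feat = clip_model.encode_text(tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # Penalize low cosine similarity between each image and its caption.
    return (1.0 - (img_feat * txt_feat).sum(dim=-1)).mean()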
Citation
@inproceedings{10.1145/3581783.3612067,
author = {Du, Xiaoxiong and Peng, Jun and Zhou, Yiyi and Zhang, Jinlu and Chen, Siting and Jiang, Guannan and Sun, Xiaoshuai and Ji, Rongrong},
title = {PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks},
year = {2023},
isbn = {9798400701085},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3581783.3612067},
doi = {10.1145/3581783.3612067},
pages = {4666–4677},
numpages = {12},
keywords = {controllable face generation, face editing},
series = {MM '23}
}
Prerequisites
python 3.6
pytorch 1.10.0
pytorch-fid 0.2.1
torchvision 0.11.1
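A quick way to confirm that the pinned versions above are installed:

import torch, torchvision
print(torch.__version__)          # expected: 1.10.0
print(torchvision.__version__)    # expected: 0.11.1
print(torch.cuda.is_available())  # training assumes at least one CUDA GPU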
Data preparation
Multi-Modal-CelebA-HQ Dataset [Link]
Before training, please download dataset2.json (distributed as a zip archive) and place the extracted file in the MMceleba dataset directory; a small unpacking sketch follows.
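A minimal sketch of the unpacking step. The archive name dataset2.zip and the dataset root MMceleba/ are assumptions; adjust them to match your download.

import zipfile

# Extract dataset2.json from the downloaded archive into the dataset directory.
with zipfile.ZipFile("dataset2.zip") as zf:
    zf.extract("dataset2.json", path="MMceleba/")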
Training
- Preparing your settings. To train a model, modify code/cfg/mmceleba.yml to adjust the settings you want. The default configuration trains on MMceleba with the input and output image resolution set to 256×256 and the batch size set to 4. Increasing the batch size may degrade semantic alignment after training, since a larger batch weakens the constraint imposed by the CLIP regularization loss.
- Training the model. Run main.py under the code folder to start training:
cd /PixelFace+/code
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 --master_port 10011 main.py --cfg cfg/mmceleba.yml
- Testing the model. After training for more than 70 epochs, the model automatically evaluates its performance every ten epochs. To change the evaluation frequency, edit line 675 in code/trainer.py; a sketch of this gating logic is shown below.
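For orientation, the schedule around that line presumably resembles the following sketch (the method name is hypothetical; only the epoch thresholds come from the text above):

# Hypothetical sketch of the evaluation schedule described above.
if epoch > 70 and epoch % 10 == 0:
    evaluate(netG)  # hypothetical helper; see code/trainer.py around line 675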
Testing
You can use the eval1 method (at line 732 of code/trainer.py) to generate images.
If you want to generate an image from your own description, you can try porting the code from sample.py into code/trainer.py; a rough outline is sketched below.
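As a rough starting point, generation from a custom description might look like the following sketch. Here text_encoder, netG, z_dim, and mask are hypothetical placeholders for the corresponding objects in code/trainer.py and sample.py.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
description = "This woman has long blond hair and is smiling."

with torch.no_grad():
    txt_emb = text_encoder(description)           # hypothetical: encode the caption
    noise = torch.randn(1, z_dim, device=device)  # hypothetical latent size z_dim
    fake_img = netG(noise, txt_emb, mask)         # mask: a segmentation-map tensor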
Pretrained Model
- Download the pretrained model from https://pan.baidu.com/s/1ARSjz6IXCO2-8qf1Tf9p-A?pwd=qwer (extraction code: qwer).
- Modify the cfg file code/cfg/mmceleba.yml to use the pretrained model:
TRAIN:
  FLAG: True
  ##### Modify This Line #####
  NET_G: '/PATH/TO/PRETRAIN/MODEL'
  B_NET_D: True
  BATCH_SIZE: 4
  MAX_EPOCH: 100
  SNAPSHOT_INTERVAL: 1
  DISCRIMINATOR_LR: 0.004
  GENERATOR_LR: 0.002
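When NET_G points at a valid checkpoint, the trainer resumes from those weights. Loading typically looks like the sketch below; netG and the checkpoint layout are assumptions, and the actual logic lives in code/trainer.py.

import torch

# Load the generator weights from the NET_G path configured above.
state_dict = torch.load("/PATH/TO/PRETRAIN/MODEL", map_location="cpu")
netG.load_state_dict(state_dict)  # netG: the generator instance
netG.eval()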
Acknowledgement
Thanks to PixelFolder and PixelFace, from which much of the code in this repository is adapted.