Multi-Modal-CelebA-HQ

Multi-Modal-CelebA-HQ (MM-CelebA-HQ) is a dataset containing 30,000 high-resolution face images selected from CelebA, following CelebA-HQ. Each image in the dataset is accompanied by a semantic mask, sketch, descriptive text, and an image with a transparent background.

Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for a range of face generation and understanding tasks, including text-to-image generation, sketch-to-image generation, text-guided image editing, image captioning, and visual question answering. This dataset is introduced and employed in TediGAN.

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation.<br> Weihao Xia, Yujiu Yang, Jing-Hao Xue, and Baoyuan Wu.<br> CVPR 2021. <br>

Updates :triangular_flag_on_post:

Data Generation

Description

Usage

This section outlines the process of generating the data for our task.

The scripts provided here are not restricted to the CelebA-HQ dataset and can be utilized to preprocess any dataset that includes attribute annotations, be it image, video, or 3D shape data. This flexibility enables the creation of custom datasets that meet specific requirements. For example, the create_caption.py script can be applied to generate diverse descriptions for each video by using video facial attributes (e.g., those provided by CelebV-HQ), leading to a text-video dataset, similar to CelebV-Text.

Text

Please download celeba-hq-attribute.txt (CelebAMask-HQ-attribute-anno.txt) and run the following script.

python create_caption.py

The generated textual descriptions can be found at ./celeba_caption.
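As a rough illustration of how attribute annotations can be turned into text, the sketch below parses an attribute list and fills a simple sentence template. It is not the repository's create_caption.py. The file layout (a count line, a line of attribute names, then one row per image with 1 or -1 per attribute) is an assumption based on the CelebA attribute lists, and `parse_attributes`, `describe`, and the template wording are hypothetical names chosen for this example.

```python
import random

# Assumed annotation layout (not confirmed by this README):
#   line 1: number of images
#   line 2: space-separated attribute names
#   rest:   "<filename> v1 v2 ..." where each value is 1 (present) or -1 (absent)
def parse_attributes(text):
    """Map each filename to the list of attribute names marked as present."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[1].split()
    table = {}
    for row in lines[2:]:
        parts = row.split()
        filename, values = parts[0], parts[1:]
        table[filename] = [n for n, v in zip(names, values) if int(v) == 1]
    return table

def describe(attributes, rng):
    """Hypothetical template: join the attribute names into one sentence."""
    if not attributes:
        return "This is a person."
    phrases = [a.replace("_", " ").lower() for a in attributes]
    rng.shuffle(phrases)  # vary the order so repeated calls give diverse text
    return "This person has the attributes: " + ", ".join(phrases) + "."

# Tiny inline sample standing in for the real annotation file.
sample = """2
Smiling Wavy_Hair Eyeglasses
0.jpg 1 -1 1
1.jpg -1 -1 -1
"""

attrs = parse_attributes(sample)
rng = random.Random(0)
for name, present in attrs.items():
    print(name, describe(present, rng))
```

Calling `describe` several times per image with a shuffled attribute order is one simple way to obtain multiple descriptions per image; a real script would likely use richer grammar than this single template.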

Please fill out the form to request the processing script. If feasible, please send me a follow-up email after submitting the form to remind me.

Sketch

If Photoshop is available to you, please apply the Photocopy filter in Photoshop to extract edges. Photoshop supports batch processing, so you don't have to manually process each image. The Sobel operator is an alternative way to extract edges when Photoshop is unavailable or a simpler approach is preferred. Edge extraction preserves facial details but introduces excessive noise, so the sketch-simplification model is then applied to obtain edge maps resembling hand-drawn sketches.

The sketch simplification model requires torch==0.4.1 and torchvision==0.2.1.

python create_sketch.py

The generated sketches can be found at ./celeba_sketch.
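For readers without Photoshop, the Sobel alternative mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of the operator itself, not the contents of create_sketch.py; the function name `sobel_edges` and the normalization to [0, 1] are choices made for this example, and the sketch-simplification step is not included.

```python
import numpy as np

def sobel_edges(gray):
    """Return the Sobel gradient magnitude of a 2-D grayscale array, scaled to [0, 1]."""
    gray = np.asarray(gray, dtype=np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    # Replicate border pixels so the output has the same size as the input.
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Correlate with each 3x3 kernel by summing shifted views of the padded image.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    return magnitude / peak if peak > 0 else magnitude

# Synthetic test image: black left half, white right half.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
sketch = 1.0 - edges  # invert so edges appear as dark strokes on white
```

On real photographs the raw magnitude map is typically noisy, which is consistent with the note above that a simplification model is applied afterwards.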

Overview

Note: Upon request, the download links of raw data and annotations have been removed from this repo. Please refer to the original sites for the raw data, and email me for the post-processing scripts. The scripts for text and sketch generation have been added to the repository.

All data is hosted on Google Drive (not available).

| Path | Size | Files | Format | Description |
| --- | --- | --- | --- | --- |
| multi-modal-celeba | ~20 GB | 420,002 | | Main folder |
| ├ train | 347 KB | 1 | PKL | filenames of training images |
| ├ test | 81 KB | 1 | PKL | filenames of test images |
| ├ image | 2 GB | 30,000 | JPG | images from CelebA-HQ of size 512×512 |
| ├ text | 11 MB | 30,000 | TXT | 10 descriptions of each image in CelebA-HQ |
| ├ coeff | 115 MB | 29,437 | MAT | 3DMM coefficients of each image in CelebA-HQ |
| ├ rendered | 834 MB | 29,437 | PNG | rendered image of each image in CelebA-HQ of size 256×256 |
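The folder layout above could be read with a small helper along the following lines. This is only a sketch under several assumptions: that each split folder contains a single pickle holding a list of image names, that the pickle is called `filenames.pickle` (a hypothetical name), and that each file in `text` stores one description per line under the image's base name. Adjust these to match the files you actually receive.

```python
import pickle
from pathlib import Path

def load_split(root, split, pickle_name="filenames.pickle"):
    """Load the image names of one split ("train" or "test").

    pickle_name is a hypothetical file name; only load pickles you trust,
    since unpickling can execute arbitrary code.
    """
    with open(Path(root) / split / pickle_name, "rb") as f:
        return list(pickle.load(f))

def load_captions(root, name):
    """Read the descriptions of one image, assuming text/<name>.txt with one per line."""
    path = Path(root) / "text" / f"{Path(name).stem}.txt"
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

A typical use would be `names = load_split("multi-modal-celeba", "train")` followed by `load_captions("multi-modal-celeba", names[0])` to pair an image with its descriptions.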

For 3DMM coefficients and rendered images of each image in the FFHQ dataset, please refer to cleaned-celebahq-ffhq.

Pretrained Models

We provide the pretrained models of AttnGAN, ControlGAN, DM-GAN, DF-GAN, and ManiGAN. Please consider citing our paper if you use these pretrained models. Feel free to open a pull request if you have any updates.

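| Method | FID | LPIPS | Download |
| --- | --- | --- | --- |
| AttnGAN | 125.98 | 0.512 | Google Drive |
| ControlGAN | 116.32 | 0.522 | Google Drive |
| DF-GAN | 137.60 | 0.581 | Google Drive |
| DM-GAN | 131.05 | 0.544 | Google Drive |
| TediGAN | 106.37 | 0.456 | Google Drive |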

The pretrained model of ManiGAN is here. The training scripts and pretrained models on faces for sketch-to-image and label-to-image can be found here. Those with problems accessing Google Drive can refer to an alternative link at Baidu Cloud (code: b273) for the dataset and pretrained models.

Related Works

Citation

If you find the dataset, processing scripts, and pretrained models useful for your research, please consider citing our paper:

@inproceedings{xia2021tedigan,
  title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

@article{xia2021towards,
  title={Towards Open-World Text-Guided Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2104.08910},
  year={2021}
}

If you use images and masks, please cite:

@inproceedings{liu2015faceattributes,
 title = {Deep Learning Face Attributes in the Wild},
 author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
 booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
 year = {2015} 
}

@inproceedings{karras2017progressive,
  title={Progressive growing of gans for improved quality, stability, and variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2018}
}

@inproceedings{CelebAMask-HQ,
  title={MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author={Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

License

The use of this software is RESTRICTED to non-commercial research and educational purposes. The license is the same as in CelebAMask-HQ.