CelebA-Dialog Dataset
Talk-to-Edit: Fine-Grained Facial Editing via Dialog </br> Yuming Jiang*, Ziqi Huang*, Xingang Pan, Chen Change Loy and Ziwei Liu </br> In IEEE International Conference on Computer Vision (ICCV), 2021.
From MMLab@NTU, affiliated with S-Lab, Nanyang Technological University.
<img src="./assets/celeba_dialog.png" width="80%">

[Project Page] | [Paper] | [Code] | [Video] | [Web Page]
CelebA-Dialog is a large-scale visual-language face dataset with the following features:
- Facial images are annotated with rich fine-grained labels, which classify each attribute into multiple degrees according to its semantic meaning.
- Each image is accompanied by captions describing its attributes and a sample user request.
The dataset can be used as training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, natural-language-based facial recognition and manipulation, and broader multi-modal learning tasks. The dataset was proposed in Talk-to-Edit.
Download Links
The dataset can be downloaded from the following links:
- "HQ" refers to the 30,000 high-resolution images selected following CelebA-HQ, together with their annotations.
- "standard" refers to the original 202,599 CelebA images, together with their annotations.
Link (HQ) | Size | Files | Format | Description |
---|---|---|---|---|
CelebA-Dialog (HQ) | ~4.4 GB | | | 30,000 high-resolution images and corresponding annotations |
├ image (HQ) | ~2.7 GB | 30,000 | JPG | images from CelebA-HQ |
├ fine-grained label (HQ) | ~600 KB | 1 | TXT | fine-grained labels for 5 attributes |
├ binary label (HQ) | ~3.5 MB | 1 | TXT | binary labels for 40 attributes |
├ text (HQ) | ~27 MB | 4 | TXT and JSON | natural language captions and editing requests |
├ mask (HQ) | ~1.8 GB | | PNG | segmentation masks: (1) binary, (2) colorized |
├ identity (HQ) | ~400 KB | 1 | TXT | identity label of each image |
Link (standard) | Size | Files | Format | Description |
---|---|---|---|---|
CelebA-Dialog (standard) | | | | 202,599 original CelebA images and corresponding annotations |
├ image (standard) | | | | images from CelebA |
├ fine-grained label (standard) | ~4 MB | 1 | TXT | fine-grained labels for 5 attributes |
├ binary label (standard) | ~25 MB | 1 | TXT | binary labels for 40 attributes |
├ text (standard) | ~14 MB | | TXT and JSON | natural language captions and editing requests |
├ identity (standard) | ~3.3 MB | 1 | TXT | identity label of each image |
Link (mapping) | Size | Files | Format | Description |
---|---|---|---|---|
HQ-to-standard mapping | ~1 MB | 1 | TXT | The mapping between 30,000 CelebA-HQ images and the 202,599 CelebA images |
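The layout of the mapping file is not described here. As a minimal sketch, assuming it follows the `image_list.txt` released with CelebA-HQ (a whitespace-separated header row containing columns such as `idx` and `orig_file`, then one row per HQ image), it could be read into a dictionary from HQ index to original CelebA filename. The column names and the helper name `load_hq_to_celeba` are assumptions; check them against the downloaded file.

```python
from typing import Dict, Iterable


def load_hq_to_celeba(lines: Iterable[str]) -> Dict[int, str]:
    """Map CelebA-HQ index -> original CelebA filename.

    Hypothetical layout: a whitespace-separated header row that contains
    the columns 'idx' and 'orig_file' (as in CelebA-HQ's image_list.txt).
    Adjust the column names if the actual file differs.
    """
    rows = (line.split() for line in lines if line.strip())
    header = next(rows)  # first non-empty line holds the column names
    idx_col = header.index("idx")
    file_col = header.index("orig_file")
    return {int(row[idx_col]): row[file_col] for row in rows}


if __name__ == "__main__":
    # Tiny synthetic example in the assumed format (not real dataset rows).
    sample = [
        "idx orig_idx orig_file",
        "0 100 000101.jpg",
        "1 200 000201.jpg",
    ]
    mapping = load_hq_to_celeba(sample)
    print(mapping[1])  # 000201.jpg
```

With a real file, pass an open handle, e.g. `with open(path) as fh: load_hq_to_celeba(fh)`. Looking columns up by header name rather than by position keeps the sketch working if the column order differs.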
Details
Image
- HQ:
  - 30,000 face images selected from the CelebA dataset by following CelebA-HQ
  - High resolution of 1024 x 1024
- standard:
  - 202,599 face images from the CelebA dataset
Fine-Grained Label
- 5 fine-grained attribute annotations per image: <em>Bangs, Eyeglasses, Beard, Smiling, and Age</em>
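The exact layout of the fine-grained label file is not specified in this section. A minimal parser, assuming (hypothetically) that each data line holds an image filename followed by five integer degrees in the attribute order listed above, could look like this; lines that do not match, such as a header, are skipped. The field order and the name `parse_fine_grained` are assumptions.

```python
from typing import Dict, Iterable

# Attribute order assumed to follow the list above (hypothetical).
FINE_GRAINED_ATTRS = ["Bangs", "Eyeglasses", "Beard", "Smiling", "Age"]


def parse_fine_grained(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Return {filename: {attribute: degree}} for lines shaped like
    '<filename> <int> <int> <int> <int> <int>' (assumed layout)."""
    labels: Dict[str, Dict[str, int]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 1 + len(FINE_GRAINED_ATTRS):
            continue  # wrong field count, e.g. a count line or blank line
        try:
            degrees = [int(x) for x in fields[1:]]
        except ValueError:
            continue  # non-numeric row, such as a column-name header
        labels[fields[0]] = dict(zip(FINE_GRAINED_ATTRS, degrees))
    return labels


if __name__ == "__main__":
    # Synthetic lines in the assumed format.
    sample = ["image Bangs Eyeglasses Beard Smiling Age", "0.jpg 2 0 1 3 4"]
    print(parse_fine_grained(sample)["0.jpg"]["Smiling"])  # 3
```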
Binary Label
- 40 binary attribute annotations per image
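The original CelebA attribute file (`list_attr_celeba.txt`) stores the image count on its first line, the 40 attribute names on its second, and then one row per image: the filename followed by 40 values of `1` or `-1`. Assuming the binary label file here uses the same layout (this section does not say so), it could be parsed into booleans as follows; `parse_binary_labels` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List, Tuple


def parse_binary_labels(
    lines: Iterable[str],
) -> Tuple[List[str], Dict[str, Dict[str, bool]]]:
    """Parse a CelebA-style attribute file (assumed layout):
    line 1: number of images; line 2: attribute names;
    remaining lines: '<filename> v1 ... vN' with each v in {1, -1}."""
    it = iter(lines)
    count = int(next(it).strip())
    names = next(it).split()
    labels: Dict[str, Dict[str, bool]] = {}
    for line in it:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines, e.g. at end of file
        filename, values = fields[0], fields[1:]
        if len(values) != len(names):
            raise ValueError(f"{filename}: expected {len(names)} values")
        # 1 means the attribute is present, -1 means absent.
        labels[filename] = {n: v == "1" for n, v in zip(names, values)}
    if len(labels) != count:
        raise ValueError(f"expected {count} rows, got {len(labels)}")
    return names, labels


if __name__ == "__main__":
    # Synthetic two-attribute example; the real file would list 40 attributes.
    sample = ["2", "Smiling Eyeglasses", "a.jpg  1 -1", "b.jpg -1  1"]
    names, labels = parse_binary_labels(sample)
    print(labels["a.jpg"])  # {'Smiling': True, 'Eyeglasses': False}
```

Splitting on arbitrary whitespace handles the padded columns used in the original CelebA file, and the row-count check catches truncated downloads early.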
Text
- Textual captions for each image
- A user editing request per image
Mask
We preprocessed the facial segmentation masks of CelebAMask-HQ to ease future research.
- You can directly download the binary masks for individual labels for each image. These are the same as the ones provided in CelebAMask-HQ. (Download link)
- We also provide a combined colorized mask for each image, following the label parsing of CelebAMask-HQ. (Download link)
Below is the label parsing information (label index: label name):
Label list | ||||
---|---|---|---|---|
0: 'background' | 1: 'skin' | 2: 'nose' | 3: 'eye_g' | 4: 'l_eye' |
5: 'r_eye' | 6: 'l_brow' | 7: 'r_brow' | 8: 'l_ear' | 9: 'r_ear' |
10: 'mouth' | 11: 'u_lip' | 12: 'l_lip' | 13: 'hair' | 14: 'hat' |
15: 'ear_r' | 16: 'neck_l' | 17: 'neck' | 18: 'cloth' |
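A combined mask can be loaded as a NumPy array with:
```python
from PIL import Image
import numpy as np

f = "path/to/combined_mask.png"  # placeholder: path to one combined mask PNG
segm = Image.open(f)
segm = np.array(segm)  # shape: [512, 512]
```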
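Assuming each pixel of the loaded 2D array holds a label index from the label list above, per-label pixel counts and a binary mask for a single label can be derived with NumPy. `MASK_LABELS`, `label_pixel_counts` and `binary_mask` are illustrative names, not part of the dataset release.

```python
import numpy as np

# Label names in index order, copied from the label list above.
MASK_LABELS = [
    "background", "skin", "nose", "eye_g", "l_eye", "r_eye", "l_brow",
    "r_brow", "l_ear", "r_ear", "mouth", "u_lip", "l_lip", "hair", "hat",
    "ear_r", "neck_l", "neck", "cloth",
]


def label_pixel_counts(segm: np.ndarray) -> dict:
    """Count pixels per label in a 2D array of label indices (0-18)."""
    counts = np.bincount(segm.ravel(), minlength=len(MASK_LABELS))
    return {name: int(c) for name, c in zip(MASK_LABELS, counts)}


def binary_mask(segm: np.ndarray, name: str) -> np.ndarray:
    """Boolean mask that is True where the pixel carries the given label."""
    return segm == MASK_LABELS.index(name)


if __name__ == "__main__":
    # Synthetic 4x4 map standing in for a real 512x512 mask.
    segm = np.zeros((4, 4), dtype=np.uint8)
    segm[:2, :] = 13  # top half labelled 'hair'
    counts = label_pixel_counts(segm)
    print(counts["hair"], counts["background"])  # 8 8
    print(binary_mask(segm, "hair").sum())  # 8
```

`np.bincount` with `minlength` guarantees one count per label even when a label is absent from the image, so every name in the list appears in the result.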
Identity
Some images depict the same person. There are 10,177 identities in total in the dataset. On average, there are:
- around 20 images per identity in CelebA (standard)
- around 3 images per identity in CelebA-HQ
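The CelebA average follows from the counts: 202,599 images over 10,177 identities is about 19.9, i.e. around 20 images per identity. Assuming the identity file uses the layout of CelebA's `identity_CelebA.txt` (one `<filename> <identity_id>` pair per line), images could be grouped by identity as below; `group_by_identity` and `mean_images_per_identity` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_identity(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group filenames by identity, assuming '<filename> <identity_id>' rows."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            groups[int(fields[1])].append(fields[0])
    return dict(groups)


def mean_images_per_identity(groups: Dict[int, List[str]]) -> float:
    """Average number of images per identity."""
    return sum(len(files) for files in groups.values()) / len(groups)


if __name__ == "__main__":
    # Synthetic rows in the assumed format.
    sample = ["a.jpg 1", "b.jpg 2", "c.jpg 1"]
    groups = group_by_identity(sample)
    print(groups[1])  # ['a.jpg', 'c.jpg']
    print(mean_images_per_identity(groups))  # 1.5
    print(round(202_599 / 10_177, 1))  # 19.9
```

Grouping by identity is useful, for example, to build identity-disjoint train and test splits.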
Agreement
- The CelebA-Dialog dataset is available for non-commercial research purposes only.
- You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the images and any portion of derived data.
- You agree not to further copy, publish or distribute any portion of the CelebA-Dialog dataset, except that copies of the dataset may be made for internal use at a single site within the same organization.
Citation
If you find this dataset useful for your research, please consider citing the following papers:
```bibtex
@inproceedings{CelebA-Dialog,
  title = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  author = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2021}
}

@inproceedings{CelebAMask-HQ,
  title = {MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author = {Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}

@inproceedings{CelebA-HQ,
  title = {Progressive Growing of {GAN}s for Improved Quality, Stability, and Variation},
  author = {Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle = {International Conference on Learning Representations},
  year = {2018}
}

@inproceedings{CelebA,
  title = {Deep Learning Face Attributes in the Wild},
  author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
  month = {December},
  year = {2015}
}
```