FFHQ-Text👸
Facial Attribute Textual Descriptions 📃 for the Flickr-Faces-HQ Dataset (FFHQ) 👸.
<div align="center"><img src=./Pic/Overview.png></div>

📚 Text-to-Image Datasets
Text-to-X Datasets
Dataset | Public | Categories | Images (Resolution) | Annotations | Attributes | Other Details |
---|---|---|---|---|---|---|
CUB-200-2011 | √ | 200 | 11,788 (Unfixed) | 10 | Uncounted | BBox, Segmentation... |
Oxford-102 Flowers | √ | 102 | 8,189 (Unfixed) | 10 | Uncounted | - |
MS-COCO | √ | 91 | 120k (Unfixed) | 5 | Uncounted | BBox, Segmentation... |
Text-to-Face Datasets
Dataset | Public | Categories | Images (Resolution) | Annotations | Attributes | Other Details |
---|---|---|---|---|---|---|
SCU-Text2face | × | 1 (Mixed) | 1,000 (256×256) | 5 | Uncounted | - |
Text2FaceGAN | × | 1 (Mixed) | 10,000 (178×218) | 6 | 40 | - |
Faces a la Carte | × | 1 (Mixed) | 202,599 (178×218) | up to 10 | 40 | - |
Multi-Modal-CelebA-HQ | √ | 1 (Mixed) | 30,000 (512×512) | 10 | 38 | Mask, Sketches |
FFHQ-Text | √ | 1 (Female) | 760 (1024×1024) | 9 | 162 | BBox |
🍀 Overview
FFHQ-Text is a small-scale face image dataset with large-scale facial attributes, designed for text-to-face generation & manipulation, text-guided facial image manipulation, and other vision-related tasks.
This dataset extends the NVIDIA Flickr-Faces-HQ Dataset (FFHQ): it consists of the top 760 female FFHQ images, each containing exactly one complete human face.
In this study, we draw on human facial terminology to manually annotate the FFHQ-Text dataset, which is broken down into the following 13 multi-valued facial element groups, from coarse to fine:
- Age (8 classes: 0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, over 60)
- Gender information (Girl👧 or Woman👩)
- Head pose (front, left👈, right👉, up👆 or down👇)
- Eyebrows (9 patterns)
- Eyes👀 (18 patterns, 10 colors)
- Nose👃 (8 patterns)
- Mouth👄 (13 patterns)
- Ears👂 (5 patterns)
- Skin (8 patterns, 8-color scale)
- Face shape (7 patterns)
- Hairstyle (21 patterns, 18 colors)
- Accessory (7 patterns 🧢🧣🎓👑👒...)
- Glasses type (glasses👓 or sunglasses🕶)
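For quick programmatic filtering or sanity checks, the 13 groups above can be mirrored in a small lookup table. The sketch below only restates the counts from the list above; the dictionary name and key spellings are our own choices, not an official schema shipped with the dataset:

```python
# Sketch of the 13 facial element groups described above.
# Each value is (number of patterns/classes, number of colors or None).
# Group names are illustrative; the dataset itself has no such schema file.
FACIAL_GROUPS = {
    "age": (8, None),         # 0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, over 60
    "gender": (2, None),      # girl or woman
    "head_pose": (5, None),   # front, left, right, up, down
    "eyebrows": (9, None),
    "eyes": (18, 10),
    "nose": (8, None),
    "mouth": (13, None),
    "ears": (5, None),
    "skin": (8, 8),
    "face_shape": (7, None),
    "hairstyle": (21, 18),
    "accessory": (7, None),
    "glasses": (2, None),     # glasses or sunglasses
}
```

A table like this makes it easy to verify that annotation files cover every group, or to enumerate the attribute space when building a classifier head per group.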
🎁 Download
Content | Size | Files | Format | Details |
---|---|---|---|---|
FFHQ-Text | - | 1,524 | Main Folder | |
├ Image | 0.97 GB | 760 | PNG | Female images from FFHQ of size 1024×1024 |
├ Text | 766 KB | 760 | TXT | 9 descriptions for each selected facial image in FFHQ |
├ Train | 12 KB | 1 | PKL | Filenames of training images |
├ Test | 6 KB | 1 | PKL | Filenames of testing images |
├ bounding_boxes | 21 KB | 1 | TXT | Location and orientation of each face |
├ images | 19 KB | 1 | TXT | Counts, paths, and filenames of all facial images |
✒ Bounding boxes for each face were extracted using the VGG Image Annotator (VIA) platform.
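Given the layout above (Train/Test PKL files holding image filenames, and nine per-image descriptions in the Text folder), a minimal loader might look like the following sketch. The pickle layout is assumed to be a plain Python list of filename strings, and all paths and function names are hypothetical:

```python
import pickle
from pathlib import Path

def load_split(pkl_path):
    """Load image filenames from a Train/Test PKL file.

    Assumes the pickle stores a plain Python list of filename strings;
    check the actual file contents before relying on this.
    """
    with open(pkl_path, "rb") as f:
        return pickle.load(f)

def load_descriptions(txt_path):
    """Read the per-image textual descriptions, one per non-empty line."""
    text = Path(txt_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Usage would then pair each filename from the split with its image and description files, e.g. `Image/00001.png` and `Text/00001.txt` (the exact naming convention should be confirmed against the downloaded dataset).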
Please fill out the FFHQ-Text Dataset Request Form.
If you cannot access Google, please contact me 📧 directly with your real name, institution, and institution/organization email address. We will send you the FFHQ-Text dataset by email within one week.
🎉 Awesome Repo
This is a survey of text-to-image generation & manipulation and other related works.
We hope it gives you a working overview of the topic and helps spark ideas for your research~
📚 Feedback
Please fill out the FFHQ-Text Dataset Feedback Form.
I would greatly value your thoughts, suggestions, concerns or problems.
📌 License & Privacy
The dataset is made available under Creative Commons BY-NC-SA 4.0 license by Interaction Laboratory, Ritsumeikan University. You can use, redistribute, and adapt it for non-commercial purposes, as long as you (a) give appropriate credit by citing our paper, (b) indicate any changes that you've made, and (c) distribute any derivative works under the same license.
The individual images were published in Flickr by their respective authors under either Creative Commons BY 2.0, Creative Commons BY-NC 2.0, Public Domain Mark 1.0, Public Domain CC0 1.0, or U.S. Government Works license. All of these licenses allow free use, redistribution, and adaptation for non-commercial purposes. However, some of them require giving appropriate credit to the original author, as well as indicating any changes that were made to the images. The license and original author of each image are indicated in the metadata.
- https://creativecommons.org/licenses/by/2.0/
- https://creativecommons.org/licenses/by-nc/2.0/
- https://creativecommons.org/publicdomain/mark/1.0/
- https://creativecommons.org/publicdomain/zero/1.0/
- http://www.usa.gov/copyright.shtml
For other instructions, please see the privacy section of the original FFHQ dataset for more details.
🎯 Terms of Use
Use of the provided FFHQ-Text Dataset constitutes the user's agreement to and acceptance of the following Terms of Use:

- The user understands that use of the Dataset is restricted to research purposes only. Any commercial or for-profit use is prohibited.
- Unauthorized distribution, direct sales, or commercial use of the FFHQ-Text Dataset is prohibited. The user is not granted any rights or license to use the images referenced in the Dataset for any purpose.
- While the contents are provided after thorough confirmation, the FFHQ-Text Dataset providers make no warranties regarding the usefulness, accuracy, legality, morality, timeliness, or appropriateness of the information.
⭐ Citation
<p align=center>“A picture🖼 is worth a thousand words📜~ ”</p>
If you find this dataset helpful for your research, please cite it as below:
```bibtex
@inproceedings{zhou2021generative,
  title={Generative Adversarial Network for Text-to-Face Synthesis and Manipulation with Pretrained BERT Model},
  author={Zhou, Yutong and Shimada, Nobutaka},
  booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
  pages={01--08},
  year={2021}
}

@inproceedings{zhou2021generativeMM,
  title={Generative Adversarial Network for Text-to-Face Synthesis and Manipulation},
  author={Zhou, Yutong},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={2940--2944},
  year={2021}
}

@inproceedings{karras2019style,
  title={A style-based generator architecture for generative adversarial networks},
  author={Karras, Tero and Laine, Samuli and Aila, Timo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4401--4410},
  year={2019}
}
```