Fashion-Text2Video

Text2Performer: Text-Driven Human Video Generation<br> Yuming Jiang, Shuai Yang, Tong Liang Koh, Wayne Wu, Chen Change Loy and Ziwei Liu<br> In International Conference on Computer Vision (ICCV), 2023.

From MMLab@NTU, affiliated with S-Lab, Nanyang Technological University, and Shanghai AI Laboratory.

[Project Page] | [Paper] | [Code] | [Demo Video]

Fashion-Text2Video is a human video dataset with rich label and text annotations. It has the following properties:

  1. It contains 600 high-resolution human videos from the Fashion Dataset.
  2. For each video, we have annotations for motions (labels and texts).
  3. For each video, we have text descriptions for clothing textures and shapes.

The Fashion-Text2Video dataset can be used for text-driven human video generation. It was proposed in Text2Performer.

Download Links

You can download the dataset using the following links:

| Path | Size | Format | Description |
| :--- | :--- | :--- | :--- |
| Fashion-Text2Video | ~18 GB | - | main folder |
| ├ Video Frames | 17.27 GB | PNG | Frames from the Fashion Dataset, resolution 512 x 256 |
| ├ Motion Texts | 253 KB | TXT | Texts for human motions |
| ├ Motion Labels | 160 KB | TXT | Labels for human motions |
| ├ App Texts | 668 KB | JSON | Texts for human appearance |
| ├ Motion Caption Templates | 5 KB | JSON | Motion caption templates |

Motion Texts

<start_frame_seg1> <end_frame_seg1> <text_seg1>
<start_frame_seg2> <end_frame_seg2> <text_seg2>
...

Motion Labels

<start_frame_seg1> <end_frame_seg1> <label_seg1>
<start_frame_seg2> <end_frame_seg2> <label_seg2>
...

You can also use the motion caption templates to generate more diverse text descriptions.
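As a rough illustration of template-based caption generation, the sketch below fills a randomly chosen template with a motion description. The `{motion}` placeholder and the list-of-strings structure are assumptions for illustration; the actual schema of the Motion Caption Templates JSON may differ.

```python
import random

def generate_caption(templates, motion_text):
    """Fill a randomly chosen caption template with a motion description.

    `templates` is assumed to be a list of strings containing a
    `{motion}` placeholder, e.g. "The person {motion}." (hypothetical
    schema; check the actual JSON file for the real structure).
    """
    template = random.choice(templates)
    return template.format(motion=motion_text)

# Hypothetical usage:
# generate_caption(["The person {motion}.", "She {motion}."], "turns around")
```

Sampling a template per segment is a simple way to vary phrasing while keeping the underlying motion label fixed.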

Agreement

Citation

If you find this dataset useful for your research and use it in your work, please consider citing the following paper:

@inproceedings{jiang2023text2performer,
  title={Text2Performer: Text-Driven Human Video Generation},
  author={Jiang, Yuming and Yang, Shuai and Koh, Tong Liang and Wu, Wayne and Loy, Chen Change and Liu, Ziwei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}