Official code for TIPS: Text-Induced Pose Synthesis.

Accepted at the European Conference on Computer Vision (ECCV) 2022.

Abstract

In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying process challenging to apply in real-world scenarios as the generation of the target image is the actual aim. In this paper, we first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues. We divide the problem into three independent stages: (a) text to pose representation, (b) pose refinement, and (c) pose rendering. To the best of our knowledge, this is one of the first attempts to develop a text-based pose transfer framework where we also introduce a new dataset DF-PASS, by adding descriptive pose annotations for the images of the DeepFashion dataset. The proposed method generates promising results with significant qualitative and quantitative scores in our experiments.

Network Architecture

The pipeline is divided into three stages. In stage 1, we estimate the target pose keypoints from the corresponding text description embedding. In stage 2, we regressively refine the initial estimation of the facial keypoints and obtain the refined target pose keypoints. Finally, in stage 3, we render the target image by conditioning the pose transfer on the source image.
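The three stages run in sequence: each stage consumes the previous stage's output, and only stage 3 uses the source image. The Python sketch below illustrates that data flow under stated assumptions. `TIPSPipelineSketch`, its field names, the keypoint representation, and the toy stage functions are hypothetical placeholders, not the repository's modules; the real networks and their signatures are defined in the project code and may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# A pose as a list of (x, y) keypoint coordinates (assumed representation).
Keypoints = List[Tuple[float, float]]


@dataclass
class TIPSPipelineSketch:
    """Hypothetical wiring of the three TIPS stages; not the repository's API."""

    text_to_pose: Callable[[Sequence[float]], Keypoints]  # stage 1 (placeholder)
    refine_pose: Callable[[Keypoints], Keypoints]  # stage 2 (placeholder)
    render: Callable[[Any, Keypoints], Any]  # stage 3 (placeholder)

    def __call__(self, source_image: Any, text_embedding: Sequence[float]) -> Any:
        # Stage 1: estimate coarse target keypoints from the text embedding.
        coarse = self.text_to_pose(text_embedding)
        # Stage 2: refine the coarse estimate (the README refines the facial keypoints).
        refined = self.refine_pose(coarse)
        # Stage 3: render the target image, conditioned on the source image.
        return self.render(source_image, refined)


# Toy stand-ins so the data flow can be run end to end.
pipeline = TIPSPipelineSketch(
    text_to_pose=lambda emb: [(0.0, 0.0), (1.0, 1.0)],
    refine_pose=lambda kps: [(x + 0.5, y) for x, y in kps],
    render=lambda img, kps: {"source": img, "pose": kps},
)
out = pipeline("source.png", [0.1, 0.2])
print(out)  # {'source': 'source.png', 'pose': [(0.5, 0.0), (1.5, 1.0)]}
```

Keeping the stages as independent callables mirrors the paper's split into separable stages: any one of them can be swapped or tested in isolation without touching the others.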

Generation Results

Keypoint-guided methods tend to produce structurally inaccurate results when the physical appearance of the target pose reference differs significantly from the condition image. This failure occurs more often for out-of-distribution target poses than for in-distribution target poses. On the other hand, the existing text-guided method occasionally misinterprets the target pose because it represents poses with only a limited set of basic poses. The proposed text-guided technique addresses these issues while still generating results visually close to those of the keypoint-guided baseline.

Try the TIPS inference pipeline demo in Colab

:zap: Getting Started

Clone the project repository and install dependencies.

git clone https://github.com/prasunroy/tips.git
cd tips
mkdir datasets
pip install -r requirements.txt

Download the DF-PASS dataset from Google Drive and extract it into the datasets/DF-PASS directory.

tips
├───datasets
│   └───DF-PASS
│       ├───gaussian_heatmaps
│       ├───descriptions.csv
│       ├───encodings.csv
│       ├───test_img_keypoints.csv
│       ├───test_img_list.csv
│       ├───test_img_pairs.csv
│       ├───train_img_keypoints.csv
│       ├───train_img_list.csv
│       └───train_img_pairs.csv
└─── ...
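Because the dataset is downloaded and extracted by hand, a quick check that everything landed at the expected paths can catch a misplaced folder before training or evaluation fails. The sketch below is a hypothetical helper, not part of the repository. It assumes it is run from the tips root and only verifies that the names in the tree above exist under datasets/DF-PASS; it does not validate file contents.

```python
from pathlib import Path
from typing import List

# Entry names copied from the DF-PASS tree shown above.
DF_PASS_DIRS = ["gaussian_heatmaps"]
DF_PASS_FILES = [
    "descriptions.csv",
    "encodings.csv",
    "test_img_keypoints.csv",
    "test_img_list.csv",
    "test_img_pairs.csv",
    "train_img_keypoints.csv",
    "train_img_list.csv",
    "train_img_pairs.csv",
]


def missing_df_pass_entries(root: str = "datasets/DF-PASS") -> List[str]:
    """Return the expected DF-PASS entries that are absent under `root`.

    Hypothetical helper: checks names and entry types only, not contents.
    """
    base = Path(root)
    missing = [d for d in DF_PASS_DIRS if not (base / d).is_dir()]
    missing += [f for f in DF_PASS_FILES if not (base / f).is_file()]
    return missing


if __name__ == "__main__":
    missing = missing_df_pass_entries()
    if missing:
        print("Missing under datasets/DF-PASS:", ", ".join(missing))
    else:
        print("All expected DF-PASS entries found.")
```

An empty list means every entry from the tree was found; any names printed point to files or folders that were extracted elsewhere or not at all.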

:rocket: Running the demo locally

Download the pretrained checkpoints and test data from Google Drive and extract them into the tips/demo directory.

tips
├───demo
│   ├───checkpoints
│   │   ├───pose2pose_260500.pth
│   │   ├───refinenet_100.pth
│   │   └───text2pose_75000.pth
│   ├───data
│   │   ├───images
│   │   ├───descriptions.csv
│   │   ├───encodings.csv
│   │   ├───img_pairs_df2df.csv
│   │   ├───img_pairs_df2rw.csv
│   │   ├───keypoints.csv
│   │   └───FreeMono.ttf
│   └─── ...
└─── ...

Run the demo notebook from the tips/demo directory.

cd demo
jupyter notebook TIPS_demo.ipynb

External Links

<h4> <a href="https://prasunroy.github.io/tips">Project</a>  •   <a href="http://arxiv.org/abs/2207.11718">arXiv</a>  •   <a href="https://drive.google.com/drive/folders/17cvo22Eh_Z_S6fb-J-c6qw97WH6UeIHo">DF-PASS Dataset</a>  •   <a href="https://drive.google.com/drive/folders/1DwEcAPeYkXUNQ_SBhSJpydaLBTjh3_ms">Pretrained Models</a>  •   <a href="https://colab.research.google.com/github/prasunroy/tips/blob/main/notebooks/TIPS_demo.ipynb">Colab Demo</a> </h4>

Citation

@inproceedings{roy2022tips,
  title     = {TIPS: Text-Induced Pose Synthesis},
  author    = {Roy, Prasun and Ghosh, Subhankar and Bhattacharya, Saumik and Pal, Umapada and Blumenstein, Michael},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month     = {October},
  year      = {2022}
}

Related Publications

[1] Multi-scale Attention Guided Pose Transfer (PR 2023).

[2] Scene Aware Person Image Generation through Global Contextual Conditioning (ICPR 2022).

[3] Text Guided Person Image Synthesis (CVPR 2019).

[4] Progressive Pose Attention Transfer for Person Image Generation (CVPR 2019).

[5] DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations (CVPR 2016).

License

Copyright 2022 by the authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

The DF-PASS dataset and the pretrained models are released under Creative Commons Attribution 4.0 International (CC BY 4.0) license.