Changes in segmentation labels to generate the segmentation mask for DeepFashion_Try_On

LIP label(s)  -> DeepFashion_Try_On label
0             -> 0
1, 2          -> 1
5, 6, 7       -> 4
18            -> 5
19            -> 6
3, 8, 10, 11  -> 7
9, 12         -> 8
16            -> 9
17            -> 10
14            -> 11
4, 13         -> 12
15            -> 13
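For reference, this remapping can be applied to a LIP parsing map with a small lookup table. The sketch below mirrors the table above; the file paths in the usage comment are placeholders, and any LIP label not listed is assumed to fall back to 0 (background).

```python
import numpy as np
from PIL import Image

# LIP label -> DeepFashion_Try_On (ACGPN) label, as listed in the table above.
LIP_TO_ACGPN = {
    0: 0,
    1: 1, 2: 1,
    5: 4, 6: 4, 7: 4,
    18: 5,
    19: 6,
    3: 7, 8: 7, 10: 7, 11: 7,
    9: 8, 12: 8,
    16: 9,
    17: 10,
    14: 11,
    4: 12, 13: 12,
    15: 13,
}

def remap_lip_to_acgpn(parsing: np.ndarray) -> np.ndarray:
    """Remap a LIP parsing map (H x W, uint8 label indices) to the ACGPN label set."""
    lut = np.zeros(256, dtype=np.uint8)  # assumption: unlisted labels map to 0 (background)
    for lip_label, acgpn_label in LIP_TO_ACGPN.items():
        lut[lip_label] = acgpn_label
    return lut[parsing]

# Example usage (paths are placeholders):
# parsing = np.array(Image.open("output/person.png"))   # palette PNG -> label indices
# Image.fromarray(remap_lip_to_acgpn(parsing)).save("output/person_acgpn.png")
```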

Tested only on the LIP dataset.

<h1> Original Readme </h1> <br> <h2> Self Correction for Human Parsing </h2>

Python 3.6 | License: MIT

An out-of-box human parsing representation extractor.

Our solution ranks 1st for all human parsing tracks (including single, multiple and video) in the third LIP challenge!

(Figure: LIP parsing visualization)

Features:

- Out-of-box human parsing representation extractor.
- Pretrained models on three popular single-person human parsing datasets (LIP, ATR and Pascal-Person-Part).

Requirements

conda env create -f environment.yaml
conda activate schp
pip install -r requirements.txt

Simple Out-of-Box Extractor

The easiest way to get started is to use our trained SCHP models on your own images to extract human parsing representations. Here we provide state-of-the-art models trained on three popular datasets. These three datasets use different label systems, so choose the one that best fits your own task.

LIP (exp-schp-201908261155-lip.pth)

ATR (exp-schp-201908301523-atr.pth)

Pascal-Person-Part (exp-schp-201908270938-pascal-person-part.pth)

Choose one and have fun on your own task!

To extract the human parsing representation, simply put your own images in the INPUT_PATH folder, then download a pretrained model and run the following command. The output images, with the same file names, will be saved in OUTPUT_PATH.

python simple_extractor.py --dataset [DATASET] --model-restore [CHECKPOINT_PATH] --input-dir [INPUT_PATH] --output-dir [OUTPUT_PATH]

[Updated] There is also a Colab demo for quick inference, provided by @levindabhi.

The DATASET argument has three options: 'lip', 'atr' and 'pascal'. Note that each pixel in the output images denotes the predicted label number. The output images have the same size as the input ones. For better visualization, a palette is attached to the output images. We suggest reading them with PIL.
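As a minimal sketch of what reading one result with PIL might look like (assuming the outputs are saved as palette PNGs, with a placeholder file name):

```python
import numpy as np
from PIL import Image

# The outputs are palette ("P" mode) images: each pixel value is a class index,
# and the attached palette only affects how viewers colour it.
parsing = Image.open("OUTPUT_PATH/example.png")   # placeholder path
labels = np.array(parsing)                        # H x W array of predicted label indices

print("image size:", parsing.size)                # same size as the input image
print("labels present:", np.unique(labels))
```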

If you need not only the final parsing images but also the feature-map representations, add the --logits flag to save the output feature maps. These feature maps are the logits before the softmax layer.
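A small sketch of how such saved logits might be consumed, assuming they are written as per-image .npy arrays of shape roughly H x W x num_classes (the file name is a placeholder):

```python
import numpy as np

# Assumption: with --logits, a per-image .npy file of pre-softmax scores is written
# alongside the parsing image.
logits = np.load("OUTPUT_PATH/example.npy")

# Recover per-pixel class probabilities and the hard prediction.
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))   # numerically stable softmax
probs = exp / exp.sum(axis=-1, keepdims=True)
pred = probs.argmax(axis=-1)
print(logits.shape, pred.shape)
```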

Dataset Preparation

Please download the LIP dataset and organize it following the structure below.

data/LIP
|--- train_images # 30462 training single person images
|--- val_images # 10000 validation single person images
|--- train_segmentations # 30462 training annotations
|--- val_segmentations # 10000 validation annotations
|--- train_id.txt # training image list
|--- val_id.txt # validation image list
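A quick, hypothetical sanity check of this layout (not part of the repo) could look like:

```python
import os

# Verify the LIP folder layout and id lists described above.
root = "data/LIP"
for name in ["train_images", "val_images", "train_segmentations", "val_segmentations"]:
    path = os.path.join(root, name)
    print(f"{name}: {len(os.listdir(path))} files" if os.path.isdir(path) else f"{name}: MISSING")

for split in ["train_id.txt", "val_id.txt"]:
    with open(os.path.join(root, split)) as f:
        print(split, "->", sum(1 for _ in f), "ids")
```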

Training

python train.py 

By default, the trained model will be saved in the ./log directory. Please read the arguments for more details.

Evaluation

python evaluate.py --model-restore [CHECKPOINT_PATH]

CHECKPOINT_PATH should be the path to the trained model.

Extension on Multiple Human Parsing

Please read MultipleHumanParsing.md for more details.

Citation

Please cite our work if you find this repo useful in your research.

@article{li2020self,
  title={Self-Correction for Human Parsing}, 
  author={Li, Peike and Xu, Yunqiu and Wei, Yunchao and Yang, Yi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  year={2020},
  doi={10.1109/TPAMI.2020.3048039}}

Visualization

Related

Our code adopts InplaceSyncBN to reduce GPU memory cost.

There is also a PaddlePaddle Implementation of this project.