# Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild
## Introduction
This is the code for the paper *Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild*. We propose a novel facial landmark detector, PIPNet, that is fast, accurate, and robust. PIPNet can be trained under two settings: (1) supervised learning; (2) generalizable semi-supervised learning (GSSL). With GSSL, PIPNet achieves better cross-domain generalization by utilizing massive amounts of unlabeled data across domains.
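As Figure 2 below illustrates, PIPNet's detection head predicts, for each landmark, a score map on a low-resolution feature map plus x/y offsets within the winning grid cell. A minimal PyTorch sketch of this idea, with illustrative layer names and sizes (the actual model also regresses neighboring landmarks, which is omitted here):

```python
import torch
import torch.nn as nn

class PIPHead(nn.Module):
    """Illustrative sketch of a pixel-in-pixel head, not the repo's exact code."""
    def __init__(self, in_channels=512, num_lms=98):
        super().__init__()
        # One channel per landmark for the score map, and one per landmark
        # for each of the x/y offsets within the winning grid cell.
        self.cls = nn.Conv2d(in_channels, num_lms, kernel_size=1)
        self.off_x = nn.Conv2d(in_channels, num_lms, kernel_size=1)
        self.off_y = nn.Conv2d(in_channels, num_lms, kernel_size=1)

    def forward(self, feats):  # feats: (N, C, H, W), e.g. 8x8 for a 256x256 crop
        return self.cls(feats), self.off_x(feats), self.off_y(feats)

def decode(score, off_x, off_y, stride=32):
    """Pick the best grid cell per landmark, then refine with the offsets."""
    n, l, h, w = score.shape
    idx = score.flatten(2).argmax(dim=2)          # (N, L): winning cell per landmark
    ys, xs = idx // w, idx % w
    ox = off_x.flatten(2).gather(2, idx.unsqueeze(2)).squeeze(2)
    oy = off_y.flatten(2).gather(2, idx.unsqueeze(2)).squeeze(2)
    # Landmark = (cell coordinate + fractional offset) * feature-map stride
    return torch.stack([(xs.float() + ox) * stride,
                        (ys.float() + oy) * stride], dim=2)  # (N, L, 2) in pixels
```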
<img src="images/speed.png" alt="speed" width="640px"> Figure 1. Comparison to existing methods on speed-accuracy tradeoff, tested on WFLW full test set (closer to bottom-right corner is better).<br><br> <img src="images/detection_heads.png" alt="det_heads" width="512px"> Figure 2. Comparison of different detection heads.<br>Installation
- Install Python 3 and PyTorch (>= v1.1).
- Clone this repository:

  ```bash
  git clone https://github.com/jhb86253817/PIPNet.git
  ```

- Install the dependencies in requirements.txt:

  ```bash
  pip install -r requirements.txt
  ```
## Demo
- We use a modified version of FaceBoxes as the face detector. Go to folder `FaceBoxesV2/utils` and run `sh make.sh` to build for NMS.
- Back in folder `PIPNet`, create two empty folders `logs` and `snapshots`. For PIPNets, you can download our trained models from here and put them under folder `snapshots/DATA_NAME/EXPERIMENT_NAME/`.
- Edit `run_demo.sh` to choose the config file and input source you want, then run `sh run_demo.sh`. We support image, video, and camera as the input. Some sample predictions are shown below, followed by a sketch of the underlying detect-then-predict pipeline.
- PIPNet-ResNet18 trained on WFLW, with image `images/1.jpg` as the input:

  <img src="images/1_out_WFLW_model.jpg" alt="1_out_WFLW_model" width="400px">

- PIPNet-ResNet18 trained on WFLW, with a snippet from Shaolin Soccer as the input:

  <img src="videos/shaolin_soccer.gif" alt="shaolin_soccer" width="400px">

- PIPNet-ResNet18 trained on WFLW, with video `videos/002.avi` as the input:

  <img src="videos/002_out_WFLW_model.gif" alt="002_out_WFLW_model" width="512px">

- PIPNet-ResNet18 trained on 300W+CelebA (GSSL), with video `videos/007.avi` as the input:

  <img src="videos/007_out_300W_CELEBA_model.gif" alt="007_out_300W_CELEBA_model" width="512px">
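Conceptually, the demo detects a face with FaceBoxes, crops the face region with some margin, and feeds the crop to PIPNet. A minimal sketch of the cropping step, assuming a 1.2x margin and a 256x256 input crop (illustrative values; see `run_demo.sh` and the config files for the actual settings):

```python
import cv2
import numpy as np

def crop_face(image, box, scale=1.2, size=256):
    """Expand the detected box by `scale`, then crop and resize for the model."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half = max(x2 - x1, y2 - y1) * scale / 2
    x1, y1 = int(max(cx - half, 0)), int(max(cy - half, 0))
    x2, y2 = int(min(cx + half, image.shape[1])), int(min(cy + half, image.shape[0]))
    crop = cv2.resize(image[y1:y2, x1:x2], (size, size))
    return crop, (x1, y1, x2 - x1, y2 - y1)

def to_image_coords(lms, crop_origin):
    """Map landmarks normalized to the crop back to original image coordinates."""
    ox, oy, w, h = crop_origin
    return np.stack([lms[:, 0] * w + ox, lms[:, 1] * h + oy], axis=1)
```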
## Training
### Supervised Learning
Datasets: 300W, COFW, WFLW, AFLW, LaPa
- Download the datasets from their official sources, then put them under folder `data`. The folder structure should look like this (a small sanity-check script follows the tree):
```
PIPNet
-- FaceBoxesV2
-- lib
-- experiments
-- logs
-- snapshots
-- data
   |-- data_300W
       |-- afw
       |-- helen
       |-- ibug
       |-- lfpw
   |-- COFW
       |-- COFW_train_color.mat
       |-- COFW_test_color.mat
   |-- WFLW
       |-- WFLW_images
       |-- WFLW_annotations
   |-- AFLW
       |-- flickr
       |-- AFLWinfo_release.mat
   |-- LaPa
       |-- train
       |-- val
       |-- test
```
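A tiny, hypothetical script (not part of the repo) that verifies the layout above before preprocessing:

```python
import os

# Expected entries per dataset, mirroring the folder tree above.
EXPECTED = {
    "data_300W": ["afw", "helen", "ibug", "lfpw"],
    "COFW": ["COFW_train_color.mat", "COFW_test_color.mat"],
    "WFLW": ["WFLW_images", "WFLW_annotations"],
    "AFLW": ["flickr", "AFLWinfo_release.mat"],
    "LaPa": ["train", "val", "test"],
}

for dataset, entries in EXPECTED.items():
    for entry in entries:
        path = os.path.join("data", dataset, entry)
        if not os.path.exists(path):
            print(f"missing: {path}")
```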
- Go to folder `lib` and preprocess a dataset by running `python preprocess.py DATA_NAME`. For example, to process 300W:

  ```bash
  python preprocess.py data_300W
  ```
- Back in folder `PIPNet`, edit `run_train.sh` to choose the config file you want. Then train the model by running:

  ```bash
  sh run_train.sh
  ```
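For orientation, training supervises the score maps with a classification-style loss and the offset maps with a regression loss. A minimal sketch of such a combined objective, assuming MSE on score maps and L1 on offsets with an illustrative weighting (the experiment config files define the actual losses and weights):

```python
import torch.nn as nn

mse, l1 = nn.MSELoss(), nn.L1Loss()

def pip_loss(pred_cls, pred_x, pred_y, gt_cls, gt_x, gt_y, cls_weight=10.0):
    # Score maps supervise *which* grid cell holds each landmark; offset maps
    # supervise the sub-cell position. In practice the offsets are typically
    # only supervised at/near the ground-truth cell; that masking is omitted
    # here for brevity.
    return cls_weight * mse(pred_cls, gt_cls) + l1(pred_x, gt_x) + l1(pred_y, gt_y)
```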
### Generalizable Semi-supervised Learning
Datasets:
- `data_300W_COFW_WFLW`: 300W + COFW-68 (unlabeled) + WFLW-68 (unlabeled)
- `data_300W_CELEBA`: 300W + CelebA (unlabeled)
- Download 300W, COFW, and WFLW as in the supervised learning setting. Download the annotations of the COFW-68 test set from here. For 300W+CelebA, you also need to download the in-the-wild CelebA images from here, as well as the face bounding boxes detected by us. The folder structure should look like this:
```
PIPNet
-- FaceBoxesV2
-- lib
-- experiments
-- logs
-- snapshots
-- data
   |-- data_300W
       |-- afw
       |-- helen
       |-- ibug
       |-- lfpw
   |-- COFW
       |-- COFW_train_color.mat
       |-- COFW_test_color.mat
   |-- WFLW
       |-- WFLW_images
       |-- WFLW_annotations
   |-- data_300W_COFW_WFLW
       |-- cofw68_test_annotations
       |-- cofw68_test_bboxes.mat
   |-- CELEBA
       |-- img_celeba
       |-- celeba_bboxes.txt
   |-- data_300W_CELEBA
       |-- cofw68_test_annotations
       |-- cofw68_test_bboxes.mat
```
- Go to folder `lib` and preprocess a dataset by running `python preprocess_gssl.py DATA_NAME`. To process data_300W_COFW_WFLW, run:

  ```bash
  python preprocess_gssl.py data_300W_COFW_WFLW
  ```

  To process data_300W_CELEBA, run:

  ```bash
  python preprocess_gssl.py CELEBA
  python preprocess_gssl.py data_300W_CELEBA
  ```
- Back in folder `PIPNet`, edit `run_train.sh` to choose the config file you want. Then train the model by running:

  ```bash
  sh run_train.sh
  ```
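Conceptually, GSSL extends supervised training with unlabeled images via self-training: a model trained on the labeled set pseudo-labels the unlabeled images, and training continues on the union (the paper adds a curriculum on top of this). A heavily simplified sketch with hypothetical helpers:

```python
# `train_one_round` and `predict_landmarks` are hypothetical placeholders for
# the repo's actual training/inference code; the paper's curriculum is omitted.
def self_training(model, labeled, unlabeled, train_one_round, predict_landmarks, rounds=3):
    train_one_round(model, labeled)                 # supervised warm-up
    for _ in range(rounds):
        # Pseudo-label the unlabeled images with the current model...
        pseudo = [(img, predict_landmarks(model, img)) for img in unlabeled]
        # ...then retrain on labeled + pseudo-labeled data together.
        train_one_round(model, labeled + pseudo)
    return model
```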
## Evaluation
- Edit `run_test.sh` to choose the config file you want. Then test the model by running:

  ```bash
  sh run_test.sh
  ```
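The standard metric for facial landmark detection, and the one reported in the paper, is the normalized mean error (NME): the mean point-to-point distance between predicted and ground-truth landmarks, divided by a normalizing distance (e.g., the inter-ocular distance on WFLW). A minimal NumPy sketch:

```python
import numpy as np

def nme(pred, gt, norm_dist):
    """pred, gt: (num_landmarks, 2) arrays; norm_dist: e.g. inter-ocular distance."""
    return np.linalg.norm(pred - gt, axis=1).mean() / norm_dist
```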
## Community
- lite.ai.toolkit: Provides MNN C++, NCNN C++, TNN C++, and ONNXRuntime C++ versions of PIPNet.
- torchlm: Provides a PyTorch re-implementation of PIPNet with ONNX export, installable via pip.
## Citation
```bibtex
@article{JLS21,
  title={Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild},
  author={Haibo Jin and Shengcai Liao and Ling Shao},
  journal={International Journal of Computer Vision},
  publisher={Springer Science and Business Media LLC},
  ISSN={1573-1405},
  url={http://dx.doi.org/10.1007/s11263-021-01521-4},
  DOI={10.1007/s11263-021-01521-4},
  year={2021},
  month={Sep}
}
```
## Acknowledgement
We thank the following great works: