Awesome
Face Alignment at 3000fps
It is an implementation of Face Alignment at 3000fps via Local Binary Features, a paper on CVPR 2014
New!!! Refactoring the code
1. Add Validation in Training Step(error is the same to the paper)
add Validation after each stage, output will be like below, now we can see how our model is fit on the validation set, much easier for parameter tuning.(Following is the output when training Helen dataset),the trained model can be DOWNLOADED from HERE, see next section for how to use.
training stage: 0 of 6
train regression error: 836.521, mean error: 0.116767
Validation at stage: 0
Validation error: 29.7611, mean error: 0.095695
training stage: 1 of 6
train regression error: 533.649, mean error: 0.0744904
Validation at stage: 1
Validation error: 22.4096, mean error: 0.0720567
training stage: 2 of 6
train regression error: 386.907, mean error: 0.0540071
Validation at stage: 2
Validation error: 19.6509, mean error: 0.0631862
training stage: 3 of 6
train regression error: 297.455, mean error: 0.0415208
Validation at stage: 3
Validation error: 18.3068, mean error: 0.0588642
training stage: 4 of 6
train regression error: 247.068, mean error: 0.0344875
Validation at stage: 4
Validation error: 17.8339, mean error: 0.0573436
training stage: 5 of 6
train regression error: 207.739, mean error: 0.0289976
Validation at stage: 5
Validation error: 17.5538, mean error: 0.0564431
2. Change to Read Parameters from Configure File
See next section for details.
In pervious versions, when setting the parameters, we have to write hard code in main.cpp, now I changed it to read from a configure file without having to recompile the codes
3. Add Helen Dataset Example for Better Understanding
in the example
folder, there are some configure files. See next section for details.
Interpret the Paper's details
If you are a Chinese, you can go to my blog for more details. link
License
If you use my work, please cite my name (Binbin Xu), Thanks in advance. This project is released under the BSD 2-Clause license.
How To Use
Requirements:
- OpenCV(I just use the basic structures of OpenCV, like cv::Mat, cv::Point)
- cmake
Prepare:
- Download helen dataset from this link: Helen Dataset
- unzip the helen dataset and put it under
example
folder, the path will be likeexample/helen/trainset/***.jpg
for an image
Training
mkdir build
cd build
cmake ..
make
./application train ../example/helen_train_config.txt
the training starts, you just need to see the output
Testing
- testing images that have ground truth shapes
take helen testset for example.
./application test ../example/helen_test_config_images_with_ground_truth.txt
- testing images that do not have ground truth shapes
take helen testset for example, assuming that we do not known the landmarks' annotations.
./application test ../example/helen_test_config_images_without_ground_truth.txt
Using My Simple Model
if you do not want to train a model, I provide one. You can download a model trained on helen/trainset
images from HERE, test error is 0.0564244
on helen/testset
- download
helen_trained_model.zip
and unzip it toexample/
path will be example/helen_trained_model/helenModel_params.txt
for helenModel_params.txt
- in
example/helen_test_config_images_with_ground_truth.txt
change first line helenModel
to ../example/helen_trained_model/helenModel
- run the following command
./application test ../example/helen_test_config_images_with_ground_truth.txt
- actually, you can put the model wherever you want, just change the path accordingly in
configure file
Configure Files Explanation:
1. Image List
in example/
there is a file named helen_test_images_list_with_ground_truth.txt
, the content is like this:
2815405614_1.jpg 2815405614_1.pts
114530171_1.jpg 114530171_1.pts
2437904540_1.jpg 2437904540_1.pts
...
each line contains a pair of names, the first one is the image name, the second is the ground truth file name
in example/
there is a file named helen_test_images_list_without_ground_truth.txt
, the content is like this:
2815405614_1.jpg
114530171_1.jpg
2437904540_1.jpg
...
after we trained a new model, we want to test new images, so just put each image's name in the image_list.txt
2. Training Configure File
helenModel // the model name, name what you like
200 // params.local_features_num_ = 200;
68 // params.landmarks_num_per_face_ = 68;
6 // params.regressor_stages_ = 6;
5 // params.tree_depth_ = 5;
12 // params.trees_num_per_forest_ = 12;
3 // params.initial_guess_ = 5;
0.3 // params.overlap_ = 0.3;
0.29 // first stage lacal radius when samples the pixel difference feature positions
0.21 // second stage local radius
0.16
0.12
0.08
0.04 // sixth stage local radius
2 // number of datasets for training
../example/helen/trainset/ // root folder that contains the *.jpgs and *.pts of the first dataset
../example/helen_train_images_list.txt // images list of the first dataset
../example/otherdataset/ // root folder of the second dataset
../example/otherdatset_train_images_list.txt // images list of the second dataset
1 // number of datasets for validation, 0 means none
../example/helen/testset/ // root folder that contains the *.jpgs and *.pts
../example/helen_test_images_list_with_ground_truth.txt // images list
off course, you can set number of datasets for validation as 0
, which means no validaion set.
When training, this configure file will be parsed, and images will be loaded. You can refer to main.cpp
for details how the configure file is parsed.
the parameters setting above is just an example, you have to fine tune the parameters in training for your dataset.
off course, you can still set the parameters like this directly in main.cpp
.
Parameters params;
params.local_features_num_ = 300;
params.landmarks_num_per_face_ = 68;
params.regressor_stages_ = 6;
params.local_radius_by_stage_.push_back(0.4);
params.local_radius_by_stage_.push_back(0.3);
params.local_radius_by_stage_.push_back(0.2);
params.local_radius_by_stage_.push_back(0.1);
params.local_radius_by_stage_.push_back(0.08);
params.local_radius_by_stage_.push_back(0.05);
params.tree_depth_ = 5;
params.trees_num_per_forest_ = 12;
params.initial_guess_ = 5;
3. Test Configure File
helen_test_config_images_with_ground_truth.txt
helenModel // model name that we want to use after we have trained
1 // 1 means we know the ground_truth_shapes of the images
1 // number of datasets we want to be tested
../example/helen/testset/ // root folder for testset
../example/helen_test_images_list_with_ground_truth.txt // image list
helen_test_config_images_without_ground_truth.txt
helenModel // model name that we want to use after we have trained
0 // 0 means we do not know the ground_truth_shapes of the images
1 // number of datasets we want to be tested
../example/helen/testset/ // root folder for testset
../example/helen_test_images_list_without_ground_truth.txt // image list
Notes
1. overlap
there is a parameter in randomforest.cpp
, line 95:
double overlap = 0.3; // you can set it to 0.4, 0.25 etc
each tree in the forest will be constructed using about N*(1-overlap+overlap/T)
examples, where N
is the total number of images after augmentation(if your train data set size is 2000, and initial_guess_
is 5, then N = 2000*(5+1)=12000 images), T
is the number of trees in each forest.
2. what are .pts files
here you can download dataset with .pts files, each .pts file contains 68 landmarks positions of each face
3. what are BoundingBox
it is just the bounding box of a face, including the center point of the box, you can just use the face rectangle detected by opencv alogrithm with a little effort calculating the center point's position yourself. Example codes are like below
BoundingBox bbox;
bbox.start_x = faceRec.x; // faceRec is a cv::Rect, containing the rectangle of the face
bbox.start_y = faceRec.y;
bbox.width = faceRec.width;
bbox.height = faceRec.height;
bbox.center_x = bbox.start_x + bbox.width / 2.0;
bbox.center_y = bbox.start_y + bbox.height / 2.0;
bboxes.push_back(bbox);
4. resize the image
If you try to resize the images, please use the codes below
if (image.cols > 2000){
cv::resize(image, image, cv::Size(image.cols / 3, image.rows / 3), 0, 0, cv::INTER_LINEAR);
ground_truth_shape /= 3.0;
}
DO NOT SWAP "image.cols" and "image.rows", since "image.cols" is the width of the image, the following lines are WRONG!!!
if (image.cols > 2000){
cv::resize(image, image, cv::Size(image.rows / 3, image.cols / 3), 0, 0, cv::INTER_LINEAR);
ground_truth_shape /= 3.0;
}
5. Customize
Re-write LoadGroundTruthShape
and LoadImages
in utils.cpp
for your own needs since different dataset may have different data format.
Results:
1. Detect The Face
2. Use The Mean Shape As The Initial Shape:
3. Predict The Landmarks
4. Bad Case
when you test new images that do not have ground truth annotations, you may encounter problems like this, actually it is NOT Face Alignment's problem, for the code just uses OpenCV's default face detector, and I choose the first bounding box rectangle returned by OpenCV(line 345 in utils.cpp
: cv::Rect faceRec = faces[0];
), you can use all the bounding box by re-writing the function. And sometimes the face detector may fail to detect a face in the image.
More
- The paper claims for 3000fps for very high frame rates for different parameters, while my implementation can achieve several hundreds frame rates. What you should be AWARE of is that we both just CALCULATE the time that predicting the landmarks, EXCLUDES the time that detecting faces.
- If you want to use it for realtime videos, using OpenCV's face detector will achieve about 15fps, since 80% (even more time is used to get the bounding boxes of the faces in an image), so the bottleneck is the speed of face detection, not the speed of landmarks predicting. You are required to find a fast face detector(For example, libfacedetection)
- In my project, I use the opencv face detector, you can change to what you like as long as using the same face detector in training and testing
- it can both run under Windows(use 64bits for large datasets, 32bits may encounter memory problem) and Unix-like(preferred) systems.
- it can reach 100~200 fps(even 300fps+, depending on the model complexity) when predicting 68 landmarks on a single i7 core with the model 5 or 6 layers deep. The speed will be much faster when you reduce 68 landmarks to 29, since it uses less(for example, only 1/4 in Global Regression, if you fix the random forest parameteres) parameters.
- for a 68 landmarks model, the trained model file(storing all the parameters) will be around 150M, while it is 40M for a 29 landmarks model.
- the results of the model is acceptable for me, deeper and larger random forest(you can change parameters like tree_depth, trees_num_per_forest_ and so on) will lead to better results, but with lower speed.
ToDo
- data augmentation like
flip
androtate
the images. - go on training after loading a trained model(currently we have to re-run the previous stages when we want to train a deeper model with the same parameter setting as a already trained model)
- I have already develop the multithread one, but the time for predicting one image is slower than sequential one, since creating and destroying threads cost more time, I will optimize it and update it later.
- I will also develop a version on GPU, and will also upload later.
THANKS
Many thanks goes to those appreciate my work.
if you have any question, contact me at declanxu@gmail.com or declanxu@126.com, THANKS.