
ARoFace: Alignment Robustness to Improve Low-quality Face Recognition

European Conference on Computer Vision (ECCV 2024), arXiv, Hugging Face

Saeed Ebrahimi★, Sahar Rahimi★, Ali Dabouei, Nasser Nasrabadi

★ Equal contribution

Aiming to enhance Face Recognition (FR) on Low-Quality (LQ) inputs, recent studies suggest incorporating synthetic LQ samples into training. Although promising, the quality factors considered in these works are general rather than FR-specific, e.g., atmospheric turbulence and resolution. Motivated by the observation that current FR models are vulnerable to even small Face Alignment Errors (FAE) in LQ images, we present a simple yet effective method that treats FAE as another quality factor, one tailored to FR. We seek to improve LQ FR by enhancing FR models' robustness to FAE. To this end, we formalize the problem as a combination of differentiable spatial transformations and adversarial data augmentation in FR. We perturb the alignment of the training samples using a controllable spatial transformation and enrich the training with samples expressing FAE. We demonstrate the benefits of the proposed method through evaluations on IJB-B, IJB-C, IJB-S (+4.3% Rank1), and TinyFace (+2.63%).

Visual comparison of aligned (a) and alignment-perturbed (b) samples from the IJB-B dataset. (c, d, e) Performance difference between aligned inputs and inputs with slight FAE. Models are robust to FAE in HQ samples but suffer significant performance drops on LQ faces, with over 50% reduction in TAR@FAR=1e-5. Results are from two distinct ResNet-100 models trained on MS1MV3 with the ArcFace and AdaFace objectives.

We introduce Face Alignment Error (FAE) as an image degradation factor tailored for FR which has previously been ignored in LQ FR studies.
We propose an optimization method that is specifically tailored to increase the FR model robustness against FAE.
We show that the proposed optimization can greatly increase the FR performance in real-world LQ evaluations such as IJB-S and TinyFace. Moreover, our framework achieves these improvements without sacrificing the performance on datasets with both HQ and LQ samples such as IJB-B and IJB-C.
We empirically show that the proposed method is a plug-and-play module, providing an orthogonal improvement to SOTA FR methods.
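
The idea of perturbing face alignment with a controllable spatial transformation can be illustrated with a small, dependency-free sketch. It assumes the perturbation is an affine transform (scale, rotation, translation) about the center of a 112x112 crop, and it samples the parameters at random instead of optimizing them adversarially as ARoFace does. The function names, parameter ranges, and landmark positions are hypothetical and are not taken from this repository.

```python
import math
import random


def affine_matrix(scale, angle_deg, tx, ty, center=(56.0, 56.0)):
    """Build a 2x3 matrix that scales/rotates about `center`, then translates.

    The default center assumes a 112x112 aligned crop (an assumption).
    """
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    # x' = a*(x - cx) - b*(y - cy) + cx + tx
    # y' = b*(x - cx) + a*(y - cy) + cy + ty
    return [
        [a, -b, cx - a * cx + b * cy + tx],
        [b, a, cy - b * cx - a * cy + ty],
    ]


def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    return [
        (
            matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
        )
        for x, y in points
    ]


def sample_perturbation(rng, max_scale=0.05, max_angle=5.0, max_shift=3.0):
    """Draw a small random alignment error (illustrative ranges only)."""
    return affine_matrix(
        scale=1.0 + rng.uniform(-max_scale, max_scale),
        angle_deg=rng.uniform(-max_angle, max_angle),
        tx=rng.uniform(-max_shift, max_shift),
        ty=rng.uniform(-max_shift, max_shift),
    )


if __name__ == "__main__":
    # Illustrative landmark positions, not a real alignment template.
    landmarks = [(38.0, 52.0), (74.0, 52.0), (56.0, 72.0)]
    perturbed = apply_affine(sample_perturbation(random.Random(0)), landmarks)
    for before, after in zip(landmarks, perturbed):
        print(before, "->", tuple(round(v, 2) for v in after))
```

In a training pipeline the same kind of matrix would typically drive a differentiable image warp, so that the transformation parameters can be updated by gradient; that part is omitted here.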

TinyFace Evaluations

<table> <tr style="background-color: #dee2e6;"> <th rowspan="1">Method</th> <th colspan="1">Training Set</th> <th colspan="1">Rank1</th> <th colspan="1">Rank5</th> </tr> <tr style="background-color: #ffffff;"> <td>URL</td> <td>MS1MV2</td> <td>63.89</td> <td>68.67</td> </tr> <tr style="background-color: #ffffff;"> <td>CurricularFace</td> <td>MS1MV2</td> <td>63.68</td> <td>67.65</td> </tr> <tr style="background-color: #ffffff;"> <td>ArcFace+CFSM★</td> <td>MS1MV2</td> <td>64.69</td> <td>68.80</td> </tr> <tr style="background-color: #ffffff;"> <td>ArcFace+ARoFace</td> <td>MS1MV2</td> <td>67.32</td> <td>72.45</td> </tr> <tr style="background-color: #dee2e6;"> <td>ArcFace</td> <td>MS1MV3</td> <td>63.81</td> <td>68.80</td> </tr> <tr style="background-color: #dee2e6;"> <td>ArcFace+ARoFace</td> <td>MS1MV3</td> <td>67.54</td> <td>71.05</td> </tr> <tr style="background-color: #ffffff;"> <td>AdaFace★</td> <td>WebFace4M</td> <td>72.02</td> <td>74.52</td> </tr> <tr style="background-color: #ffffff;"> <td>AdaFace+ARoFace</td> <td>WebFace4M</td> <td>73.98</td> <td>76.47</td> </tr> <tr style="background-color: #dee2e6;"> <td>AdaFace</td> <td>WebFace12M</td> <td>72.29</td> <td>74.97</td> </tr> <tr style="background-color: #dee2e6;"> <td>AdaFace+ARoFace</td> <td>WebFace12M</td> <td>74.00</td> <td>76.87</td> </tr> </table>

★ Re-run with the official code, because the official repository provides no trained checkpoint for the specified dataset.

IJB-S Evaluations

<table> <tr style="background-color: #dee2e6;"> <th rowspan="2">Method</th> <th rowspan="2">Venue</th> <th rowspan="2">Dataset</th> <th colspan="3">Surveillance-to-Single</th> <th colspan="3">Surveillance-to-Booking</th> <th colspan="3">Surveillance-to-Surveillance</th> </tr> <tr style="background-color: #dee2e6;"> <td>Rank1</td> <td>Rank5</td> <td>TPIR@FPIR=1%</td> <td>Rank1</td> <td>Rank5</td> <td>TPIR@FPIR=1%</td> <td>Rank1</td> <td>Rank5</td> <td>TPIR@FPIR=1%</td> </tr> <tr style="background-color: #ffffff;"> <td>ArcFace</td> <td>CVPR2019</td> <td>MS1MV2</td> <td>57.35</td> <td>64.42</td> <td>41.85</td> <td>57.36</td> <td>64.95</td> <td>41.23</td> <td>-</td> <td>-</td> <td>-</td> </tr> <tr style="background-color: #ffffff;"> <td>PFE</td> <td>ICCV2019</td> <td>MS1MV2</td> <td>50.16</td> <td>58.33</td> <td>31.88</td> <td>53.60</td> <td>61.75</td> <td>35.99</td> <td>9.20</td> <td>20.82</td> <td>0.84</td> </tr> <tr style="background-color: #ffffff;"> <td>URL</td> <td>CVPR2020</td> <td>MS1MV2</td> <td>59.79</td> <td>65.78</td> <td>41.06</td> <td>61.98</td> <td>67.12</td> <td>42.73</td> <td>-</td> <td>-</td> <td>-</td> </tr> <tr style="background-color: #ffffff;"> <td>ArcFace+ARoFace</td> <td>ECCV2024</td> <td>MS1MV2</td> <td>61.65</td> <td>67.6</td> <td>47.87</td> <td>60.66</td> <td>67.33</td> <td>46.34</td> <td>18.31</td> <td>32.07</td> <td>2.23</td> </tr> <tr style="background-color: #dee2e6;"> <td>ArcFace</td> <td>CVPR2019</td> <td>WebFace4M</td> <td>69.26</td> <td>74.31</td> <td>57.06</td> <td>70.31</td> <td>75.15</td> <td>56.89</td> <td>32.13</td> <td>46.67</td> <td>5.32</td> </tr> <tr style="background-color: #dee2e6;"> <td>ArcFace+ARoFace</td> <td>ECCV2024</td> <td>WebFace4M</td> <td>70.96</td> <td>75.54</td> <td>58.67</td> <td>71.70</td> <td>75.24</td> <td>58.06</td> <td>32.95</td> <td>50.30</td> <td>6.81</td> </tr> <tr style="background-color: #ffffff;"> <td>AdaFace</td> <td>CVPR2022</td> <td>WebFace12M</td> <td>71.35</td> <td>76.24</td> <td>59.40</td> <td>71.93</td> <td>76.56</td> 
<td>59.37</td> <td>36.71</td> <td>50.03</td> <td>4.62</td> </tr> <tr style="background-color: #ffffff;"> <td>AdaFace+ARoFace</td> <td>ECCV2024</td> <td>WebFace12M</td> <td>72.28</td> <td>77.93</td> <td>61.43</td> <td>73.01</td> <td>79.11</td> <td>60.02</td> <td>40.51</td> <td>50.90</td> <td>6.37</td> </tr> </table>

Usage

Training sets

Download and prepare the datasets from the InsightFace repository.

Training

We trained with a total batch size of 2048 on four NVIDIA RTX 6000 Ada GPUs. For stable training, scale the learning rate with the total batch size on your machine:

config.lr = (0.1*config.batch_size*config.ngpus)/(1024)

In the configs, set the number of GPUs according to your resources:

config.ngpus = 4
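
As a worked example of the learning-rate rule, the sketch below evaluates it for a few setups. It assumes `config.batch_size` is the per-GPU batch size, which is what the multiplication by `config.ngpus` in the formula suggests; the helper name `scaled_lr` and its keyword arguments are hypothetical and not part of this repository.

```python
def scaled_lr(batch_size_per_gpu, ngpus, base_lr=0.1, base_total_batch=1024):
    """Linear learning-rate scaling: lr = 0.1 * batch_size * ngpus / 1024.

    Assumes `batch_size_per_gpu` corresponds to config.batch_size (per GPU).
    """
    return base_lr * batch_size_per_gpu * ngpus / base_total_batch


if __name__ == "__main__":
    # Setup reported above: total batch 2048 on four GPUs, i.e. 512 per GPU.
    print(scaled_lr(512, 4))  # 0.2
    # A smaller, hypothetical setup: two GPUs with 128 samples each.
    print(scaled_lr(128, 2))  # 0.025
```

Under this rule, halving the total batch size halves the learning rate, so a smaller machine keeps roughly the same step size per sample.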

Then, for training on one machine using four GPUs:

torchrun --nproc_per_node=4 train_v2.py configs/ms1mv2_r100

Pretrained Models

Method	Arch	Dataset	Link
ArcFace+ARoFace	R100	MS1MV2	link
ArcFace+ARoFace	R100	MS1MV3	link
ArcFace+ARoFace	R100	WebFace4M	link
AdaFace+ARoFace	R100	WebFace4M	link
AdaFace+ARoFace	R100	WebFace12M	link

Citation

@misc{saadabadi2024arofacealignmentrobustnessimprove,
      title={ARoFace: Alignment Robustness to Improve Low-Quality Face Recognition}, 
      author={Mohammad Saeed Ebrahimi Saadabadi and Sahar Rahimi Malakshan and Ali Dabouei and Nasser M. Nasrabadi},
      year={2024},
      eprint={2407.14972},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.14972}, 
}

Acknowledgments

Here are some great resources we benefited from:

ArcFace and AdaFace for the face recognition module.
advertorch, RobustAdversarialNetwork, and CFSM for the adversarial regularization.

Contact

If you have a question about any part of the code or need further clarification, please create an issue or email me at me00018@mix.wvu.edu.