
Introduction

BoundaryFace's motivation is partially inspired by NPT-Loss (arXiv). Because the two works follow different lines of research, and BoundaryFace does not build on NPT-Loss's innovation, we did not cite NPT-Loss or compare against it in the paper.

The differences with NPT-Loss are:

For academic rigor, we have now added a related-work discussion to the original paper; see the updated version on arXiv.

Quick Start

This repository will help you learn more about the details of our experiments in the paper.

Training environment

Experiment based on WebFace (in paper):

| OS | GPU | Python | CUDA | torch | torchvision |
| --- | --- | --- | --- | --- | --- |
| Windows 10 | 1 × TitanX | 3.7.0 | 9.0 | 1.1.0 | 0.3.0 |

packages: requirements_ct.txt

Experiment based on MS1M / MS1MV2:

| OS | GPU | Python | CUDA | pytorch | torchvision |
| --- | --- | --- | --- | --- | --- |
| Linux | 8 × RTX 3070 | 3.7.0 | 11.3 | 1.10.0 | 0.11.0 |

packages: requirements_dt.txt

Dataset

For Training Set: InsightFace

For Testing Set: InsightFace

Train

Please modify the relevant path parameters in advance (e.g., the save path for the closed-set noise list).

**For training on CASIA-WebFace and the noisy synthetic datasets, we use 1 NVIDIA TitanX GPU with a batch size of 64.**

```shell
visdom
python ./training_mode/conventional_training/train.py
```

**For training on MS1M / MS1MV2, we use 8 RTX 3070 GPUs with a total batch size of 8 × 32.**

| Dataset | backbone | total epoch | milestone | epoch_start | m | s |
| --- | --- | --- | --- | --- | --- | --- |
| MS1M / MS1MV2 | Res50-IR | 24 | 10,18,22 | 10 | 0.5 | 32 |
```shell
tensorboard --logdir ./training_mode/distributed_training/MS1M_tensorboard
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1324 ./training_mode/distributed_training/train_DDP.py
```
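The milestone schedule in the table above (LR decayed at epochs 10, 18, and 22 over 24 epochs) can be sketched with PyTorch's `MultiStepLR`. The stand-in model and the optimizer hyper-parameters (initial LR 0.1, momentum, weight decay) are illustrative assumptions here, not values taken from the repository:

```python
import torch

# Stand-in model; the repository trains Res50-IR. It is used here only so the
# optimizer has parameters to manage.
model = torch.nn.Linear(512, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # assumed values
# Decay the learning rate by 10x at epochs 10, 18, and 22 (24 epochs total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 18, 22], gamma=0.1)

lrs = []
for epoch in range(24):
    # ... one training epoch would run here ...
    optimizer.step()  # placeholder step
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```

After each milestone the recorded learning rate drops by a factor of ten (0.1 → 0.01 → 0.001 → 0.0001).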

Results of retraining

The following problems existed with the experiments in the paper, so we decided to retrain them:

We have now retrained some experiments according to the settings in the paper, for your reference. These runs yield additional conclusions beyond those in the paper.

ratio: 0%:

| Method | LFW | AgeDB | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | India | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 99.28 | 93.5 | 94.89 | 93.17 | 89.13 | 97.55 | 85.57 | 92.88 | 89.9 | 86.45 |
| MV-Arc-Softmax | 99.23 | 93.83 | 94.61 | 93.17 | 89.35 | 97.88 | 85.8 | 93.18 | 90.23 | 86.35 |
| CurricularFace | 99.32 | 93.85 | 94.94 | 93.47 | 89.58 | 97.87 | 86.35 | 93.9 | 90.45 | 87.57 |
| BoundaryF1 | 99.28 | 94.13 | 94.66 | 92.8 | 89.27 | 97.88 | 85.6 | 92.95 | 89.72 | 86.27 |
| BoundaryFace(λ=π) | 99.37 | 94.32 | 94.99 | 93.15 | 89.27 | 98.12 | 85.97 | 93.75 | 89.73 | 87.12 |

The following conclusions can be drawn from the above data:

ratio: 10%:

| Method | LFW | AgeDB | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | India | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 99.1 | 93.78 | 94.49 | 93.15 | 89.03 | 97.8 | 85.33 | 93.22 | 89.9 | 86.75 |
| MV-Arc-Softmax | 99.35 | 94.18 | 94.27 | 93.42 | 89.28 | 97.73 | 86.03 | 93.6 | 90.05 | 86.93 |
| CurricularFace | 99.17 | 93.63 | 93.63 | 93.07 | 88.63 | 97.65 | 85.05 | 92.68 | 89.7 | 86.37 |
| BoundaryF1 | 99.3 | 93.68 | 94.79 | 93.23 | 89.5 | 97.78 | 85.85 | 93.4 | 89.78 | 86.88 |
| BoundaryFace(λ=π) | 99.33 | 94.18 | 94.64 | 93.3 | 89.2 | 97.85 | 86.18 | 93.55 | 90.4 | 87.53 |

ratio: 20%:

| Method | LFW | AgeDB | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | India | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 99.1 | 93 | 93.1 | 92.75 | 87.67 | 97.35 | 85.08 | 91.92 | 89.05 | 84.93 |
| MV-Arc-Softmax | 99.07 | 93.23 | 93.39 | 92.98 | 88.18 | 97.55 | 85.42 | 92.03 | 89.23 | 85.33 |
| CurricularFace | 98.97 | 91.63 | 92.11 | 92.03 | 87.35 | 96.32 | 84.35 | 90.7 | 87.87 | 83.37 |
| BoundaryF1 | 99.22 | 93.88 | 94.2 | 93.48 | 88.6 | 97.9 | 85.95 | 93.02 | 89.5 | 87.05 |
| BoundaryFace(λ=π) | 99.25 | 93.9 | 93.99 | 93.48 | 88.15 | 97.78 | 86.37 | 93.47 | 89.63 | 87.02 |

The following conclusions can be drawn from the above data:

About the parameter m

In the paper, we set the margin m = 0.5 for the training set containing 30% closed-set noise and 10% open-set noise, and m = 0.3 for the other two mixing ratios. After further experiments, we found that CurricularFace obtains better results with m = 0.5 on the other two mixing ratios as well. Besides, BoundaryF1 obtains better results on C 20% O 20% with m = 0.5.
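Since the margin m is discussed throughout this section, here is a minimal sketch of where an additive angular margin enters an ArcFace-style loss. This is a simplified illustration, not the repository's BoundaryFace implementation; the scale s = 32 follows the training table above, and the function name is our own:

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weight, labels, s=32.0, m=0.5):
    """ArcFace-style logits: add the angular margin m to the ground-truth
    class angle, then rescale by s. A sketch, not the repository's code."""
    # Cosine similarity between L2-normalized features and class weights.
    cos = F.linear(F.normalize(embeddings), F.normalize(weight))
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # One-hot mask so the margin is applied only to the target class.
    target = torch.zeros_like(cos)
    target.scatter_(1, labels.view(-1, 1), 1.0)
    return s * torch.cos(theta + m * target)

emb = torch.randn(4, 512)          # batch of 4 feature vectors
w = torch.randn(10, 512)           # 10 hypothetical classes
y = torch.tensor([0, 3, 5, 9])
logits = arcface_logits(emb, w, y)
loss = F.cross_entropy(logits, y)
```

Raising m shrinks the target-class logit, which tightens the decision boundary; this is why a larger m is harder to train under heavy label noise.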

C: 20% O: 20% (m=0.3)

| Method | LFW | AgeDB | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | India | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 98.75 | 90.13 | 89.56 | 90.83 | 83.9 | 95.07 | 81.5 | 88.45 | 85.7 | 79.92 |
| MV-Arc-Softmax | 98.75 | 89.13 | 89.2 | 90.65 | 84.45 | 95.15 | 81.23 | 88.8 | 85.48 | 79.47 |
| CurricularFace (m=0.5) | 98.32 | 89.63 | 89.94 | 90.87 | 84.47 | 94.58 | 81 | 88.22 | 85.28 | 78.83 |
| BoundaryF1 | 98.97 | 92.3 | 89.5 | 91.82 | 83.73 | 96.75 | 84.2 | 90.95 | 88.55 | 83.87 |
| BoundaryFace(λ=π) | 99.13 | 92.63 | 92.89 | 92.43 | 87.03 | 97.03 | 84.55 | 91.63 | 88.27 | 84.77 |

Based on m = 0.5, we add the results of BoundaryF1:

| Method | LFW | AgeDB | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | India | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BoundaryF1 (m=0.5) | 99.03 | 92.7 | 92 | 92.28 | 85.85 | 96.87 | 84.05 | 91.32 | 87.87 | 83.25 |

C: 10% O: 30% (m=0.3)

| Method | LFW | AgeDB | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | India | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 98.78 | 90.77 | 91.14 | 91.6 | 85.6 | 96.2 | 83.18 | 89.95 | 87.22 | 81.92 |
| MV-Arc-Softmax | 99.03 | 91.83 | 91.71 | 92.02 | 86.07 | 96.45 | 83.62 | 90.53 | 87.57 | 82.95 |
| CurricularFace (m=0.5) | 98.95 | 91.22 | 91.89 | 91.88 | 86.15 | 96.2 | 82.73 | 89.73 | 86.87 | 81.5 |
| BoundaryF1 | 99.05 | 92.43 | 91.24 | 92.05 | 85.6 | 96.58 | 84.43 | 91.13 | 88.02 | 83.67 |
| BoundaryFace(λ=π) | 99.08 | 92.85 | 93.04 | 92.2 | 87.33 | 96.82 | 84.82 | 91.07 | 88.7 | 84.43 |

Note: Due to a lack of rigor, we made a mistake in this experiment. The CurricularFace results reported in the paper for the C 10% O 30% case were obtained with m = 0.5 rather than m = 0.3. We did not examine the parameter settings carefully at the time; reimplementing the paper's results led us to discover the problem.

C: 30% O: 10% (m=0.5)

| Method | LFW | AgeDB | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | India | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 98.53 | 89.5 | 88.11 | 90.85 | 82.47 | 94.77 | 81.17 | 87.82 | 85.68 | 79 |
| MV-Arc-Softmax | 98.53 | 89.58 | 87.86 | 90.88 | 82.37 | 95.02 | 80.62 | 88.1 | 85.18 | 79.12 |
| CurricularFace | 98.2 | 88.25 | 88.53 | 89.68 | 82.83 | 93.62 | 80 | 86.45 | 83.82 | 77.08 |
| BoundaryF1 | 99.02 | 92.85 | 91.3 | 92.63 | 85.68 | 96.77 | 84.08 | 90.97 | 88.17 | 83.95 |
| BoundaryFace(λ=π) | 98.98 | 92.95 | 87.86 | 92.77 | 82.48 | 96.8 | 83.55 | 91.22 | 87.9 | 84.03 |

As can be seen from the tables above, our approach still has significant advantages over the SOTA. Taken together, although BoundaryFace (λ = π) still outperforms BoundaryF1 on these noisy datasets, its advantage is not as pronounced as in the paper (in the C 30% O 10% case, BoundaryFace shows no clear advantage over BoundaryF1). We believe the hyper-parameter λ may not be a good choice in this case, and we will conduct more experiments to explore this further.

Additional remarks

Note on open-set noise in paper:

Even though our method targets closed-set noise, we still conduct experiments involving open-set noise.

In particular, we do not align open-set noise samples; they are only resized to 112 × 112 when used.

The main reason is as follows:

In our initial experiments, we used open-set noise taken from the distractors of the MegaFace dataset provided by InsightFace. However, during training, these aligned noisy samples could cause training to collapse. When we introduced only 20% of these open-set noise samples into the original WebFace, the model's accuracy on the test set could behave as shown below (even if the m of ArcFace is reduced or a softmax head is used):

image

This is inconsistent with the behavior of open-set noise in real environments: the open-set noise rate in MS1M is much higher than 20%, yet training remains normal. We think there may be two reasons for this problem:

Such a collapse does not occur if unaligned samples are used as the open-set noise source, so the paper adopts this approach to simulate open-set noise in a real environment.
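The resize-only preprocessing described above can be sketched as follows. The function name and the normalization constants (a common face-recognition convention) are assumptions for illustration, not necessarily the repository's exact pipeline:

```python
import numpy as np
from PIL import Image

def load_openset_noise(path):
    """Load an open-set noise sample WITHOUT face alignment: the image is
    only resized to 112x112, as described above."""
    img = Image.open(path).convert("RGB").resize((112, 112), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32)
    # Map pixel values into roughly [-1, 1] (assumed normalization).
    return (arr - 127.5) / 128.0

# Usage with a synthetic image standing in for a distractor sample.
Image.new("RGB", (250, 180), color=(120, 90, 60)).save("noise_sample.jpg")
x = load_openset_noise("noise_sample.jpg")
```

By contrast, genuine training faces would go through landmark detection and similarity-transform alignment before cropping; the noise samples deliberately skip that step.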

Supplemental Results

Due to resource and time constraints, the method was not tested on the real-world dataset MS1M in our paper, which lowers the confidence in the method. We would have understood if the paper had been rejected for this reason, but the reviewers and the AC ultimately accepted it.

In this subsection, we will supplement the experiments of our method on MS1M and provide all the data and files saved during the training process for your reference.

Visualization Results

Closed-set noise corrected by our method in WebFace

The list of closed-set noise samples corrected during training is available here (extraction code: 7dzp).

Example 1:

2022-08-24_214600

Example 2:

2022-08-24_214832

Example 3:

2022-08-24_215053

Example 4:

2022-08-24_215423

Closed-set noise corrected by our method in MS1M

MS1M is very noisy, and whether a sample is closed-set noise is discerned entirely by our method itself.

Example 1:

2022-08-24_221314

Example 2:

2022-08-24_221540

Example 3:

2022-08-24_221759

Closed-set noise corrected by our method in MS1MV2

Even though MS1MV2 is considered a clean dataset, our method still finds a small amount of closed-set noise.

Example 1:

2022-08-24_222514

Example 2:

2022-08-24_223001

Example 3:

2022-08-24_223359

Test Results

Note:

Training Set: MS1M

IJB-C: 1:1 TAR @FAR=1e-4
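The 1:1 verification metric TAR @ FAR=1e-4 can be computed roughly as follows. This is a sketch on synthetic pair scores, not the official IJB-C protocol (which additionally involves template construction); all names and distributions here are illustrative:

```python
import numpy as np

def tar_at_far(scores, labels, far=1e-4):
    """True Accept Rate at a given False Accept Rate for 1:1 verification.
    scores: similarity scores; labels: 1 = genuine pair, 0 = impostor pair."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pick the threshold that admits at most `far` fraction of impostor pairs.
    impostor = np.sort(scores[labels == 0])[::-1]
    k = max(int(far * len(impostor)), 1)
    thr = impostor[k - 1]
    # TAR = fraction of genuine pairs accepted at that threshold.
    return float((scores[labels == 1] >= thr).mean())

# Synthetic genuine/impostor score distributions for demonstration only.
rng = np.random.default_rng(0)
genuine = rng.normal(0.6, 0.1, 100_000)
impostor = rng.normal(0.0, 0.1, 100_000)
scores = np.concatenate([genuine, impostor])
labels = np.concatenate([np.ones(100_000, int), np.zeros(100_000, int)])
tar = tar_at_far(scores, labels, far=1e-4)
```

Because the threshold is set on the impostor tail, a handful of high-scoring impostor pairs (e.g., from noisy templates) can sharply depress TAR at a strict FAR, which is relevant to the IJB-C results below.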

| Method | MegaFace(R)@Rank1 | IJB-C | LFW | AgeDB-30 | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | Indian | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 96.45 | 91.65 | 99.7 | 97.55 | 96.43 | 95.92 | 91.53 | 99.3 | 95.13 | 98.4 | 96.75 | 96.02 |
| MV-Arc-Softmax | 96.86 | 91.47 | 99.72 | 97.72 | 96.4 | 95.85 | 92.02 | 99.28 | 95.15 | 98.42 | 96.73 | 96.33 |
| CurricularFace | 95.82 | 90.78 | 99.7 | 97.48 | 96.17 | 95.65 | 91.82 | 99.1 | 93.77 | 97.97 | 95.73 | 95.52 |
| BoundaryFace (λ=0) | 97.57 | 91.74 | 99.6 | 97.77 | 96.34 | 95.95 | 91.98 | 99.25 | 95.03 | 98.37 | 96.85 | 96.15 |
| BoundaryFace (λ=π) | 97.53 | 30.14 | 99.68 | 97.82 | 94.53 | 95.92 | 87.2 | 99.33 | 95.17 | 98.37 | 96.4 | 96.43 |

As can be seen from the table above, BoundaryFace (λ=π) performs very poorly on IJB-C even though it outperforms the baseline on MegaFace. We believe the possible reasons are, on the one hand, that the hyper-parameter setting λ = π is not suitable for real large-scale noisy datasets, and on the other hand, that the method itself may pay excessive attention to open-set noise. We will explore this problem in the future.

Training Set: MS1MV2

| Method | MegaFace(R)@Rank1 | IJB-C | LFW | AgeDB-30 | CFP-FP | CALFW | CPLFW | SLLFW | Asian | Caucasian | Indian | African |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 97.31 | 94.51 | 99.72 | 98.03 | 95.89 | 95.92 | 91.7 | 99.42 | 97 | 99.1 | 97.52 | 97.95 |
| BoundaryFace (λ=π) | 97.41 | 94.49 | 99.75 | 97.98 | 95.63 | 95.9 | 91.83 | 99.52 | 97.38 | 99.22 | 97.63 | 97.93 |

The following are the test results of the corresponding models on our private datasets, which were collected in real environments.

| Method | S2V-s@Rank1 | S2V-v@Rank1 | Entry@Rank1 | HD@Rank1 | swjtu2D_SEN-s@Rank1 | swjtu2D_SEN-v@Rank1 |
| --- | --- | --- | --- | --- | --- | --- |
| ArcFace | 93.16 | 72.43 | 95.87 | 99.39 | 92.89 | 48.52 |
| BoundaryFace (λ=π) | 95.44 | 67.35 | 96.7 | 99.39 | 93.91 | 51.9 |

Some conclusions

Todo

Acknowledgements