This is a deep learning library that makes face recognition training efficient and effective; it can train on tens of millions of identities on a single server.
To take advantage of new PyTorch features, we have upgraded to PyTorch 1.9.0.
Versions of PyTorch earlier than 1.9.0 may not work in the future.
- Install PyTorch (torch>=1.9.0); see install.md.
- (Optional) Install DALI; see install_dali.md.
- `pip install -r requirement.txt`
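
Before training, it can help to confirm that the installed PyTorch meets the 1.9.0 minimum. The snippet below is a minimal sketch: `meets_minimum` is a hypothetical helper (not part of this repository) that compares only the numeric part of a version string and ignores local suffixes such as `+cu111`.

```python
import re

MINIMUM = (1, 9, 0)  # minimum PyTorch version stated above


def meets_minimum(version, minimum=MINIMUM):
    """Return True if a version string such as '1.9.0+cu111' is >= minimum."""
    # Keep only the leading numeric components (major.minor[.patch]).
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) if p is not None else 0 for p in match.groups())
    return parts >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = meets_minimum(torch.__version__)
        print(f"torch {torch.__version__}: {'OK' if ok else 'too old'}")
```

Comparing integer tuples rather than raw strings avoids ranking `1.10.0` below `1.9.0`.
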
To train a model, run `train.py` with the path to a config file. The example commands below show how to run distributed training.
Single machine with 8 GPUs:
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=12581 train.py configs/ms1mv3_r50_lr02
Two machines with 8 GPUs each:
Node 0:
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus
Node 1:
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus
Single machine with 8 GPUs, training ViT-B with train_v2.py:
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=12345 train_v2.py configs/wf42m_pfc03_40epoch_8gpu_vit_b.py
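
The launch commands above differ only in a few flags: each node passes its own `--node_rank`, while `--nnodes`, `--master_addr`, and `--master_port` are the same on every node. As a rough sketch, the per-node argument list can be generated as follows. `build_launch_command` and `world_size` are hypothetical helpers written for illustration; only the flags shown in the commands above are used.

```python
def build_launch_command(config, script="train.py", nproc_per_node=8,
                         nnodes=1, node_rank=0,
                         master_addr="127.0.0.1", master_port=12581):
    """Build the argument list that starts training on one node."""
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script, config,
    ]


def world_size(nproc_per_node, nnodes):
    """Total number of training processes, one per GPU."""
    return nproc_per_node * nnodes


if __name__ == "__main__":
    config = "configs/webface42m_r100_lr01_pfc02_bs4k_16gpus"
    for rank in range(2):
        # "ip1" is the placeholder address used in the commands above.
        cmd = build_launch_command(config, nnodes=2, node_rank=rank,
                                   master_addr="ip1")
        print(f"Node {rank}:", " ".join(cmd))
    print("world size:", world_size(8, 2))
```

With 2 nodes and 8 processes per node the world size is 16, which matches the `16gpus` suffix in the config name.
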
- MS1MV2 (87k IDs, 5.8M images)
- MS1MV3 (93k IDs, 5.2M images)
- Glint360K (360k IDs, 17.1M images)
- WebFace42M (2M IDs, 42.5M images)
- The models are available for non-commercial research purposes only.
- All models can be found at the following links:
- Baidu Yun Pan (extraction code: e8pw)
- OneDrive
Performance on IJB-C and ICCV2021-MFR
The ICCV2021-MFR test set consists of non-celebrities, so it has very little overlap with publicly available face recognition training sets such as MS1M and CASIA, which were mostly collected from online celebrities. As a result, it allows a fair comparison of different algorithms.
For the ICCV2021-MFR-ALL set, TAR is measured on the all-to-all 1:1 protocol, with FAR less than 0.000001 (1e-6). The globalised multi-racial test set contains 242,143 identities and 1,624,305 images.
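
To make the metric concrete, here is a minimal pure-Python sketch of TAR at a fixed FAR. It is an illustration under stated assumptions, not the benchmark's evaluation code: `tar_at_far` is a hypothetical helper, scores are similarities where higher means more alike, and a pair is accepted when its score is strictly greater than the threshold.

```python
def tar_at_far(genuine_scores, impostor_scores, far):
    """True accept rate at a threshold whose false accept rate is <= far."""
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs that may be accepted by mistake.
    allowed = int(far * len(ranked))
    # Accepting only scores above ranked[allowed] admits at most
    # `allowed` impostor pairs.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return accepted / len(genuine_scores)


if __name__ == "__main__":
    genuine = [0.95, 0.5, 0.35, 0.8]        # same-identity pairs
    impostor = [0.1, 0.2, 0.3, 0.4, 0.9]    # different-identity pairs
    print(tar_at_far(genuine, impostor, far=0.2))  # 0.75
```

In the toy example, one of five impostor pairs may be accepted, so the threshold is 0.4 and three of the four genuine pairs pass. At FAR 1e-6, at least one million impostor pairs are needed before a single false accept is allowed, which is why the protocol relies on a large all-to-all comparison.
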
Datasets | Backbone | MFR-ALL | IJB-C(1E-4) | IJB-C(1E-5) | log |
---|---|---|---|---|---|
MS1MV2 | mobilefacenet-0.45G | 62.07 | 93.61 | 90.28 | click me |
MS1MV2 | r50 | 75.13 | 95.97 | 94.07 | click me |
MS1MV2 | r100 | 78.12 | 96.37 | 94.27 | click me |
MS1MV3 | mobilefacenet-0.45G | 63.78 | 94.23 | 91.33 | click me |
MS1MV3 | r50 | 79.14 | 96.37 | 94.47 | click me |
MS1MV3 | r100 | 81.97 | 96.85 | 95.02 | click me |
Glint360K | mobilefacenet-0.45G | 70.18 | 95.04 | 92.62 | click me |
Glint360K | r50 | 86.34 | 97.16 | 95.81 | click me |
Glint360K | r100 | 89.52 | 97.55 | 96.38 | click me |
WF4M | r100 | 89.87 | 97.19 | 95.48 | click me |
WF12M-PFC-0.2 | r100 | 94.75 | 97.60 | 95.90 | click me |
WF12M-PFC-0.3 | r100 | 94.71 | 97.64 | 96.01 | click me |
WF12M | r100 | 94.69 | 97.59 | 95.97 | click me |
WF42M-PFC-0.2 | r100 | 96.27 | 97.70 | 96.31 | click me |
WF42M-PFC-0.2 | ViT-T-1.5G | 92.04 | 97.27 | 95.68 | click me |
WF42M-PFC-0.3 | ViT-B-11G | 97.16 | 97.91 | 97.05 | click me |
Datasets | Backbone(bs*gpus) | MFR-ALL | IJB-C(1E-4) | IJB-C(1E-5) | Throughput | log |
---|---|---|---|---|---|---|
WF42M-PFC-0.2 | r50(512*8) | 93.83 | 97.53 | 96.16 | ~5900 | click me |
WF42M-PFC-0.2 | r50(512*16) | 93.96 | 97.46 | 96.12 | ~11000 | click me |
WF42M-PFC-0.2 | r50(128*32) | 94.04 | 97.48 | 95.94 | ~17000 | click me |
WF42M-PFC-0.2 | r100(128*16) | 96.28 | 97.80 | 96.57 | ~5200 | click me |
WF42M-PFC-0.2 | r100(256*16) | 96.69 | 97.85 | 96.63 | ~5200 | click me |
WF42M-PFC-0.0018 | r100(512*32) | 93.08 | 97.51 | 95.88 | ~10000 | click me |
WF42M-PFC-0.2 | r100(128*32) | 96.57 | 97.83 | 96.50 | ~9800 | click me |
`r100(128*32)` means the backbone is r100, the batch size per GPU is 128, and the number of GPUs is 32.
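
The notation also gives the global batch size, which is the per-GPU batch size multiplied by the number of GPUs. The helpers below are a small hypothetical sketch for reading the table entries; they are not part of the repository.

```python
import re


def parse_backbone_spec(spec):
    """Split a spec like 'r100(128*32)' into (backbone, batch_per_gpu, num_gpus)."""
    match = re.fullmatch(r"\s*([\w-]+)\((\d+)\*(\d+)\)\s*", spec)
    if match is None:
        raise ValueError(f"unrecognized spec: {spec!r}")
    backbone, batch_per_gpu, num_gpus = match.groups()
    return backbone, int(batch_per_gpu), int(num_gpus)


def global_batch_size(spec):
    """Total number of samples per optimization step across all GPUs."""
    _, batch_per_gpu, num_gpus = parse_backbone_spec(spec)
    return batch_per_gpu * num_gpus


if __name__ == "__main__":
    for spec in ("r50(512*8)", "r100(128*32)", "VIT-L(384*64)"):
        print(spec, "->", global_batch_size(spec))
```

For example, `r50(512*8)` and `r100(128*32)` both correspond to a global batch size of 4096.
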
Datasets | Backbone(bs*gpus) | FLOPs(G) | MFR-ALL | IJB-C(1E-4) | IJB-C(1E-5) | Throughput | log |
---|---|---|---|---|---|---|---|
WF42M-PFC-0.3 | r18(128*32) | 2.6 | 79.13 | 95.77 | 93.36 | - | click me |
WF42M-PFC-0.3 | r50(128*32) | 6.3 | 94.03 | 97.48 | 95.94 | - | click me |
WF42M-PFC-0.3 | r100(128*32) | 12.1 | 96.69 | 97.82 | 96.45 | - | click me |
WF42M-PFC-0.3 | r200(128*32) | 23.5 | 97.70 | 97.97 | 96.93 | - | click me |
WF42M-PFC-0.3 | ViT-T(384*64) | 1.5 | 92.24 | 97.31 | 95.97 | ~35000 | click me |
WF42M-PFC-0.3 | ViT-S(384*64) | 5.7 | 95.87 | 97.73 | 96.57 | ~25000 | click me |
WF42M-PFC-0.3 | ViT-B(384*64) | 11.4 | 97.42 | 97.90 | 97.04 | ~13800 | click me |
WF42M-PFC-0.3 | ViT-L(384*64) | 25.3 | 97.85 | 98.00 | 97.23 | ~9406 | click me |
`WF42M` means WebFace42M; `PFC-0.3` means the sampling rate of negative class centers is 0.3.
Datasets | Backbone | MFR-ALL | IJB-C(1E-4) | IJB-C(1E-5) | log |
---|---|---|---|---|---|
WF12M-Flip(40%) | r50 | 43.87 | 88.35 | 80.78 | click me |
WF12M-Flip(40%)-PFC-0.1* | r50 | 80.20 | 96.11 | 93.79 | click me |
WF12M-Conflict | r50 | 79.93 | 95.30 | 91.56 | click me |
WF12M-Conflict-PFC-0.3* | r50 | 91.68 | 97.28 | 95.75 | click me |
`WF12M` means WebFace12M; `+PFC-0.1*` denotes additional abnormal inter-class filtering.
Arcface-Torch can train on large-scale face recognition training sets efficiently and quickly. When the number of classes in the training set exceeds one million, the Partial FC sampling strategy reaches the same accuracy with several times faster training and lower GPU memory use. Partial FC is a sparse variant of the model-parallel architecture for large-scale face recognition. It uses a sparse softmax in which each batch dynamically samples a subset of class centers for training. In each iteration, only a sparse part of the parameters is updated, which greatly reduces GPU memory use and computation. With Partial FC, we can scale the training set to 29 million identities, the largest to date. Partial FC also supports multi-machine distributed training and mixed-precision training.
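
The sampling step can be illustrated with a short pure-Python sketch. It is a simplified illustration, not the repository's implementation: `sample_class_centers` is a hypothetical helper, it assumes that the classes present in the batch (positives) are always kept and the rest of the subset is drawn uniformly from the remaining classes, and it ignores the model-parallel sharding of class centers across GPUs.

```python
import random


def sample_class_centers(labels, num_classes, sample_rate, rng=None):
    """Pick the class centers used for one batch and remap the labels.

    Returns (sampled, remapped): `sampled` is the sorted list of selected
    class ids, and `remapped[i]` is the position of labels[i] in `sampled`.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = rng or random.Random()
    positives = set(labels)
    # Positive centers are always kept, so never sample fewer than that.
    num_sample = max(int(sample_rate * num_classes), len(positives))
    negative_pool = [c for c in range(num_classes) if c not in positives]
    negatives = rng.sample(negative_pool, num_sample - len(positives))
    sampled = sorted(positives.union(negatives))
    position = {class_id: i for i, class_id in enumerate(sampled)}
    remapped = [position[y] for y in labels]
    return sampled, remapped


if __name__ == "__main__":
    batch_labels = [3, 7, 3, 42]
    sampled, remapped = sample_class_centers(
        batch_labels, num_classes=100, sample_rate=0.1, rng=random.Random(0))
    print(len(sampled), "of 100 centers are used for this batch")
    print("remapped labels:", remapped)
```

Only the sampled centers take part in the softmax for that batch, so the logits shrink from `num_classes` columns to `len(sampled)` columns and only those centers receive gradients.
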
For more details, see speed_benchmark.md in docs.
- Training speed of different parallel methods (samples / second), Tesla V100 32GB * 8. (Larger is better)
`-` means training failed because of GPU memory limitations.
Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
---|---|---|---|
125000 | 4681 | 4824 | 5004 |
1400000 | 1672 | 3043 | 4738 |
5500000 | - | 1389 | 3975 |
8000000 | - | - | 3565 |
16000000 | - | - | 2679 |
29000000 | - | - | 1855 |
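
The relative speedups follow directly from the table. The sketch below copies the numbers from the table above and computes the ratio of Partial FC 0.1 throughput to each baseline; `speedup` is a hypothetical helper, and `None` marks configurations that failed because of GPU memory limitations.

```python
# Training speed in samples/second, copied from the table above.
SPEED = {
    125_000: {"data_parallel": 4681, "model_parallel": 4824, "partial_fc": 5004},
    1_400_000: {"data_parallel": 1672, "model_parallel": 3043, "partial_fc": 4738},
    5_500_000: {"data_parallel": None, "model_parallel": 1389, "partial_fc": 3975},
    8_000_000: {"data_parallel": None, "model_parallel": None, "partial_fc": 3565},
    16_000_000: {"data_parallel": None, "model_parallel": None, "partial_fc": 2679},
    29_000_000: {"data_parallel": None, "model_parallel": None, "partial_fc": 1855},
}


def speedup(num_identities, baseline):
    """Partial FC throughput divided by the baseline, or None if it failed."""
    row = SPEED[num_identities]
    if row[baseline] is None:
        return None
    return row["partial_fc"] / row[baseline]


if __name__ == "__main__":
    for num_identities in SPEED:
        for baseline in ("data_parallel", "model_parallel"):
            ratio = speedup(num_identities, baseline)
            text = "failed" if ratio is None else f"{ratio:.2f}x"
            print(f"{num_identities:>10} ids vs {baseline}: {text}")
```

At 1.4 million identities Partial FC is about 2.83x faster than data parallel, and at 5.5 million identities it is about 2.86x faster than model parallel.
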
- GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)
Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
---|---|---|---|
125000 | 7358 | 5306 | 4868 |
1400000 | 32252 | 11178 | 6056 |
5500000 | - | 32188 | 9854 |
8000000 | - | - | 12310 |
16000000 | - | - | 19950 |
29000000 | - | - | 32324 |
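
A back-of-the-envelope estimate suggests why the dense methods run out of memory first. The sketch below computes the size of the class-center matrix alone. It assumes a 512-dimensional embedding stored in 32-bit floats, which is an assumption not stated in this document, and it ignores gradients, optimizer state, logits, and backbone activations, so the measured figures in the table are not expected to match it.

```python
def fc_weight_mib(num_classes, embedding_dim=512, bytes_per_param=4,
                  num_shards=1):
    """Size in MiB of the class-center matrix held by one GPU.

    embedding_dim=512 and bytes_per_param=4 (fp32) are assumptions.
    num_shards=1 models a full copy on every GPU; num_shards=8 models
    the matrix being split evenly across 8 GPUs.
    """
    total_bytes = num_classes * embedding_dim * bytes_per_param
    return total_bytes / num_shards / 2**20


if __name__ == "__main__":
    for num_classes in (125_000, 1_400_000, 5_500_000, 29_000_000):
        full = fc_weight_mib(num_classes)
        sharded = fc_weight_mib(num_classes, num_shards=8)
        print(f"{num_classes:>10} ids: {full:10.1f} MiB full copy, "
              f"{sharded:9.1f} MiB per GPU over 8 shards")
```

Under these assumptions, the weights for 29 million identities take about 55 GiB as a single copy, more than a 32 GB GPU can hold, and about 6.9 GiB per GPU when split over 8 GPUs.
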
@inproceedings{deng2019arcface,
title={Arcface: Additive angular margin loss for deep face recognition},
author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={4690--4699},
year={2019}
}
@inproceedings{An_2022_CVPR,
author={An, Xiang and Deng, Jiankang and Guo, Jia and Feng, Ziyong and Zhu, XuHan and Yang, Jing and Liu, Tongliang},
title={Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month={June},
year={2022},
pages={4042--4051}
}
@inproceedings{zhu2021webface260m,
title={Webface260m: A benchmark unveiling the power of million-scale deep face recognition},
author={Zhu, Zheng and Huang, Guan and Deng, Jiankang and Ye, Yun and Huang, Junjie and Chen, Xinze and Zhu, Jiagang and Yang, Tian and Lu, Jiwen and Du, Dalong and Zhou, Jie},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10492--10502},
year={2021}
}