Seeking the Shape of Sound

An implementation of the CVPR 2021 paper: Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association

Environments

<!-- This code is implemented with Pytorch (tested on 1.4.0). -->

See requirement.txt.

Data preparation

Download VoxCeleb and VGGFace, and unzip them to ./data.

Because of file size limits, only some of the query lists are included in ./data. The other lists used in the paper can be downloaded from Google drive or Baidu drive (passwd: rfri).

Training

<!-- The training process consists of three steps: --> <!-- 1. Train the model and update identity weights: -->
  1. Download the pretrained backbone models into ./pretrained_models.

Google drive:

SE-ResNet-50

Thin-ResNet-34

Baidu drive:

SE-ResNet-50 (passwd: jy55)

Thin-ResNet-34 (passwd: tc6i)

  2. Train the model and update identity weights:
python3 train.py config/train_reweight.yaml
  3. Extract identity weights from the saved model file:
python3 extract_id_weight.py config/train_reweight.yaml
  4. Retrain the final model:
python3 train.py config/train_main.yaml

The model and log are saved in save/vox1_train/Voice2Face/main by default.
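
The training commands above must run in a fixed order, since the final retraining depends on the extracted identity weights. The following is a minimal sketch of a wrapper that chains them and stops at the first failure. The script names and config paths are taken from this README; the helper names `build_commands` and `run_pipeline`, and the dry-run default, are illustrative assumptions and not part of the repository.

```python
import subprocess
import sys

# Script/config pairs from this README, in the order they must run.
PIPELINE = [
    ["train.py", "config/train_reweight.yaml"],              # train and update identity weights
    ["extract_id_weight.py", "config/train_reweight.yaml"],  # extract identity weights
    ["train.py", "config/train_main.yaml"],                  # retrain the final model
]


def build_commands(python=sys.executable):
    """Return the full command line for each stage (hypothetical helper)."""
    return [[python, *args] for args in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each stage; launch it only when dry_run is False.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so a failed stage prevents the later stages from running.
    """
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands. Pass dry_run=False
    # from the repository root to actually start training.
    run_pipeline()
```

Run it from the repository root so that the relative config paths resolve.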

Evaluation

  1. Download the pretrained model from Google drive or Baidu drive (passwd: 4vyf).
  2. Edit config/train_main.yaml: set resume_eval to the path where the model is saved.
  3. Run
python3 eval.py config/train_main.yaml
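
The config edit in the evaluation steps can also be scripted. The sketch below rewrites the value of `resume_eval` in the config text. It assumes the key appears on a single line in the form `resume_eval: <value>`; the actual layout of config/train_main.yaml is not shown in this README, so treat that as an assumption. The helper name `set_resume_eval`, the sample keys, and the model path are hypothetical.

```python
import re


def set_resume_eval(config_text, model_path):
    """Return config_text with the value of resume_eval replaced.

    Assumes a single-line 'resume_eval: <value>' entry (an assumption about
    the config layout). Anything after the colon on that line, including an
    inline comment, is overwritten.
    """
    pattern = re.compile(r"^([ \t]*resume_eval[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in model_path.
    new_text, count = pattern.subn(lambda m: m.group(1) + model_path, config_text, count=1)
    if count == 0:
        raise KeyError("resume_eval not found in config text")
    return new_text


if __name__ == "__main__":
    # Hypothetical config fragment; the real file will contain other keys.
    sample = "batch_size: 32\nresume_eval: null\n"
    print(set_resume_eval(sample, "path/to/model.pth"))
```

To apply it to the real file, read config/train_main.yaml as text, pass it through `set_resume_eval`, and write the result back.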

Expected results (%):

| | 1:2 Matching (U) | 1:2 Matching (G) | Verification (U) | Verification (G) | Retrieval |
|---|---|---|---|---|---|
| Voice-to-Face | 87.2 | 77.7 | 87.2 | 77.5 | 5.5 |
| Face-to-Voice | 86.5 | 75.3 | 87.0 | 76.1 | 5.8 |

The results might slightly differ from the above due to random factors in the training process.

References

If this code is helpful to you, please consider citing our paper:

@inproceedings{wen2021seeking,
  title={Seeking the shape of sound: An adaptive framework for learning voice-face association},
  author={Wen, Peisong and Xu, Qianqian and Jiang, Yangbangyan and Yang, Zhiyong and He, Yuan and Huang, Qingming},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}