On Mitigating Hard Clusters for Face Clustering

Dependency

python>=3.6
pytorch>=1.6.0
torchvision>=0.8.1

conda install faiss-gpu -c pytorch
pip install -r requirements.txt

Usage

Dataset Preparation

Here we use the MS1M dataset as an example.

Data format

The data directory is constructed as follows:

.
└── data
    ├── features
    │   └── xxx.bin
    ├── labels
    │   └── xxx.meta
    └── knns
        └── ...

features currently supports binary files only.
labels supports plain text files, where each line holds the label of the corresponding entry in the feature file.
knns can either be downloaded precomputed (see Downloads) or computed by setting is_reload to True in the configuration file.
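
As an illustration of the format described above, here is a minimal sketch of reading a feature file and its label file together. It is not the repository's own loader. The feature dimension (256), the float32 dtype, and integer-valued labels are assumptions that this README does not state, so adjust them to match your data. The function names are hypothetical.

```python
import numpy as np

# Assumptions (not stated in this README): features are raw float32 values
# with a fixed dimension, and each label line is an integer.
FEATURE_DIM = 256


def load_features(path, dim=FEATURE_DIM, dtype=np.float32):
    """Read a raw binary feature file into an (inst_num, dim) array."""
    flat = np.fromfile(str(path), dtype=dtype)
    if flat.size % dim != 0:
        raise ValueError(
            f"{path}: {flat.size} values is not a multiple of dim={dim}"
        )
    return flat.reshape(-1, dim)


def load_labels(path):
    """Read one label per line; line i labels feature row i."""
    with open(path) as f:
        return np.array([int(line) for line in f if line.strip()], dtype=np.int64)


def load_split(feature_path, label_path, dim=FEATURE_DIM):
    """Load features and labels and check that their lengths agree."""
    features = load_features(feature_path, dim)
    labels = load_labels(label_path)
    if len(features) != len(labels):
        raise ValueError(
            f"{len(features)} feature rows but {len(labels)} label lines"
        )
    return features, labels
```

For example, `load_split("data/features/xxx.bin", "data/labels/xxx.meta")` would return the two arrays, or raise if the assumed dimension does not divide the file size.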

Take MS1M (Part0 and Part1) as an example. The data directory is as follows:

data
├── features
│   ├── part0_train.bin                 # acbbc780948e7bfaaee093ef9fce2ccb
│   └── part1_test.bin                  # ced42d80046d75ead82ae5c2cdfba621
├── labels
│   ├── part0_train.meta                # class_num=8573, inst_num=576494
│   └── part1_test.meta                 # class_num=8573, inst_num=584013
└── knns
    ├── part0_train/faiss_k_80.npz      # 5e4f6c06daf8d29c9b940a851f28a925
    └── part1_test/faiss_k_80.npz       # d4a7f95b09f80b0167d893f2ca0f5be5
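
The comments in the listing above can be used to sanity-check a download. The sketch below assumes that the 32-character hexadecimal strings are MD5 digests (the README does not say so explicitly) and that class_num and inst_num are the number of distinct labels and the number of lines in a .meta file. The helper names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def label_stats(path):
    """Count distinct labels and total lines in a plain-text label file."""
    with open(path) as f:
        labels = [line.strip() for line in f if line.strip()]
    return {"class_num": len(set(labels)), "inst_num": len(labels)}


# Example usage (paths taken from the listing above):
# md5_of_file("data/features/part0_train.bin") == "acbbc780948e7bfaaee093ef9fce2ccb"
# label_stats("data/labels/part0_train.meta")
```

If a digest does not match, the file is probably incomplete or corrupted and is worth downloading again.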

Downloads

MS1M
- part0_train & part1_test (584K): GoogleDrive.
- part0_train & part1/3/5/7/9_test: GoogleDrive.
- Precomputed KNN: GoogleDrive.

Configuration

Configuration files are provided in ./config.

- config_train_ms1m.yaml is used to train our similarity prediction model on the training set, i.e., "part0_train".
- config_eval_ms1m_part*.yaml are used for evaluation on the five test subsets, i.e., "part1_test", "part3_test", "part5_test", "part7_test", and "part9_test".

Training

After setting the configuration, start training by running:

python main.py -c ./config/config_train_ms1m.yaml

The folder for saving checkpoints is set by the work_dir parameter in the configuration file.

We provide a pre-trained model checkpoint.tar in ./save/Ours.

Test

Once the training is completed, the obtained model can be used for clustering. To start clustering on the test subset "part*_test", simply run

python eval.py -c ./config/config_eval_ms1m_part*.yaml

The clustering results will be saved in work_dir/results.
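
To evaluate every test subset in one go, the command above can be wrapped in a small driver. This is a convenience sketch, not part of the repository: `build_eval_commands` and `run_all` are hypothetical helpers, and the script assumes it is run from the repository root so that `eval.py` and `./config` resolve. The glob pattern avoids guessing the exact config file names.

```python
import glob
import subprocess
import sys


def build_eval_commands(config_paths, python=sys.executable):
    """Build one `python eval.py -c <config>` command per config file."""
    return [[python, "eval.py", "-c", path] for path in sorted(config_paths)]


def run_all(pattern="./config/config_eval_ms1m_part*.yaml"):
    """Run eval.py once for every config file matching the pattern."""
    for cmd in build_eval_commands(glob.glob(pattern)):
        print("Running:", " ".join(cmd))
        # check=True stops the loop as soon as one evaluation fails.
        subprocess.run(cmd, check=True)


# Example usage, from the repository root:
# run_all()
```

Running the subsets sequentially keeps GPU memory use the same as a single evaluation.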

Citation

If you use this repo in your research or wish to refer to the baseline results published in this paper, please use the following BibTeX entry.

@inproceedings{yingjie2022,
  title={On Mitigating Hard Clusters for Face Clustering},
  author={Chen, Yingjie and Zhong, Huasong and Chen, Chong and Shen, Chen and Huang, Jianqiang and Wang, Tao and Liang, Yun and Sun, Qianru},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}