
TCFormer (CVPR'2022 Oral, TPAMI'2024)

[CVPR'2022 paper] [TPAMI'2024 paper]

Introduction

Official code repository for the papers:
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
[Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, and Xiaogang Wang]

and

TCFormer: Visual Recognition via Token Clustering Transformer
[Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, and Xiaogang Wang]


TODO

Whole-body pose estimation training/testing code release.
Whole-body pose estimation model zoo release.
TCFormer-large on the COCO-WholeBody dataset.
FLOPs calculation function.
Integrate TCFormer into MMPose.

Model Zoo

You can find the pretrained checkpoints here.

Image Classification

For classification configs and weights, see >>>here<<<.

TCFormer on ImageNet-1K

Method	Input Size	Acc@1 (%)	#Params	Config	Checkpoint	Log
TCFormer-light	224	79.4	14.2M	config	57M [Google]	[Google]
TCFormer	224	82.3	25.6M	config	103M [Google]	[Google]
TCFormer-large	224	83.6	62.8M	config	103M [Google]	[Google]
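
As a rough sanity check when downloading weights, the sketch below estimates the size of a checkpoint from its parameter count. It assumes the file stores only float32 weights (4 bytes per parameter) and nothing else such as optimizer state or buffers, so treat the result as an approximate lower bound. The helper name `estimate_checkpoint_mb` is hypothetical and not part of this repository.

```python
def estimate_checkpoint_mb(params_millions: float, bytes_per_param: int = 4) -> float:
    """Approximate weights-only checkpoint size in MB (10^6 bytes).

    Assumption: every parameter is stored with `bytes_per_param` bytes
    (4 for float32, 2 for float16) and the file holds nothing else.
    Millions of parameters times bytes per parameter gives millions of bytes.
    """
    return params_millions * bytes_per_param


if __name__ == "__main__":
    # Parameter counts taken from the ImageNet-1K table above.
    for name, params_m in [
        ("TCFormer-light", 14.2),
        ("TCFormer", 25.6),
        ("TCFormer-large", 62.8),
    ]:
        print(f"{name}: ~{estimate_checkpoint_mb(params_m):.1f} MB")
```

Real files are usually slightly larger than this estimate because they also carry buffers and metadata.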

WholeBody Estimation

For whole-body pose estimation configs and weights, see >>>here<<<.

Results on COCO-WholeBody v1.0 val, using a person detector with a human AP of 56.4 on COCO val2017

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
TCFormer	256x192	0.697	0.774	0.705	0.821	0.656	0.753	0.539	0.652	0.576	0.681	ckpt	log
TCFormer-large	384x288	0.718	0.794	0.744	0.850	0.790	0.856	0.614	0.715	0.642	0.733	ckpt	log

Citation

If you find this project useful in your research, please cite:

@inproceedings{zeng2022not,
  title={Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer},
  author={Zeng, Wang and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11101--11111},
  year={2022}
}

@article{zeng2024tcformer,
  title={TCFormer: Visual Recognition via Token Clustering Transformer},
  author={Zeng, Wang and Jin, Sheng and Xu, Lumin and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping and Wang, Xiaogang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}

Acknowledgement

Thanks to:

PVT
MMPose

License

This project is released under the Apache 2.0 license.