<div align="center"> <h1>vHeat</h1> <h3>vHeat: Building Vision Models upon Heat Conduction</h3>

Zhaozhi Wang<sup>1,2*</sup>, Yue Liu<sup>1*</sup>, Yunfan Liu<sup>1</sup>, Hongtian Yu<sup>1</sup>,

Yaowei Wang<sup>2,3</sup>, Qixiang Ye<sup>1,2</sup>, Yunjie Tian<sup>1</sup>

<sup>1</sup> University of Chinese Academy of Sciences, <sup>2</sup> Peng Cheng Laboratory,

<sup>3</sup> Harbin Institute of Technology (Shenzhen)

<sup>*</sup> Equal contribution.

Paper: [arXiv:2405.16555](https://arxiv.org/abs/2405.16555)

</div>

## Abstract

A fundamental problem in learning robust and expressive visual representations lies in efficiently estimating the spatial relationships of visual semantics throughout the entire image. In this study, we propose vHeat, a novel vision backbone model that simultaneously achieves both high computational efficiency and a global receptive field. The essential idea, inspired by the physical principle of heat conduction, is to conceptualize image patches as heat sources and to model the calculation of their correlations as the diffusion of thermal energy. This mechanism is incorporated into deep models through a newly proposed module, the Heat Conduction Operator (HCO), which is physically plausible and can be implemented efficiently using DCT and IDCT operations with a complexity of O(N<sup>1.5</sup>). Extensive experiments demonstrate that vHeat surpasses Vision Transformers (ViTs) across various vision tasks, while also providing higher inference speed, fewer FLOPs, and lower GPU memory usage for high-resolution images.
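To make the DCT-based formulation concrete, below is a minimal NumPy/SciPy sketch of the frequency-domain diffusion step that the HCO builds on. It is an illustration, not the repository's code: vHeat applies this per channel with learnable conduction coefficients, whereas the sketch uses a single scalar `k`.

```python
import numpy as np
from scipy.fft import dctn, idctn

def heat_conduction_step(u0: np.ndarray, k: float = 1.0, t: float = 1.0) -> np.ndarray:
    """Diffuse a 2D field u0 for time t by solving du/dt = k * Laplacian(u).

    With Neumann boundaries the heat equation diagonalizes under the DCT:
    each cosine coefficient decays as exp(-k * |w|^2 * t), so one forward
    DCT, an element-wise damping, and one inverse DCT suffice.
    """
    H, W = u0.shape
    coeffs = dctn(u0, type=2, norm="ortho")             # to the frequency domain
    wy = np.pi * np.arange(H) / H                       # vertical frequencies
    wx = np.pi * np.arange(W) / W                       # horizontal frequencies
    decay = np.exp(-k * t * (wy[:, None] ** 2 + wx[None, :] ** 2))
    return idctn(coeffs * decay, type=2, norm="ortho")  # back to pixel space
```

When the 1D cosine transforms are realized as dense matrix products with √N × √N transform matrices, the 2D DCT/IDCT pair over N patches costs O(N<sup>1.5</sup>), which is where the complexity figure above comes from.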

## Main Results

:book: Checkpoint and log files will be released soon.

### Classification on ImageNet-1K with vHeat

| name | pretrain | resolution | acc@1 | #params | FLOPs | Throughput | configs/logs/ckpts |
| :--- | :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 224x224 | 81.2 | 29M | 4.5G | 1244 | -- |
| Swin-S | ImageNet-1K | 224x224 | 83.0 | 50M | 8.7G | 728 | -- |
| Swin-B | ImageNet-1K | 224x224 | 83.5 | 89M | 15.4G | 458 | -- |
| vHeat-T | ImageNet-1K | 224x224 | 82.2 | 29M | 4.6G | 1514 | config/log/ckpt |
| vHeat-S | ImageNet-1K | 224x224 | 83.6 | 50M | 8.5G | 945 | config/log/ckpt |
| vHeat-B | ImageNet-1K | 224x224 | 83.9 | 87M | 14.9G | 661 | config/log/ckpt |

### Object Detection on COCO with vHeat

| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | configs/logs/ckpts |
| :--- | :---: | :---: | :--- | :---: | :---: | :---: |
| Swin-T | 48M | 267G | MaskRCNN@1x | 42.7 | 39.3 | -- |
| vHeat-T | 53M | 286G | MaskRCNN@1x | 45.1 | 41.2 | config/log/ckpt |
| Swin-S | 69M | 354G | MaskRCNN@1x | 44.8 | 40.9 | -- |
| vHeat-S | 74M | 377G | MaskRCNN@1x | 46.8 | 42.3 | config/log/ckpt |
| Swin-B | 107M | 496G | MaskRCNN@1x | 46.9 | 42.3 | -- |
| vHeat-B | 115M | 526G | MaskRCNN@1x | 47.7 | 43.0 | config/log/ckpt |
| Swin-T | 48M | 267G | MaskRCNN@3x | 46.0 | 41.6 | -- |
| vHeat-T | 53M | 286G | MaskRCNN@3x | 47.3 | 42.5 | config/log/ckpt |
| Swin-S | 69M | 354G | MaskRCNN@3x | 48.2 | 43.2 | -- |
| vHeat-S | 74M | 377G | MaskRCNN@3x | 48.8 | 43.7 | config/log/ckpt |

### Semantic Segmentation on ADE20K with vHeat

| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | configs/logs/ckpts |
| :--- | :---: | :---: | :---: | :--- | :---: | :---: |
| Swin-T | 512x512 | 60M | 945G | UperNet@160k | 44.4 | -- |
| vHeat-T | 512x512 | 62M | 948G | UperNet@160k | 47.0 | config/log/ckpt |
| Swin-S | 512x512 | 81M | 1039G | UperNet@160k | 47.6 | -- |
| vHeat-S | 512x512 | 82M | 1028G | UperNet@160k | 49.0 | config/log/ckpt |
| Swin-B | 512x512 | 121M | 1188G | UperNet@160k | 48.1 | -- |
| vHeat-B | 512x512 | 129M | 1219G | UperNet@160k | 49.6 | config/log/ckpt |

## Getting Started

### Installation

**Step 1: Clone the vHeat repository**

To get started, first clone the vHeat repository and navigate to the project directory:

```bash
git clone https://github.com/MzeroMiko/vHeat.git
cd vHeat
```

**Step 2: Environment Setup**

Create and activate a new conda environment:

```bash
conda create -n vHeat
conda activate vHeat
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Install the dependencies for detection and segmentation (optional):

```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```

### Model Training and Inference

#### Classification

To train vHeat models for classification on ImageNet, use the following command, choosing the config file for the desired model size:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=16 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/to/dataset> --output /tmp
```

If you only want to test the performance of a trained checkpoint (together with its parameter count and FLOPs):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/to/dataset> --output /tmp --resume </path/to/checkpoint> --eval --model_ema False
```
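For a quick sanity check outside of `main.py`, the parameter count and FLOPs can also be measured directly with fvcore. This is a hedged sketch, not part of the repository: fvcore is an extra dependency (`pip install fvcore`), and you must construct the model object yourself.

```python
import torch
from fvcore.nn import FlopCountAnalysis

def report_cost(model: torch.nn.Module, resolution: int = 224) -> None:
    """Print the parameter count and FLOPs of a classification backbone."""
    model.eval()
    dummy = torch.randn(1, 3, resolution, resolution)  # one ImageNet-sized input
    flops = FlopCountAnalysis(model, dummy).total()    # counted by operator tracing
    params = sum(p.numel() for p in model.parameters())
    print(f"params: {params / 1e6:.1f}M | FLOPs: {flops / 1e9:.1f}G")
```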

Please refer to the modelcard for more details.

#### Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
```

Use `--tta` to obtain the multi-scale mIoU (MS) in segmentation.

To train with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_train.sh </path/to/config> 8
```

For more information about detection and segmentation tasks, please refer to the mmdetection and mmsegmentation documentation. Remember to use the appropriate backbone configurations in the configs directory; a hedged sketch of such a config follows.
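For orientation, this is what a downstream config can look like using mmengine's standard inheritance syntax. The base-config filename and checkpoint path are illustrative assumptions, not the repository's actual names; consult the real files under `configs`.

```python
# Illustrative mmdetection-style config (mmengine syntax); file names are assumptions.
_base_ = ['./mask-rcnn_vheat-tiny_fpn_1x_coco.py']  # hypothetical base config

model = dict(
    backbone=dict(
        # Point at the classification checkpoint converted for downstream use
        # (see the conversion step described below).
        init_cfg=dict(type='Pretrained', checkpoint='vheat_tiny_interpolated.pth'),
    ),
)
```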

Before training on downstream tasks (detection/segmentation), run interpolate4downstream.py to convert the classification-pretrained checkpoint into a form that the downstream models can load.
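For intuition, here is a minimal sketch of the kind of conversion such a script typically performs: bicubically resizing spatially-shaped weights to the larger downstream input grid. The key pattern `freq_embed`, the tensor layout, and the target size are illustrative assumptions; the repository's script is authoritative.

```python
import torch
import torch.nn.functional as F

def interpolate_for_downstream(src: str, dst: str, new_hw=(32, 32)) -> None:
    """Resize spatially-shaped weights in a classification checkpoint so that
    a downstream model with a larger input grid can load them."""
    state = torch.load(src, map_location="cpu")
    for key, w in state.items():
        if "freq_embed" in key and w.dim() == 3:        # hypothetical key pattern, (H, W, C)
            w = w.permute(2, 0, 1).unsqueeze(0)         # -> (1, C, H, W) for F.interpolate
            w = F.interpolate(w, size=new_hw, mode="bicubic", align_corners=False)
            state[key] = w.squeeze(0).permute(1, 2, 0)  # back to (H, W, C)
    torch.save(state, dst)
```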

## Citation

```bibtex
@misc{wang2024vheat,
      title={vHeat: Building Vision Models upon Heat Conduction},
      author={Zhaozhi Wang and Yue Liu and Yunfan Liu and Hongtian Yu and Yaowei Wang and Qixiang Ye and Yunjie Tian},
      year={2024},
      eprint={2405.16555},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```