A Simple and Effective Baseline

Weakly-supervised crowd counting with token attention and fusion: A Simple and Effective Baseline (ICASSP 2024)

Overview

Comparison between four backbone networks on Part_A of the ShanghaiTech dataset

Backbone          MAE     MSE
EfficientNet-B7   76.4    115.0
ViT-B-384         72.6    123.4
Swin_B-384        67.0    108.5
Mamba             71.7    122.8
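The table reports MAE and MSE over the test images (lower is better). In the crowd-counting literature, "MSE" conventionally denotes the square root of the mean squared count error. We assume the numbers above follow that convention, but we have not verified it against this repository's code. A minimal sketch of both metrics, computed from per-image predicted and ground-truth counts, is shown below. The function name `count_errors` is hypothetical and is not part of this repository.

```python
import math


def count_errors(pred_counts, gt_counts):
    """Return (MAE, MSE) for per-image crowd counts.

    MSE follows the usual crowd-counting convention (assumed here):
    the square root of the mean squared error, i.e. an RMSE.
    """
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


if __name__ == "__main__":
    # Three images: predicted vs. annotated head counts.
    print(count_errors([110.0, 95.0, 300.0], [100.0, 100.0, 290.0]))
```

Because the squared term weights large misses more heavily, MSE is always at least as large as MAE and grows when a model fails badly on a few dense images.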

Mamba backbone

Environment

python >=3.6 
pytorch >=1.5
opencv-python >=4.0
scipy >=1.4.0
h5py >=2.10
pillow >=7.0.0
imageio >=1.18
timm==0.1.30
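To confirm that an existing environment matches the list above, a small checker such as the following can help. This is a hedged sketch, not a script shipped with the repository. It uses `importlib.metadata`, which needs Python 3.8+ (stricter than the 3.6 minimum listed above). It also assumes the pip distribution names `torch`, `opencv-python` and `Pillow`. For example, a headless OpenCV install (`opencv-python-headless`) would be reported as missing. The version parser is deliberately simple and ignores pre-release tags.

```python
from importlib import metadata  # Python 3.8+

# Requirements copied from the list above; timm is pinned exactly.
# Distribution names are assumptions about how the packages were installed.
REQUIREMENTS = {
    "torch": (">=", "1.5"),
    "opencv-python": (">=", "4.0"),
    "scipy": (">=", "1.4.0"),
    "h5py": (">=", "2.10"),
    "Pillow": (">=", "7.0.0"),
    "imageio": (">=", "1.18"),
    "timm": ("==", "0.1.30"),
}


def parse_version(text):
    """Turn '1.10.2+cu113' into (1, 10, 2); non-numeric suffixes are dropped."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings with '>=' or '=='."""
    a, b = parse_version(installed), parse_version(required)
    # Pad with zeros so that (1, 4) compares equal to (1, 4, 0).
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    return a >= b if op == ">=" else a == b


def check_environment():
    """Map each distribution name to 'ok', 'missing' or a mismatch message."""
    report = {}
    for dist, (op, required) in REQUIREMENTS.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "missing"
            continue
        if satisfies(installed, op, required):
            report[dist] = "ok"
        else:
            report[dist] = f"found {installed}, need {op}{required}"
    return report


if __name__ == "__main__":
    for dist, status in check_environment().items():
        print(f"{dist:15s} {status}")
```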

Datasets

Prepare data

cd data
python predataset_xx.py

Replace “xx” with the dataset name: sh, jhu, qnrf, or nwpu. Before running, change the dataset path to point to your local copy of the data.
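To preprocess several datasets in one go, the per-dataset scripts can be run from a small driver. The helper below is hypothetical (for example saved as `run_predataset.py` in the repository root) and is not part of the repository. It assumes each `predataset_<name>.py` runs without arguments once its dataset path has been edited, and that it is launched from the directory containing `data/`.

```python
import subprocess
import sys

# Dataset identifiers used in the predataset_xx.py file names.
DATASETS = ("sh", "jhu", "qnrf", "nwpu")


def preprocess_command(name):
    """Build the command line for one predataset_<name>.py script."""
    if name not in DATASETS:
        raise ValueError(f"unknown dataset {name!r}; expected one of {DATASETS}")
    return [sys.executable, f"predataset_{name}.py"]


def preprocess(names, data_dir="data"):
    """Run the chosen preprocessing scripts one after another inside data_dir."""
    for name in names:
        # check=True stops at the first script that exits with an error.
        subprocess.run(preprocess_command(name), cwd=data_dir, check=True)


if __name__ == "__main__":
    # Usage (hypothetical): python run_predataset.py sh qnrf
    preprocess(sys.argv[1:] or ["sh"])
```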

Generate the image file lists:

python make_npydata.py
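The contents of `make_npydata.py` are not shown here. As a rough, hypothetical illustration of what an image file list stored as a `.npy` file can look like, the sketch below collects image paths and saves them with NumPy. The function name, the `*.jpg` pattern and the paths are assumptions. The real script's directory layout, splits and output names may differ.

```python
import glob
import os

import numpy as np


def build_file_list(image_dir, out_path, pattern="*.jpg"):
    """Collect image paths under image_dir and save them as a .npy array."""
    paths = sorted(glob.glob(os.path.join(image_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"no {pattern} files found in {image_dir}")
    # Saved as a NumPy string array; np.load(out_path) reads it back.
    np.save(out_path, np.array(paths))
    return paths


if __name__ == "__main__":
    # Hypothetical paths; point these at your own dataset split.
    build_file_list("path/to/train/images", "train_list.npy")
```

Sorting the paths keeps the list deterministic across runs, which makes train/test splits reproducible.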

Training

Training examples:

python train.py --dataset ShanghaiA --save_path ./save_file/ShanghaiA --batch_size 24 --model_type 'token'
python train.py --dataset ShanghaiA --save_path ./save_file/ShanghaiA --batch_size 24 --model_type 'gap'
python train.py --dataset ShanghaiA --save_path ./save_file/ShanghaiA --batch_size 24 --model_type 'swin'
python train.py --dataset ShanghaiA --save_path ./save_file/ShanghaiA --batch_size 24 --model_type 'mamba'

Please use a single GPU with 24 GB of memory, or multiple GPUs, for training. Alternatively, reduce the batch size to fit a smaller GPU.
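To train all four model types in sequence with a chosen batch size, the commands from the training examples can be generated programmatically. This is a hypothetical convenience sketch, not part of the repository. It uses only the flags that appear in the examples (`--dataset`, `--save_path`, `--batch_size`, `--model_type`). Note that the examples reuse `./save_file/ShanghaiA` for every model type. When running them back to back, consider passing a distinct `save_path` per run so that later runs do not overwrite earlier checkpoints.

```python
import subprocess
import sys

# The four --model_type values used in the training examples.
MODEL_TYPES = ("token", "gap", "swin", "mamba")


def train_command(model_type, dataset="ShanghaiA", batch_size=24, save_path=None):
    """Build one train.py invocation with the flags from the examples."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model_type {model_type!r}")
    if save_path is None:
        save_path = f"./save_file/{dataset}"
    return [
        sys.executable, "train.py",
        "--dataset", dataset,
        "--save_path", save_path,
        "--batch_size", str(batch_size),
        "--model_type", model_type,
    ]


if __name__ == "__main__":
    for mt in MODEL_TYPES:
        # Reduce batch_size on GPUs with less than 24 GB of memory.
        cmd = train_command(mt, batch_size=24, save_path=f"./save_file/ShanghaiA_{mt}")
        subprocess.run(cmd, check=True)
```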

Testing

Test example:

Download the pretrained model from Baidu-Disk (password: 8a8n).

python test.py --dataset ShanghaiA --pre model_best.pth --model_type 'gap'
...

Reference

If you find this project useful for your research, please cite:

@inproceedings{wang2024weakly,
  title={Weakly-Supervised Crowd Counting with Token Attention and Fusion: A Simple and Effective Baseline},
  author={Wang, Yi and Hu, Qiongyang and Chau, Lap-Pui},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={13456--13460},
  year={2024},
  organization={IEEE}
}