DropMAE

🌟 The code for our CVPR 2023 paper 'DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks'. [Link]

If you find our work useful in your research, please consider citing:

@inproceedings{dropmae2023,
  title={DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks},
  author={Qiangqiang Wu and Tianyu Yang and Ziquan Liu and Baoyuan Wu and Ying Shan and Antoni B. Chan},
  booktitle={CVPR},
  year={2023}
}

Overall Architecture

<p align="left"> <img src="https://github.com/jimmy-dq/DropMAE/blob/main/figs_paper/pipeline.png" width="480"> </p>

Frame Reconstruction Results

<p align="left"> <img src="https://github.com/jimmy-dq/DropMAE/blob/main/figs_paper/reconstruction_results.png" width="480"> </p>

Catalog

Environment setup

Dataset Download

DropMAE pre-training

To pre-train ViT-Base (the default configuration) with multi-node distributed training, run the following on 8 nodes with 8 GPUs each:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=8 \
--node_rank=$INDEX --master_addr=$CHIEF_IP --master_port=1234 main_pretrain_kinetics.py \
--batch_size 64 \
--model mae_vit_base_patch16 \
--norm_pix_loss \
--mask_ratio 0.75 \
--epochs 400 \
--warmup_epochs 40 \
--blr 1.5e-4 \
--weight_decay 0.05 \
--P 0.1 \
--frame_gap 50 \
--data_path $data_path_to_k400_training_videos \
--output_dir $output_dir \
--log_dir $log_dir
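The flags above fix a per-GPU batch size and a base learning rate (`--blr`), not the absolute learning rate. The following is a minimal sketch of how the two could relate, assuming `main_pretrain_kinetics.py` follows the linear scaling rule of the original MAE code base (`lr = blr * effective_batch_size / 256`). That rule, the `accum_iter` parameter, and the helper names `effective_batch_size` and `scaled_lr` are assumptions for illustration and are not confirmed by this README.

```python
# Sketch only: the scaling rule below is the one used by the original MAE
# code base. Whether main_pretrain_kinetics.py applies it is an assumption.

def effective_batch_size(per_gpu_batch, gpus_per_node, nodes, accum_iter=1):
    """Number of samples that contribute to one optimizer step."""
    return per_gpu_batch * gpus_per_node * nodes * accum_iter


def scaled_lr(blr, eff_batch):
    """Linear scaling rule: absolute lr = base lr * effective batch / 256."""
    return blr * eff_batch / 256


# Values taken from the command above: --batch_size 64, 8 GPUs x 8 nodes,
# --blr 1.5e-4. accum_iter=1 is a hypothetical default.
eff = effective_batch_size(per_gpu_batch=64, gpus_per_node=8, nodes=8)
lr = scaled_lr(blr=1.5e-4, eff_batch=eff)

print(f"effective batch size: {eff}")
print(f"absolute learning rate: {lr:.2e}")
```

Under this assumption, running the same command on a single node with 8 GPUs would shrink the effective batch size to 512 and the absolute learning rate by the same factor, so results may not match the 64-GPU setting without further adjustment.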

Training logs

The pre-training logs of K400-1600E and K700-800E are provided.

Pre-trained Models

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<tr>
<th valign="bottom"></th>
<th valign="bottom">K400-1600E</th>
<th valign="bottom">K700-800E</th>
</tr>
<!-- TABLE BODY -->
<tr><td align="left">pre-trained checkpoint</td>
<td align="center"><a href="https://drive.google.com/file/d/1vB8YjPSPybImP1cJZmV2fknKaT8ha6JH/view?usp=share_link">download</a></td>
<td align="center"><a href="https://drive.google.com/file/d/1qMuBJtNIQQ-NCz98Pig72YVKQdasc49h/view?usp=share_link">download</a></td>
</tr>
</tbody></table>

Fine-tuning on VOT

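<table><tbody>
<tr>
<th valign="bottom">Tracker</th>
<th valign="bottom">GOT-10K (AO)</th>
<th valign="bottom">LaSOT (AUC)</th>
<th valign="bottom">LaSOT (AUC)</th>
<th valign="bottom">TrackingNet (AUC)</th>
<th valign="bottom">TNL2K (AUC)</th>
</tr>
<tr><td align="left">DropTrack-K700-ViTBase</td>
<td align="center">75.9</td>
<td align="center">71.8</td>
<td align="center">52.7</td>
<td align="center">84.1</td>
<td align="center">56.9</td>
</tr>
</tbody></table>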

Fine-tuning on VOS