<div align="center"> <h3>Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling</h3>

Xin Ma<sup>1</sup>, Chang Liu<sup>2</sup>, Chunyu Xie<sup>3</sup>, Long Ye<sup>1</sup>, Yafeng Deng<sup>3</sup>, Xiangyang Ji<sup>2</sup>

<sup>1</sup> Communication University of China, <sup>2</sup> Tsinghua University, <sup>3</sup> 360 AI Research.

</div>

This repo is the official implementation of Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling (DMJD). It currently includes code and pre-trained checkpoints.

<p align="center"> <img src="https://user-images.githubusercontent.com/94091472/210162854-da4afe07-4304-4e43-af55-45092270b479.png" width="1500"> </p>

Introduction

This work aims to alleviate the training inefficiency of masked image modeling (MIM), which we attribute to the insufficient utilization of training signals. To address this issue, DMJD imposes a masking regulation that generates multiple complementary views per image, so that more invisible tokens of each image are reconstructed in the invisible reconstruction branch, and further devises a dual-branch joint distillation architecture with an additional visible distillation branch to make full use of the input signals with superior targets. Extensive experiments and visualizations show that increased prediction rates, visible distillation, and superior targets accelerate training convergence without sacrificing model generalization.

The contributions are summarized as follows:

- A disjoint masking (DM) regulation that generates multiple complementary masked views per image, raising the number of invisible tokens reconstructed for each image.
- A dual-branch joint distillation (JD) architecture that couples the invisible reconstruction branch with an additional visible distillation branch, making full use of the input signals with superior targets.
- Extensive experiments and visualizations showing that DMJD accelerates training convergence without sacrificing model generalization.
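For intuition, the sketch below illustrates the disjoint masking idea described above: multiple masked views of one image are derived from a single random permutation so that, across views, far more tokens end up invisible (and therefore reconstructed) than with a single view. This is only a minimal, illustrative sketch; the helper `complementary_masks` and its exact sampling regulation are assumptions for exposition, not the repo's actual implementation.

```python
import torch

def complementary_masks(num_patches: int = 196, mask_ratio: float = 0.75) -> torch.Tensor:
    """Hypothetical sketch of disjoint masking: derive complementary masked
    views from one shared permutation so that each patch is invisible (and
    thus reconstructed) in all but one view.

    Returns a bool tensor of shape (num_views, num_patches); True = masked.
    """
    num_visible = int(num_patches * (1.0 - mask_ratio))   # visible tokens per view
    num_views = num_patches // num_visible                 # e.g. 196 // 49 = 4 views
    perm = torch.randperm(num_patches)                     # one shared random permutation
    masks = torch.ones(num_views, num_patches, dtype=torch.bool)
    for v in range(num_views):
        visible = perm[v * num_visible:(v + 1) * num_visible]  # disjoint visible sets
        masks[v, visible] = False                               # unmask this view's tokens
    return masks

masks = complementary_masks()
print(masks.shape)                # torch.Size([4, 196])
print(masks.float().mean(dim=1))  # per-view mask ratio stays at 0.75
```

Under these assumed numbers (196 patches, 0.75 mask ratio), the visible sets of the four views partition the image, so every patch is reconstructed in three of the four views, which is what raises the per-image prediction rate.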

Getting Started

Setup

Step 1. We provide a Dockerfile to build an image. Ensure that your Docker version is >= 19.03.

```bash
# build an image with PyTorch 1.11, CUDA 11.3, and mmsegmentation
# If you prefer other versions, just modify the Dockerfile
docker build -t env:dmjd .
```

Step 2. Run it with

```bash
docker run --gpus all --shm-size=8g -itd -v {DATA_DIR}:/path/to/data -v {CODE_DIR}:/path/to/dmjd env:dmjd
```

Pre-training

The pre-training instructions are in PRETRAIN.md.

Evaluation

<table><tbody> <!-- START TABLE --> <!-- TABLE HEADER --> <th valign="bottom"></th> <th valign="bottom">ConViT-Base</th> <th valign="bottom">ConViT-Large</th> <!-- TABLE BODY --> <tr><td align="left">pre-trained checkpoint</td> <td align="center"><a href="https://drive.google.com/file/d/13kVEGFlZcRdSdt8-4hQ1dwOezSx0DLVl/view?usp=share_link">download</a></td> <td align="center"><a href="https://drive.google.com/file/d/1HdxvfWy8NlfhOJVBC5fLHOtD0_YhbI-j/view?usp=share_link">download</a></td> </tr> </tbody></table>

| Method | Backbone | ETE | Gh. | Learning Target | FT acc@1 (%) | LIN acc@1 (%) |
|:---|:---|:---:|:---:|:---:|:---:|:---:|
| MaskFeat | ViT-B | 1600 | 240 | HOG | 84.0 | 68.5 |
| +DMJD | ViT-B | 1600 | 132 (1.8×) | HOG | 84.1 (+0.1) | 71.9 (+3.4) |
| ConvMAE | ConViT-B | 1600 | 300 | RGB | 85.0 | 70.9 |
| +DMJD | ConViT-B | 800 | 101 (3×) | HOG | 85.2 (+0.2) | 76.7 (+5.8) |
| ConvMAE | ConViT-L | 800 | 480 | RGB | 86.2 | - |
| +DMJD | ConViT-L | 800 | 267 (1.8×) | HOG | 86.3 (+0.1) | 79.7 |

Acknowledgement

This repo is built on top of DeiT, MAE, and ConvMAE. The semantic segmentation part is based on MMSegmentation. Thanks for their wonderful work.

License

DMJD is released under the MIT License.

Citation

If you find this repository useful, please consider giving a star ⭐ and a citation:

```bibtex
@Article{ma2022disjoint,
      title   = {Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling},
      author  = {Xin Ma and Chang Liu and Chunyu Xie and Long Ye and Yafeng Deng and Xiangyang Ji},
      journal = {arXiv:2301.00230},
      year    = {2022},
}
```