awesome-MIM

This list collects the most popular methods in Masked Image Modeling (MIM). If anything is missing, please feel free to submit a request to add it. (Note: the dates shown are those of each paper's first submission to arXiv; the links may point to later versions.)

Additionally, we encourage you to cite our work, SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders.

Backbone models.

Date	Method	Conference	Title	Code
2020-xx-xx (maybe 2019)	iGPT	ICML 2020	Generative Pretraining from Pixels	iGPT
2020-10-22	ViT	ICLR 2021 (Oral)	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale	ViT
2021-04-08	SiT	Arxiv 2021	SiT: Self-supervised vIsion Transformer	None
2021-06-10	MST	NeurIPS 2021	MST: Masked Self-Supervised Transformer for Visual Representation	None
2021-06-14	BEiT	ICLR 2022 (Oral)	BEiT: BERT Pre-Training of Image Transformers	BEiT
2021-11-11	MAE	Arxiv 2021	Masked Autoencoders Are Scalable Vision Learners	MAE
2021-11-15	iBOT	ICLR 2022	iBOT: Image BERT Pre-Training with Online Tokenizer	iBOT
2021-11-18	SimMIM	Arxiv 2021	SimMIM: A Simple Framework for Masked Image Modeling	SimMIM
2021-11-24	PeCo	Arxiv 2021	PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers	None
2021-11-30	MC-SSL0.0	Arxiv 2021	MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning	None
2021-12-16	MaskFeat	Arxiv 2021	Masked Feature Prediction for Self-Supervised Visual Pre-Training	None
2021-12-20	SplitMask	Arxiv 2021	Are Large-scale Datasets Necessary for Self-Supervised Pre-training?	None
2022-01-31	ADIOS	Arxiv 2022	Adversarial Masking for Self-Supervised Learning	None
2022-02-07	CAE	Arxiv 2022	Context Autoencoder for Self-Supervised Representation Learning	CAE
2022-02-07	CIM	Arxiv 2022	Corrupted Image Modeling for Self-Supervised Visual Pre-Training	None
2022-03-10	MVP	Arxiv 2022	MVP: Multimodality-guided Visual Pre-training	None
2022-03-23	AttMask	ECCV 2022	What to Hide from Your Students: Attention-Guided Masked Image Modeling	AttMask
2022-03-29	mc-BEiT	Arxiv 2022	mc-BEiT: Multi-choice Discretization for Image BERT Pre-training	None
2022-04-18	Ge2-AE	Arxiv 2022	The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training	None
2022-05-08	MCMAE	NeurIPS 2022	MCMAE: Masked Convolution Meets Masked Autoencoders	MCMAE
2022-05-20	UM-MAE	Arxiv 2022	Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality	UM-MAE
2022-05-26	GreenMIM	Arxiv 2022	Green Hierarchical Vision Transformer for Masked Image Modeling	GreenMIM
2022-05-26	MixMIM	Arxiv 2022	MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning	Code is Opening
2022-05-28	SupMAE	Arxiv 2022	SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners	SupMAE
2022-05-30	HiViT	Arxiv 2022	HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling	None
2022-06-01	LoMaR	Arxiv 2022	Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction	LoMaR
2022-06-22	SemMAE	NeurIPS 2022	SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders	SemMAE
2022-08-11	MILAN	Arxiv 2022	MILAN: Masked Image Pretraining on Language Assisted Representation	MILAN
2022-11-14	EVA	Arxiv 2022	EVA: Exploring the Limits of Masked Visual Representation Learning at Scale	EVA
2022-11-28	AMT	AAAI 2023	Good helper is around you: Attention-driven Masked Image Modeling	AMT
2023-01-03	TinyMIM	CVPR 2023	TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models	TinyMIM
2023-03-04	PixMIM	Arxiv 2023	PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling	PixMIM
2023-03-09	LocalMIM	CVPR 2023	Masked Image Modeling with Local Multi-Scale Reconstruction	LocalMIM
2023-03-12	AutoMAE	Arxiv 2023	Improving Masked Autoencoders by Learning Where to Mask	AutoMAE
2023-03-15	DeepMIM	Arxiv 2023	DeepMIM: Deep Supervision for Masked Image Modeling	DeepMIM
2023-04-25	Img2Vec	Arxiv 2023	Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders	None
2023-12-30	DTM	Arxiv 2023	Masked Image Modeling via Dynamic Token Morphing	None
2024-11-24	PR-MIM	Arxiv 2024	PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling	None

Others:

Object detection.

Date	Method	Conference	Title	Code
2022-04-06	MIMDet	Arxiv 2022	Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection	MIMDet

3D.

Date	Method	Conference	Title	Code
2021-11-29	Point-BERT	CVPR 2022	Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling	Point-BERT
2022-03-28	Point-MAE	ECCV 2022	Masked Autoencoders for Point Cloud Self-supervised Learning	Point-MAE
2022-05-28	Point-M2AE	NeurIPS 2022	Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training	Point-M2AE
2022-12-13	I2P-MAE	CVPR 2023	Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders	I2P-MAE
2024-04-01	NeRF-MAE	ECCV 2024	NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields	NeRF-MAE

Image generation.

Date	Method	Conference	Title	Code
2022-02-08	MaskGIT	Arxiv 2022	MaskGIT: Masked Generative Image Transformer	None

Unsupervised Domain Adaptation.

Date	Method	Conference	Title	Code
2022-12-02	MIC	CVPR 2023	MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation	None

Video.

Date	Method	Conference	Title	Code
2021-12-02	BEVT	Arxiv 2021	BEVT: BERT Pretraining of Video Transformers	BEVT
2022-03-23	VideoMAE	NeurIPS 2022	VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training	VideoMAE
2022-05-18	MAE_ST	NeurIPS 2022	Masked Autoencoders As Spatiotemporal Learners	MAE_ST
2023-03-29	VideoMAE V2	CVPR 2023	VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking	None

Multi-modal.

Date	Method	Conference	Title	Code
2022-04-04	MultiMAE	Arxiv 2022	MultiMAE: Multi-modal Multi-task Masked Autoencoders	MultiMAE
2022-05-27	M3AE	Arxiv 2022	Multimodal Masked Autoencoders Learn Transferable Representations	None
2022-08-03	xxx	Arxiv 2022	Masked Vision and Language Modeling for Multi-modal Representation Learning	None
2022-12-01	FLIP	Arxiv 2022	Scaling Language-Image Pre-training via Masking	None

Medical.

Date	Method	Conference	Title	Code
2022-03-10	MedMAE	Arxiv 2022	Self Pre-training with Masked Autoencoders for Medical Image Analysis	None

Analysis.

Date	Method	Conference	Title
2022-08-08	RelaxMIM	Arxiv 2022	Understanding Masked Image Modeling via Learning Occlusion Invariant Feature

Survey.

Date	Conference	Title
2022-07-30	Arxiv 2022	A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
2023-12-31	Arxiv 2023	Masked Modeling for Self-supervised Representation Learning on Vision and Beyond