awesome-MIM

This list collects the most popular methods in Masked Image Modeling (MIM). If anything is missing, please feel free to submit a request to add it. (Note: the dates shown are those of each paper's first submission to arXiv, but the provided links may point to the latest version.)

Additionally, we encourage you to cite our work, SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders.

Backbone models.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2020-xx-xx (maybe 2019) | iGPT | ICML 2020 | Generative Pretraining from Pixels | iGPT |
| 2020-10-22 | ViT | ICLR 2021 (Oral) | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ViT |
| 2021-04-08 | SiT | Arxiv 2021 | SiT: Self-supervised vIsion Transformer | None |
| 2021-06-10 | MST | NeurIPS 2021 | MST: Masked Self-Supervised Transformer for Visual Representation | None |
| 2021-06-14 | BEiT | ICLR 2022 (Oral) | BEiT: BERT Pre-Training of Image Transformers | BEiT |
| 2021-11-11 | MAE | Arxiv 2021 | Masked Autoencoders Are Scalable Vision Learners | MAE |
| 2021-11-15 | iBoT | ICLR 2022 | iBOT: Image BERT Pre-Training with Online Tokenizer | iBoT |
| 2021-11-18 | SimMIM | Arxiv 2021 | SimMIM: A Simple Framework for Masked Image Modeling | SimMIM |
| 2021-11-24 | PeCo | Arxiv 2021 | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | None |
| 2021-11-30 | MC-SSL0.0 | Arxiv 2021 | MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning | None |
| 2021-12-16 | MaskFeat | Arxiv 2021 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | None |
| 2021-12-20 | SplitMask | Arxiv 2021 | Are Large-scale Datasets Necessary for Self-Supervised Pre-training? | None |
| 2022-01-31 | ADIOS | Arxiv 2022 | Adversarial Masking for Self-Supervised Learning | None |
| 2022-02-07 | CAE | Arxiv 2022 | Context Autoencoder for Self-Supervised Representation Learning | CAE |
| 2022-02-07 | CIM | Arxiv 2022 | Corrupted Image Modeling for Self-Supervised Visual Pre-Training | None |
| 2022-03-10 | MVP | Arxiv 2022 | MVP: Multimodality-guided Visual Pre-training | None |
| 2022-03-23 | AttMask | ECCV 2022 | What to Hide from Your Students: Attention-Guided Masked Image Modeling | AttMask |
| 2022-03-29 | mc-BEiT | Arxiv 2022 | mc-BEiT: Multi-choice Discretization for Image BERT Pre-training | None |
| 2022-04-18 | Ge2-AE | Arxiv 2022 | The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training | None |
| 2022-05-08 | MCMAE | NeurIPS 2022 | MCMAE: Masked Convolution Meets Masked Autoencoders | MCMAE |
| 2022-05-20 | UM-MAE | Arxiv 2022 | Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality | UM-MAE |
| 2022-05-26 | GreenMIM | Arxiv 2022 | Green Hierarchical Vision Transformer for Masked Image Modeling | GreenMIM |
| 2022-05-26 | MixMIM | Arxiv 2022 | MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning | Code is Opening |
| 2022-05-28 | SupMAE | Arxiv 2022 | SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners | SupMAE |
| 2022-05-30 | HiViT | Arxiv 2022 | HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling | None |
| 2022-06-01 | LoMaR | Arxiv 2022 | Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction | LoMaR |
| 2022-06-22 | SemMAE | NeurIPS 2022 | SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders | SemMAE |
| 2022-08-11 | MILAN | Arxiv 2022 | MILAN: Masked Image Pretraining on Language Assisted Representation | MILAN |
| 2022-11-14 | EVA | Arxiv 2022 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | EVA |
| 2022-11-28 | AMT | AAAI 2023 | Good helper is around you: Attention-driven Masked Image Modeling | AMT |
| 2023-01-03 | TinyMIM | CVPR 2023 | TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models | TinyMIM |
| 2023-03-04 | PixMIM | Arxiv 2023 | PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling | PixMIM |
| 2023-03-09 | LocalMIM | CVPR 2023 | Masked Image Modeling with Local Multi-Scale Reconstruction | LocalMIM |
| 2023-03-12 | AutoMAE | Arxiv 2023 | Improving Masked Autoencoders by Learning Where to Mask | AutoMAE |
| 2023-03-15 | DeepMIM | Arxiv 2023 | DeepMIM: Deep Supervision for Masked Image Modeling | DeepMIM |
| 2023-04-25 | Img2Vec | Arxiv 2023 | Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders | None |
| 2023-12-30 | DTM | Arxiv 2023 | Masked Image Modeling via Dynamic Token Morphing | None |
| 2024-11-24 | PR-MIM | Arxiv 2024 | PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling | None |

Others:

Object detection.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-04-06 | MIMDet | Arxiv 2022 | Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection | MIMDet |

3D.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2021-11-29 | Point-BERT | CVPR 2022 | Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling | Point-BERT |
| 2022-03-28 | Point-MAE | ECCV 2022 | Masked Autoencoders for Point Cloud Self-supervised Learning | Point-MAE |
| 2022-05-28 | Point-M2AE | NeurIPS 2022 | Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training | Point-M2AE |
| 2022-12-13 | I2P-MAE | CVPR 2023 | Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders | I2P-MAE |
| 2024-04-01 | NeRF-MAE | ECCV 2024 | NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields | NeRF-MAE |

Image generation.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-02-08 | MaskGIT | Arxiv 2022 | MaskGIT: Masked Generative Image Transformer | None |

Unsupervised Domain Adaptation.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2023-06-18 | MIC | CVPR 2023 | MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation | None |

Video.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2021-12-02 | BEVT | Arxiv 2021 | BEVT: BERT Pretraining of Video Transformers | BEVT |
| 2022-03-23 | VideoMAE | NeurIPS 2022 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | VideoMAE |
| 2022-05-18 | MAE_ST | NeurIPS 2022 | Masked Autoencoders As Spatiotemporal Learners | MAE_ST |
| 2023-03-29 | VideoMAE v2 | CVPR 2023 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | None |

Multi-modal.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-04-04 | MultiMAE | Arxiv 2022 | MultiMAE: Multi-modal Multi-task Masked Autoencoders | MultiMAE |
| 2022-05-27 | M3AE | Arxiv 2022 | Multimodal Masked Autoencoders Learn Transferable Representations | None |
| 2022-08-03 | xxx | Arxiv 2022 | Masked Vision and Language Modeling for Multi-modal Representation Learning | None |
| 2022-12-01 | FLIP | Arxiv 2022 | Scaling Language-Image Pre-training via Masking | None |

Medical.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-03-10 | MedMAE | Arxiv 2022 | Self Pre-training with Masked Autoencoders for Medical Image Analysis | None |

Analysis.

| Date | Method | Conference | Title |
| --- | --- | --- | --- |
| 2022-08-08 | RelaxMIM | Arxiv 2022 | Understanding Masked Image Modeling via Learning Occlusion Invariant Feature |

Survey.

| Date | Conference | Title |
| --- | --- | --- |
| 2022-07-30 | Arxiv 2022 | A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond |
| 2023-12-31 | Arxiv 2023 | Masked Modeling for Self-supervised Representation Learning on Vision and Beyond |