awesome-MIM

Reading list for research topics in Masked Image Modeling (MIM).

We list the most popular methods for MIM. If we missed something, please submit a request. (Note: each date is when the first version of the paper was submitted to arXiv; the linked paper may be a later revision.)

Backbone models.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2020-xx-xx (maybe 2019) | iGPT | ICML 2020 | Generative Pretraining from Pixels | iGPT |
| 2020-10-22 | ViT | ICLR 2021 (Oral) | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ViT |
| 2021-04-08 | SiT | Arxiv 2021 | SiT: Self-supervised vIsion Transformer | None |
| 2021-06-10 | MST | NeurIPS 2021 | MST: Masked Self-Supervised Transformer for Visual Representation | None |
| 2021-06-14 | BEiT | ICLR 2022 (Oral) | BEiT: BERT Pre-Training of Image Transformers | BEiT |
| 2021-11-11 | MAE | Arxiv 2021 | Masked Autoencoders Are Scalable Vision Learners | MAE |
| 2021-11-15 | iBoT | ICLR 2022 | iBOT: Image BERT Pre-Training with Online Tokenizer | iBoT |
| 2021-11-18 | SimMIM | Arxiv 2021 | SimMIM: A Simple Framework for Masked Image Modeling | SimMIM |
| 2021-11-24 | PeCo | Arxiv 2021 | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | None |
| 2021-11-30 | MC-SSL0.0 | Arxiv 2021 | MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning | None |
| 2021-12-16 | MaskFeat | Arxiv 2021 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | None |
| 2021-12-20 | SplitMask | Arxiv 2021 | Are Large-scale Datasets Necessary for Self-Supervised Pre-training? | None |
| 2022-01-31 | ADIOS | Arxiv 2022 | Adversarial Masking for Self-Supervised Learning | None |
| 2022-02-07 | CAE | Arxiv 2022 | Context Autoencoder for Self-Supervised Representation Learning | CAE |
| 2022-02-07 | CIM | Arxiv 2022 | Corrupted Image Modeling for Self-Supervised Visual Pre-Training | None |
| 2022-03-10 | MVP | Arxiv 2022 | MVP: Multimodality-guided Visual Pre-training | None |
| 2022-03-23 | AttMask | ECCV 2022 | What to Hide from Your Students: Attention-Guided Masked Image Modeling | AttMask |
| 2022-03-29 | mc-BEiT | Arxiv 2022 | mc-BEiT: Multi-choice Discretization for Image BERT Pre-training | None |
| 2022-04-18 | Ge2-AE | Arxiv 2022 | The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training | None |
| 2022-05-08 | MCMAE | NeurIPS 2022 | MCMAE: Masked Convolution Meets Masked Autoencoders | MCMAE |
| 2022-05-20 | UM-MAE | Arxiv 2022 | Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality | UM-MAE |
| 2022-05-26 | GreenMIM | Arxiv 2022 | Green Hierarchical Vision Transformer for Masked Image Modeling | GreenMIM |
| 2022-05-26 | MixMIM | Arxiv 2022 | MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning | Code is Opening |
| 2022-05-28 | SupMAE | Arxiv 2022 | SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners | SupMAE |
| 2022-05-30 | HiViT | Arxiv 2022 | HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling | None |
| 2022-06-01 | LoMaR | Arxiv 2022 | Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction | LoMaR |
| 2022-06-22 | SemMAE | NeurIPS 2022 | SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders | SemMAE |
| 2022-08-11 | MILAN | Arxiv 2022 | MILAN: Masked Image Pretraining on Language Assisted Representation | MILAN |
| 2022-11-14 | EVA | Arxiv 2022 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | EVA |
| 2022-11-28 | AMT | Arxiv 2022 | Good helper is around you: Attention-driven Masked Image Modeling | AMT |
| 2023-01-03 | TinyMIM | CVPR 2023 | TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models | TinyMIM |
| 2023-03-04 | PixMIM | Arxiv 2023 | PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling | PixMIM |
| 2023-03-09 | LocalMIM | CVPR 2023 | Masked Image Modeling with Local Multi-Scale Reconstruction | LocalMIM |
| 2023-03-12 | AutoMAE | Arxiv 2023 | Improving Masked Autoencoders by Learning Where to Mask | AutoMAE |
| 2023-03-15 | DeepMIM | Arxiv 2023 | DeepMIM: Deep Supervision for Masked Image Modeling | DeepMIM |
| 2023-04-25 | Img2Vec | Arxiv 2023 | Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders | None |
| 2023-12-30 | DTM | Arxiv 2023 | Masked Image Modeling via Dynamic Token Morphing | None |

Others:

Object detection.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-04-06 | MIMDet | Arxiv 2022 | Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection | MIMDet |

3D.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2021-11-29 | Point-BERT | CVPR 2022 | Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling | Point-BERT |
| 2022-03-28 | Point-MAE | ECCV 2022 | Masked Autoencoders for Point Cloud Self-supervised Learning | Point-MAE |
| 2022-05-28 | Point-M2AE | NeurIPS 2022 | Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training | Point-M2AE |
| 2022-12-13 | I2P-MAE | CVPR 2023 | Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders | I2P-MAE |

Image generation.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-02-08 | MaskGIT | Arxiv 2022 | MaskGIT: Masked Generative Image Transformer | None |

Unsupervised Domain Adaptation.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2023-06-18 | MIC | CVPR 2023 | MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation | None |

Video.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2021-12-02 | BEVT | Arxiv 2021 | BEVT: BERT Pretraining of Video Transformers | BEVT |
| 2022-03-23 | VideoMAE | NeurIPS 2022 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | VideoMAE |
| 2022-05-18 | MAE_ST | NeurIPS 2022 | Masked Autoencoders As Spatiotemporal Learners | MAE_ST |

Multi-modal.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-04-04 | MultiMAE | Arxiv 2022 | MultiMAE: Multi-modal Multi-task Masked Autoencoders | MultiMAE |
| 2022-05-27 | M3AE | Arxiv 2022 | Multimodal Masked Autoencoders Learn Transferable Representations | None |
| 2022-08-03 | xxx | Arxiv 2022 | Masked Vision and Language Modeling for Multi-modal Representation Learning | None |
| 2022-12-01 | FLIP | Arxiv 2022 | Scaling Language-Image Pre-training via Masking | None |

Medical.

| Date | Method | Conference | Title | Code |
| --- | --- | --- | --- | --- |
| 2022-03-10 | MedMAE | Arxiv 2022 | Self Pre-training with Masked Autoencoders for Medical Image Analysis | None |

Analysis.

| Date | Method | Conference | Title |
| --- | --- | --- | --- |
| 2022-08-08 | RelaxMIM | Arxiv 2022 | Understanding Masked Image Modeling via Learning Occlusion Invariant Feature |

Survey.

| Date | Conference | Title |
| --- | --- | --- |
| 2022-07-30 | Arxiv 2022 | A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond |
| 2023-12-31 | Arxiv 2023 | Masked Modeling for Self-supervised Representation Learning on Vision and Beyond |