2020-xx-xx | iGPT | ICML 2020 | Generative Pretraining from Pixels | iGPT |
2020-10-22 | ViT | ICLR 2021 (Oral) | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ViT |
2021-04-08 | SiT | Arxiv 2021 | SiT: Self-supervised vIsion Transformer | None |
2021-06-10 | MST | NeurIPS 2021 | MST: Masked Self-Supervised Transformer for Visual Representation | None |
2021-06-14 | BEiT | ICLR 2022 (Oral) | BEiT: BERT Pre-Training of Image Transformers | BEiT |
2021-11-11 | MAE | CVPR 2022 | Masked Autoencoders Are Scalable Vision Learners | MAE |
2021-11-15 | iBOT | ICLR 2022 | iBOT: Image BERT Pre-Training with Online Tokenizer | iBOT |
2021-11-18 | SimMIM | CVPR 2022 | SimMIM: A Simple Framework for Masked Image Modeling | SimMIM |
2021-11-24 | PeCo | AAAI 2023 | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | None |
2021-11-30 | MC-SSL0.0 | Arxiv 2021 | MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning | None |
2021-12-16 | MaskFeat | CVPR 2022 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | None |
2021-12-20 | SplitMask | Arxiv 2021 | Are Large-scale Datasets Necessary for Self-Supervised Pre-training? | None |
2022-01-31 | ADIOS | ICML 2022 | Adversarial Masking for Self-Supervised Learning | None |
2022-02-07 | CAE | Arxiv 2022 | Context Autoencoder for Self-Supervised Representation Learning | CAE |
2022-02-07 | CIM | Arxiv 2022 | Corrupted Image Modeling for Self-Supervised Visual Pre-Training | None |
2022-03-10 | MVP | Arxiv 2022 | MVP: Multimodality-guided Visual Pre-training | None |
2022-03-23 | AttMask | ECCV 2022 | What to Hide from Your Students: Attention-Guided Masked Image Modeling | AttMask |
2022-03-29 | mc-BEiT | ECCV 2022 | mc-BEiT: Multi-choice Discretization for Image BERT Pre-training | None |
2022-04-18 | Ge2-AE | Arxiv 2022 | The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training | None |
2022-05-08 | MCMAE | NeurIPS 2022 | MCMAE: Masked Convolution Meets Masked Autoencoders | MCMAE |
2022-05-20 | UM-MAE | Arxiv 2022 | Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality | UM-MAE |
2022-05-26 | GreenMIM | NeurIPS 2022 | Green Hierarchical Vision Transformer for Masked Image Modeling | GreenMIM |
2022-05-26 | MixMIM | Arxiv 2022 | MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning | Code to be released |
2022-05-28 | SupMAE | Arxiv 2022 | SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners | SupMAE |
2022-05-30 | HiViT | ICLR 2023 | HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling | None |
2022-06-01 | LoMaR | Arxiv 2022 | Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction | LoMaR |
2022-06-22 | SemMAE | NeurIPS 2022 | SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders | SemMAE |
2022-08-11 | MILAN | Arxiv 2022 | MILAN: Masked Image Pretraining on Language Assisted Representation | MILAN |
2022-11-14 | EVA | CVPR 2023 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | EVA |
2022-11-28 | AMT | Arxiv 2022 | Good helper is around you: Attention-driven Masked Image Modeling | AMT |
2023-01-03 | TinyMIM | CVPR 2023 | TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models | TinyMIM |
2023-03-04 | PixMIM | Arxiv 2023 | PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling | PixMIM |
2023-03-09 | LocalMIM | CVPR 2023 | Masked Image Modeling with Local Multi-Scale Reconstruction | LocalMIM |
2023-03-12 | AutoMAE | Arxiv 2023 | Improving Masked Autoencoders by Learning Where to Mask | AutoMAE |
2023-03-15 | DeepMIM | Arxiv 2023 | DeepMIM: Deep Supervision for Masked Image Modeling | DeepMIM |
2023-04-25 | Img2Vec | Arxiv 2023 | Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders | None |
2023-12-30 | DTM | Arxiv 2023 | Masked Image Modeling via Dynamic Token Morphing | None |
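
The methods listed above share one core recipe: split the image into patches, hide a large fraction of them, and train the model to recover the hidden content. Below is a minimal sketch of the random patch masking used in MAE/SimMIM-style pre-training, in PyTorch. The helper names (`patchify`, `random_masking`) and the 75% mask ratio are illustrative assumptions for this sketch, not any single paper's exact implementation.

```python
# Minimal sketch of MAE/SimMIM-style random patch masking (illustrative only).
import torch

def patchify(imgs: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """(B, C, H, W) -> (B, N, patch_size*patch_size*C) non-overlapping patches."""
    B, C, H, W = imgs.shape
    assert H % patch_size == 0 and W % patch_size == 0
    h, w = H // patch_size, W // patch_size
    x = imgs.reshape(B, C, h, patch_size, w, patch_size)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(B, h * w, patch_size**2 * C)
    return x

def random_masking(x: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens; return kept tokens and a binary mask."""
    B, N, D = x.shape
    len_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=x.device)   # one uniform score per patch
    ids_shuffle = noise.argsort(dim=1)          # lowest-noise patches are kept
    ids_keep = ids_shuffle[:, :len_keep]
    x_kept = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=x.device)    # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0)
    return x_kept, mask

imgs = torch.randn(2, 3, 224, 224)
tokens = patchify(imgs)              # (2, 196, 768)
kept, mask = random_masking(tokens)  # (2, 49, 768), (2, 196)
```

The papers differ mainly in what happens around this step: what the target is (raw pixels for MAE/SimMIM, dVAE tokens for BEiT, HOG features for MaskFeat, teacher features for iBOT/MILAN/EVA) and how the mask is chosen (random, attention-guided as in AttMask/AMT, adversarial as in ADIOS, or semantic as in SemMAE).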