Awesome AI Papers ⭐️

Description

This repository is an up-to-date list of significant AI papers organized by publication date. It covers five fields: computer vision, natural language processing, audio processing, multimodal learning, and reinforcement learning. Feel free to give this repository a star if you enjoy the work.

Maintainer: Aimerou Ndiaye

Table of Contents

2023 Papers
2022 Papers
Historical Papers

Taxonomy

To select the most relevant papers, we set subjective thresholds on the number of citations. Each icon below designates a paper type that meets one of these criteria.

🏆 Historical Paper: more than 10k citations and a decisive impact on the evolution of AI.

⭐ Important Paper: more than 50 citations and state-of-the-art results.

⏫ Trend: 1 to 50 citations, recent and innovative paper with growing adoption.

📰 Important Article: decisive work that was not accompanied by a research paper.
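
The criteria above can be read as a small decision rule. The sketch below is only an illustration of that reading, not a tool used to build this list: the function name `classify_paper` and its parameters are hypothetical, and the qualitative judgments (decisive impact, state-of-the-art results) are assumed to be supplied by a human reviewer as boolean flags. The "recent and innovative" part of the Trend criterion is not modeled.

```python
from typing import Optional


def classify_paper(
    citations: Optional[int] = None,
    *,
    has_paper: bool = True,
    decisive_impact: bool = False,
    state_of_the_art: bool = False,
) -> Optional[str]:
    """Return the taxonomy icon for an entry, or None if no criterion is met.

    All parameter names are hypothetical; the thresholds come from the
    taxonomy text (more than 10k, more than 50, 1 to 50 citations).
    """
    if not has_paper:
        # Important Article: decisive work without a research paper.
        return "📰" if decisive_impact else None
    if citations is None:
        return None
    if citations > 10_000 and decisive_impact:
        return "🏆"  # Historical Paper
    if citations > 50 and state_of_the_art:
        return "⭐"  # Important Paper
    if 1 <= citations <= 50:
        return "⏫"  # Trend (recency and adoption are not checked here)
    return None


if __name__ == "__main__":
    print(classify_paper(12_000, decisive_impact=True))
    print(classify_paper(300, state_of_the_art=True))
    print(classify_paper(20))
    print(classify_paper(has_paper=False, decisive_impact=True))
```

Under this reading, an entry with more than 50 citations but no state-of-the-art results falls outside every category, which matches the subjective nature of the selection.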

2023 Papers <a name="2023"></a>

Computer Vision <a name="2023cv"></a>

⭐ 01/2023: Muse: Text-To-Image Generation via Masked Generative Transformers (Muse)
⭐ 02/2023: Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1)
⭐ 02/2023: Scaling Vision Transformers to 22 Billion Parameters (ViT 22B)
⭐ 02/2023: Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet)
⭐ 03/2023: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (Visual ChatGPT)
⭐ 03/2023: Scaling up GANs for Text-to-Image Synthesis (GigaGAN)
⭐ 04/2023: Segment Anything (SAM)
⭐ 04/2023: DINOv2: Learning Robust Visual Features without Supervision (DINOv2)
⭐ 04/2023: Visual Instruction Tuning
⭐ 04/2023: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (VideoLDM)
⭐ 04/2023: Synthetic Data from Diffusion Models Improves ImageNet Classification
⭐ 04/2023: Segment Anything in Medical Images (MedSAM)
⭐ 05/2023: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (DragGAN)
⭐ 06/2023: Neuralangelo: High-Fidelity Neural Surface Reconstruction (Neuralangelo)
⭐ 07/2023: SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (SDXL)
⭐ 08/2023: 3D Gaussian Splatting for Real-Time Radiance Field Rendering
⭐ 08/2023: Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization... (Qwen-VL)
⏫ 08/2023: MVDream: Multi-view Diffusion for 3D Generation (MVDream)
⏫ 11/2023: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks (Florence-2)
⏫ 12/2023: VideoPoet: A Large Language Model for Zero-Shot Video Generation (VideoPoet)

NLP <a name="2023nlp"></a>

⭐ 01/2023: DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (DetectGPT)
⭐ 02/2023: Toolformer: Language Models Can Teach Themselves to Use Tools (Toolformer)
⭐ 02/2023: LLaMA: Open and Efficient Foundation Language Models (LLaMA)
📰 03/2023: GPT-4
⭐ 03/2023: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (GPT-4 Eval)
⭐ 03/2023: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (HuggingGPT)
⭐ 03/2023: BloombergGPT: A Large Language Model for Finance (BloombergGPT)
⭐ 04/2023: Instruction Tuning with GPT-4
⭐ 04/2023: Generative Agents: Interactive Simulacra of Human Behavior (Gen Agents)
⭐ 05/2023: PaLM 2 Technical Report (PaLM-2)
⭐ 05/2023: Tree of Thoughts: Deliberate Problem Solving with Large Language Models (ToT)
⭐ 05/2023: LIMA: Less Is More for Alignment (LIMA)
⭐ 05/2023: QLoRA: Efficient Finetuning of Quantized LLMs (QLoRA)
⭐ 05/2023: Voyager: An Open-Ended Embodied Agent with Large Language Models (Voyager)
⭐ 07/2023: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (ToolLLM)
⭐ 08/2023: MetaGPT: Meta Programming for Multi-Agent Collaborative Framework (MetaGPT)
⭐ 08/2023: Code Llama: Open Foundation Models for Code (Code Llama)
⏫ 09/2023: RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (RLAIF)
⭐ 09/2023: Large Language Models as Optimizers (OPRO)
⏫ 10/2023: Eureka: Human-Level Reward Design via Coding Large Language Models (Eureka)
⏫ 12/2023: Mathematical discoveries from program search with large language models (FunSearch)

Audio Processing <a name="2023ap"></a>

⭐ 01/2023: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E)
⭐ 01/2023: MusicLM: Generating Music From Text (MusicLM)
⭐ 01/2023: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models (AudioLDM)
⭐ 03/2023: Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages (USM)
⭐ 05/2023: Scaling Speech Technology to 1,000+ Languages (MMS)
⏫ 06/2023: Simple and Controllable Music Generation (MusicGen)
⏫ 06/2023: AudioPaLM: A Large Language Model That Can Speak and Listen (AudioPaLM)
⏫ 06/2023: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale (Voicebox)

Multimodal Learning <a name="2023ml"></a>

⭐ 02/2023: Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)
⭐ 03/2023: PaLM-E: An Embodied Multimodal Language Model (PaLM-E)
⭐ 04/2023: AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (AudioGPT)
⭐ 05/2023: ImageBind: One Embedding Space To Bind Them All (ImageBind)
⏫ 07/2023: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3Leon)
⏫ 07/2023: Meta-Transformer: A Unified Framework for Multimodal Learning (Meta-Transformer)
⏫ 08/2023: SeamlessM4T: Massively Multilingual & Multimodal Machine Translation (SeamlessM4T)

Reinforcement Learning <a name="2023rl"></a>

⭐ 01/2023: Mastering Diverse Domains through World Models (DreamerV3)
⏫ 02/2023: Grounding Large Language Models in Interactive Environments with Online RL (GLAM)
⏫ 02/2023: Efficient Online Reinforcement Learning with Offline Data (RLPD)
⏫ 03/2023: Reward Design with Language Models
⭐ 05/2023: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO)
⏫ 06/2023: Faster sorting algorithms discovered using deep reinforcement learning (AlphaDev)
⏫ 08/2023: Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (Retroformer)

Other Papers <a name="2023op"></a>

⭐ 02/2023: Symbolic Discovery of Optimization Algorithms (Lion)
⭐ 07/2023: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (RT-2)
⏫ 11/2023: Scaling deep learning for materials discovery (GNoME)
⏫ 12/2023: Discovery of a structural class of antibiotics with explainable deep learning

2022 Papers <a name="2022"></a>

Computer Vision <a name="2022cv"></a>

⭐ 01/2022: A ConvNet for the 2020s (ConvNeXt)
⭐ 01/2022: Patches Are All You Need (ConvMixer)
⭐ 02/2022: Block-NeRF: Scalable Large Scene Neural View Synthesis (Block-NeRF)
⭐ 03/2022: DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (DINO)
⭐ 03/2022: Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs (Large Kernel CNN)
⭐ 03/2022: TensoRF: Tensorial Radiance Fields (TensoRF)
⭐ 04/2022: MaxViT: Multi-Axis Vision Transformer (MaxViT)
⭐ 04/2022: Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)
⭐ 05/2022: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)
⭐ 05/2022: GIT: A Generative Image-to-text Transformer for Vision and Language (GIT)
⭐ 06/2022: CMT: Convolutional Neural Networks Meet Vision Transformers (CMT)
⭐ 07/2022: Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors... (Swin UNETR)
⭐ 07/2022: Classifier-Free Diffusion Guidance
⭐ 08/2022: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (DreamBooth)
⭐ 09/2022: DreamFusion: Text-to-3D using 2D Diffusion (DreamFusion)
⭐ 09/2022: Make-A-Video: Text-to-Video Generation without Text-Video Data (Make-A-Video)
⭐ 10/2022: On Distillation of Guided Diffusion Models
⭐ 10/2022: LAION-5B: An open large-scale dataset for training next generation image-text models (LAION-5B)
⭐ 10/2022: Imagic: Text-Based Real Image Editing with Diffusion Models (Imagic)
⭐ 11/2022: Visual Prompt Tuning
⭐ 11/2022: Magic3D: High-Resolution Text-to-3D Content Creation (Magic3D)
⭐ 11/2022: DiffusionDet: Diffusion Model for Object Detection (DiffusionDet)
⭐ 11/2022: InstructPix2Pix: Learning to Follow Image Editing Instructions (InstructPix2Pix)
⭐ 12/2022: Multi-Concept Customization of Text-to-Image Diffusion (Custom Diffusion)
⭐ 12/2022: Scalable Diffusion Models with Transformers (DiT)

NLP <a name="2022nlp"></a>

⭐ 01/2022: LaMDA: Language Models for Dialog Applications (LaMDA)
⭐ 01/2022: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (CoT)
⭐ 02/2022: Competition-Level Code Generation with AlphaCode (AlphaCode)
⭐ 02/2022: Finetuned Language Models Are Zero-Shot Learners (FLAN)
⭐ 03/2022: Training language models to follow human instructions with human feedback (InstructGPT)
⭐ 03/2022: Multitask Prompted Training Enables Zero-Shot Task Generalization (T0)
⭐ 03/2022: Training Compute-Optimal Large Language Models (Chinchilla)
⭐ 04/2022: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan)
⭐ 04/2022: GPT-NeoX-20B: An Open-Source Autoregressive Language Model (GPT-NeoX)
⭐ 04/2022: PaLM: Scaling Language Modeling with Pathways (PaLM)
⭐ 06/2022: Beyond the Imitation Game: Quantifying and extrapolating the capabilities of lang... (BIG-bench)
⭐ 06/2022: Solving Quantitative Reasoning Problems with Language Models (Minerva)
⭐ 10/2022: ReAct: Synergizing Reasoning and Acting in Language Models (ReAct)
⭐ 11/2022: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (BLOOM)
📰 11/2022: Optimizing Language Models for Dialogue (ChatGPT)
⭐ 12/2022: Large Language Models Encode Clinical Knowledge (Med-PaLM)

Audio Processing <a name="2022ap"></a>

⭐ 02/2022: mSLAM: Massively multilingual joint pre-training for speech and text (mSLAM)
⭐ 02/2022: ADD 2022: the First Audio Deep Synthesis Detection Challenge (ADD)
⭐ 03/2022: Efficient Training of Audio Transformers with Patchout (PaSST)
⭐ 04/2022: MAESTRO: Matched Speech Text Representations through Modality Matching (Maestro)
⭐ 05/2022: SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language... (SpeechT5)
⭐ 06/2022: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (WavLM)
⭐ 07/2022: BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for ASR (BigSSL)
⭐ 08/2022: MuLan: A Joint Embedding of Music Audio and Natural Language (MuLan)
⭐ 09/2022: AudioLM: a Language Modeling Approach to Audio Generation (AudioLM)
⭐ 09/2022: AudioGen: Textually Guided Audio Generation (AudioGen)
⭐ 10/2022: High Fidelity Neural Audio Compression (EnCodec)
⭐ 12/2022: Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)

Multimodal Learning <a name="2022ml"></a>

⭐ 01/2022: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language... (BLIP)
⭐ 02/2022: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and... (Data2vec)
⭐ 03/2022: VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (VL-Adapter)
⭐ 04/2022: Winoground: Probing Vision and Language Models for Visio-Linguistic... (Winoground)
⭐ 04/2022: Flamingo: a Visual Language Model for Few-Shot Learning (Flamingo)
⭐ 05/2022: A Generalist Agent (Gato)
⭐ 05/2022: CoCa: Contrastive Captioners are Image-Text Foundation Models (CoCa)
⭐ 05/2022: VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (VLMo)
⭐ 08/2022: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (BEiT-3)
⭐ 09/2022: PaLI: A Jointly-Scaled Multilingual Language-Image Model (PaLI)

Reinforcement Learning <a name="2022rl"></a>

⭐ 01/2022: Learning robust perceptive locomotion for quadrupedal robots in the wild
⭐ 02/2022: BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
⭐ 02/2022: Outracing champion Gran Turismo drivers with deep reinforcement learning (Sophy)
⭐ 02/2022: Magnetic control of tokamak plasmas through deep reinforcement learning
⭐ 08/2022: Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning (ANYmal)
⭐ 10/2022: Discovering faster matrix multiplication algorithms with reinforcement learning (AlphaTensor)

Other Papers <a name="2022op"></a>

⭐ 02/2022: FourCastNet: A Global Data-driven High-resolution Weather Model... (FourCastNet)
⭐ 05/2022: ColabFold: making protein folding accessible to all (ColabFold)
⭐ 06/2022: Measuring and Improving the Use of Graph Information in GNN
⭐ 10/2022: TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis (TimesNet)
⭐ 12/2022: RT-1: Robotics Transformer for Real-World Control at Scale (RT-1)

Historical Papers <a name="history"></a>

🏆 1958: Perceptron: A probabilistic model for information storage and organization in the brain (Perceptron)
🏆 1986: Learning representations by back-propagating errors (Backpropagation)
🏆 1986: Induction of decision trees (ID3)
🏆 1989: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (HMM)
🏆 1989: Multilayer feedforward networks are universal approximators
🏆 1992: A training algorithm for optimal margin classifiers (SVM)
🏆 1996: Bagging predictors
🏆 1998: Gradient-based learning applied to document recognition (CNN/GTN)
🏆 2001: Random Forests
🏆 2001: A fast and elitist multiobjective genetic algorithm (NSGA-II)
🏆 2003: Latent Dirichlet Allocation (LDA)
🏆 2006: Reducing the Dimensionality of Data with Neural Networks (Autoencoder)
🏆 2008: Visualizing Data using t-SNE (t-SNE)
🏆 2009: ImageNet: A large-scale hierarchical image database (ImageNet)
🏆 2012: ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
🏆 2013: Efficient Estimation of Word Representations in Vector Space (Word2vec)
🏆 2013: Auto-Encoding Variational Bayes (VAE)
🏆 2014: Generative Adversarial Networks (GAN)
🏆 2014: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Dropout)
🏆 2014: Sequence to Sequence Learning with Neural Networks
🏆 2014: Neural Machine Translation by Jointly Learning to Align and Translate (RNNSearch-50)
🏆 2014: Adam: A Method for Stochastic Optimization (Adam)
🏆 2015: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Cov... (BatchNorm)
🏆 2015: Going Deeper With Convolutions (Inception)
🏆 2015: Human-level control through deep reinforcement learning (Deep Q Network)
🏆 2015: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Faster R-CNN)
🏆 2015: U-Net: Convolutional Networks for Biomedical Image Segmentation (U-Net)
🏆 2015: Deep Residual Learning for Image Recognition (ResNet)
🏆 2016: You Only Look Once: Unified, Real-Time Object Detection (YOLO)
🏆 2017: Attention Is All You Need (Transformer)
🏆 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT)
🏆 2020: Language Models are Few-Shot Learners (GPT-3)
🏆 2020: Denoising Diffusion Probabilistic Models (DDPM)
🏆 2020: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
🏆 2021: Highly accurate protein structure prediction with AlphaFold (AlphaFold)
📰 2022: ChatGPT: Optimizing Language Models for Dialogue (ChatGPT)