Efficient LLM and Multimodal Foundation Model Survey

This repo contains the paper list and figures for A Survey of Resource-efficient LLM and Multimodal Foundation Models.

Abstract

Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of these large models in a scalable and environmentally sustainable way, there has been a considerable focus on developing resource-efficient strategies. This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects. It offers a comprehensive analysis and valuable insights gleaned from existing literature, encompassing a broad array of topics from cutting-edge model architectures and training/serving algorithms to practical system designs and implementations. The goal of this survey is to provide an overarching understanding of how current approaches are tackling the resource challenges posed by large foundation models and to potentially inspire future breakthroughs in this field.

Scope and rationales

The scope of this survey is mainly defined by the following aspects.

[Figure: scope of the survey]

Citation

@article{xu2024a,
    title   = {A Survey of Resource-efficient LLM and Multimodal Foundation Models},
    author  = {Xu, Mengwei and Yin, Wangsong and Cai, Dongqi and Yi, Rongjie
               and Xu, Daliang and Wang, Qipeng and Wu, Bingyang and Zhao, Yihao and Yang, Chen
               and Wang, Shihe and Zhang, Qiyang and Lu, Zhenyan and Zhang, Li and Wang, Shangguang
               and Li, Yuanchun and Liu, Yunxin and Jin, Xin and Liu, Xuanzhe},
    journal = {arXiv preprint arXiv:2401.08092},
    year    = {2024}
}

Contribute

If we have left out any important papers, please let us know via an Issue, and we will include them in the next version.

We will actively maintain both the survey and this GitHub repo.

Table of Contents

Foundation Model Overview
    Language Foundation Models
    Vision Foundation Models
    Multimodal Large FMs

Resource-efficient Architectures
    Efficient Attention
    Dynamic Neural Network
    Diffusion-specific Optimization
    ViT-specific Optimizations

Resource-efficient Algorithms
    Pre-training Algorithms
    Finetuning Algorithms
    Inference Algorithms
    Model Compression

Resource-efficient Systems
    Distributed Training
    Federated Learning
    Serving on Cloud
    Serving on Edge