This repo contains a comprehensive list of papers on model quantization for efficient deep learning, collected from AI conferences, journals, and arXiv. As a highlight, we categorize the papers by model structure and application scenario, and label the quantization methods with keywords. <br>

This repo is actively updated, and contributions of any kind that make the list more comprehensive are welcome. Special thanks to collaborator Zhikai Li and to all researchers who have contributed to this repo! <br>

Keywords: PTQ: post-training quantization | Non-uniform: non-uniform quantization | MP: mixed-precision quantization | Extreme: binary or ternary quantization


Transformer-based Models

Vision Transformers

Language Transformers

Convolutional Neural Networks

Visual Generation

Image Classification

Other Tasks

Object Detection

Super Resolution

Point Cloud

