PTQ4SAM: Post-Training Quantization for Segment Anything (CVPR 2024)

Chengtao Lv*, Hong Chen*, Jinyang Guo📧, Yifu Ding, Xianglong Liu

(* denotes equal contribution, 📧 denotes corresponding author.)

Overview

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, its immense memory and computation costs hinder practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for the Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization, which we attribute to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which uses a mathematically equivalent sign operation to transform the bimodal distribution offline into a normal distribution that is easier to quantize. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax that searches for the optimal power-of-two base, which is hardware-friendly.
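The two ideas above can be illustrated with a small, dependency-free Python sketch. It is not the repository's implementation. It assumes that the sign operation is a per-channel vector of +1/-1 folded into both the query and key projections (so each attention score q·k is unchanged), and that the Softmax quantizer is a logarithmic quantizer with base 2^(1/tau), where tau is searched over a few powers of two. The names `fold_signs`, `log_quantize`, and `search_tau`, the toy biases, and the candidate set are all hypothetical.

```python
import math
import random

def linear(W, b, x):
    """y = W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def fold_signs(W, b, signs):
    """Multiply each output channel of a linear layer by its sign (+1 or -1)."""
    W2 = [[s * w for w in row] for s, row in zip(signs, W)]
    b2 = [s * bi for s, bi in zip(signs, b)]
    return W2, b2

def log_quantize(p, tau, n_bits):
    """Quantize a probability p in [0, 1] on a log grid with base 2**(1/tau)."""
    qmax = 2 ** n_bits - 1
    code = qmax if p <= 0.0 else min(max(round(-tau * math.log2(p)), 0), qmax)
    return 2.0 ** (-code / tau)

def search_tau(probs, n_bits, candidates=(1, 2, 4)):
    """Pick the candidate tau with the smallest squared reconstruction error."""
    def err(tau):
        return sum((p - log_quantize(p, tau, n_bits)) ** 2 for p in probs)
    return min(candidates, key=err)

random.seed(0)
d_in, d_out = 4, 6
W_q = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
W_k = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
b_q = [random.gauss(0, 1) for _ in range(d_out)]
# Assumed toy setup: key biases of +5 / -5 make the key activations bimodal.
b_k = [5.0 if i % 2 == 0 else -5.0 for i in range(d_out)]
# Assumed rule for this toy: flip the channels whose bias is negative.
signs = [1.0 if bi >= 0 else -1.0 for bi in b_k]

x_q = [random.gauss(0, 1) for _ in range(d_in)]
x_k = [random.gauss(0, 1) for _ in range(d_in)]
q, k = linear(W_q, b_q, x_q), linear(W_k, b_k, x_k)

# Fold the same signs into both projections, offline.
W_q2, b_q2 = fold_signs(W_q, b_q, signs)
W_k2, b_k2 = fold_signs(W_k, b_k, signs)
q2, k2 = linear(W_q2, b_q2, x_q), linear(W_k2, b_k2, x_k)

# The attention score is preserved because each sign is squared in q . k.
score = sum(a * c for a, c in zip(q, k))
score_folded = sum(a * c for a, c in zip(q2, k2))

# Softmax probabilities used to search the log-quantizer granularity.
logits = [random.gauss(0, 2) for _ in range(32)]
m = max(logits)
exps = [math.exp(v - m) for v in logits]
probs = [e / sum(exps) for e in exps]
best_tau = search_tau(probs, n_bits=4)
```

After folding, every key bias in this toy example is positive, so the key channels no longer split into two groups, while `score` and `score_folded` agree. A larger tau gives a finer log grid near 1 at the cost of a smaller representable range, which is why a per-distribution search can help.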

Create Environment

🍺🍺🍺 You can refer to environment.sh in the root directory or install step by step.

  1. Install PyTorch
conda create -n ptq4sam python=3.7 -y
conda activate ptq4sam
pip install torch torchvision
  2. Install MMCV
pip install -U openmim
mim install "mmcv-full<2.0.0"
  3. Install other requirements
pip install -r requirements.txt
  4. Compile CUDA operators
cd projects/instance_segment_anything/ops
python setup.py build install
cd ../../..
  5. Install mmdet
cd mmdetection/
python3 setup.py build develop
cd ..
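
Once the steps above have finished, a quick sanity check can confirm that the main packages are visible to the interpreter. This is a minimal sketch, not part of the repository: `check_packages` is a hypothetical helper, and the import names (`torch`, `torchvision`, `mmcv`, `mmdet`) are assumed from the install steps.

```python
import importlib.util

# Import names assumed from the install steps above.
REQUIRED = ("torch", "torchvision", "mmcv", "mmdet")

def check_packages(names=REQUIRED):
    """Return {name: bool} telling whether each top-level package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```

The check only looks the packages up without importing them, so it does not verify that the compiled CUDA operators load correctly.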

Prepare Dataset and Models

Download the official COCO dataset, put the files into the corresponding folders of datasets/, and organize them as follows:

├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
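
The layout above can be verified before running experiments. The following is a minimal sketch with a hypothetical helper, `missing_coco_dirs`. The default root `data/coco` is taken from the tree above and is an assumption, so pass a different path if your dataset lives elsewhere.

```python
from pathlib import Path

# Sub-folders expected under the COCO root, as listed in the tree above.
EXPECTED = ("annotations", "train2017", "val2017", "test2017")

def missing_coco_dirs(root="data/coco"):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_coco_dirs()
    print("COCO layout OK" if not missing else f"Missing folders: {missing}")
```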

Download the pretrained weights (SAM and detectors) and put them into the corresponding folders of ckpt/.

Usage

To quantize a model, specify the model configuration and the quantization configuration. For example, to perform W6A6 quantization for SAM-L with a YOLOX detector (as selected by the config file name), use the following command:

python ptq4sam/solver/test_quant.py \
--config ./projects/configs/yolox/yolo_l-sam-vit-l.py \
--q_config exp/config66.yaml --quant-encoder
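
W6A6 denotes 6-bit weights and 6-bit activations. As a rough illustration of what a 6-bit grid does to a tensor, the sketch below applies uniform asymmetric quantize-dequantize to a handful of values. It is an assumption-laden toy: `fake_quantize` is a hypothetical helper, the sample weights are made up, and the calibration performed by the repository is not reproduced here.

```python
def fake_quantize(values, n_bits=6):
    """Uniform asymmetric quantize-dequantize of a list of floats."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1                      # 63 steps for 6 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to an integer code in [0, levels].
    codes = [min(max(round((v - lo) / scale), 0), levels) for v in values]
    # Map the codes back to floats on the quantization grid.
    return [lo + c * scale for c in codes], codes

weights = [-0.82, -0.11, 0.0, 0.27, 0.9]          # made-up sample values
dequant, codes = fake_quantize(weights, n_bits=6)
max_error = max(abs(w - d) for w, d in zip(weights, dequant))
```

With min/max ranges, the rounding error of any value is at most half a quantization step, which is the bound `max_error` can be compared against.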

We recommend using a GPU with more than 40 GB of memory for experiments. To visualize the prediction results, specify --show-dir. Note that bimodal distributions mainly occur in the mask decoder of SAM-B and SAM-L.

Reference

If you find this repo useful for your research, please consider citing the paper.

@inproceedings{lv2024ptq4sam,
  title={PTQ4SAM: Post-Training Quantization for Segment Anything},
  author={Lv, Chengtao and Chen, Hong and Guo, Jinyang and Ding, Yifu and Liu, Xianglong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15941--15951},
  year={2024}
}

Acknowledgments

The code of PTQ4SAM is based on Prompt-Segment-Anything and QDrop. We thank the authors for their open-source code.