PTQ4SAM: Post-Training Quantization for Segment Anything (CVPR 2024)
Chengtao Lv*, Hong Chen*, Jinyang Guo 📧, Yifu Ding, Xianglong Liu
(* denotes equal contribution, 📧 denotes corresponding author.)
Overview
Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, its immense memory and computation costs hinder practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for the Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution, offline, into a normal distribution that is easier to quantize. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax, which searches for the optimal, hardware-friendly power-of-two base.
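The two ideas above can be illustrated with a short, self-contained sketch. Everything below is a simplified illustration rather than the code in this repo: the negative-mean heuristic stands in for the paper's per-channel bimodality test, and `fold_bimodal_sign`, `adaptive_log_quant`, and their parameters are hypothetical names chosen for this example only.

```python
import torch

def fold_bimodal_sign(q_linear: torch.nn.Linear,
                      k_linear: torch.nn.Linear,
                      calib_keys: torch.Tensor) -> torch.Tensor:
    """Bimodal Integration, sketched: flip the sign of 'negative-mode' key channels.

    calib_keys holds post-Key-Linear activations collected on calibration data,
    shape (num_tokens, embed_dim). Flipping a channel in both the key and the
    query projections keeps q @ k.T unchanged, so attention scores are preserved
    while the per-tensor key distribution collapses from two modes into one.
    """
    mean = calib_keys.mean(dim=0)
    sign = torch.ones_like(mean)
    sign[mean < 0] = -1.0                      # crude stand-in for the paper's bimodality test
    with torch.no_grad():
        k_linear.weight.mul_(sign[:, None])    # flip output channels of the key projection
        q_linear.weight.mul_(sign[:, None])    # compensate in the query projection
        if k_linear.bias is not None:
            k_linear.bias.mul_(sign)
        if q_linear.bias is not None:
            q_linear.bias.mul_(sign)
    return sign


def adaptive_log_quant(attn: torch.Tensor, n_bits: int = 6, max_tau: int = 4):
    """Adaptive Granularity Quantization, sketched: search a power-of-two log base.

    Post-Softmax values in [0, 1] are quantized with log base 2 ** (1 / 2 ** tau);
    the tau with the smallest reconstruction error on the calibration tensor wins.
    """
    qmax = 2 ** n_bits - 1
    best_tau, best_deq, best_err = 0, attn, float("inf")
    for tau in range(max_tau + 1):
        scale = 2.0 ** tau
        q = torch.clamp(torch.round(-scale * torch.log2(attn.clamp(min=1e-10))), 0, qmax)
        deq = 2.0 ** (-q / scale)
        err = torch.mean((deq - attn) ** 2)
        if err < best_err:
            best_tau, best_deq, best_err = tau, deq, err
    return best_tau, best_deq
```

Because the same channel is flipped in both the query and key projections, every dot product q·k is unchanged, which is why the transformation can be folded into the weights once, offline.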
Create Environment
You can refer to environment.sh in the root directory or install step by step.
- Install PyTorch
conda create -n ptq4sam python=3.7 -y
pip install torch torchvision
- Install MMCV
pip install -U openmim
mim install "mmcv-full<2.0.0"
- Install other requirements
pip install -r requirements.txt
- Compile CUDA operators
cd projects/instance_segment_anything/ops
python setup.py build install
cd ../../..
- Install mmdet
cd mmdetection/
python3 setup.py build develop
cd ..
Prepare Dataset and Models
Download the official COCO dataset, put it into the corresponding folders of datasets/, and organize it as follows:
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   └── test2017
Download the pretrained weights (SAM and detectors) and put them into the corresponding folders of ckpt/:
- sam_b: ViT-B SAM
- sam_l: ViT-L SAM
- sam_h: ViT-H SAM
- faster rcnn: R-50-FPN Faster R-CNN
- yolox: YOLOX-l
- detr: H-Deformable-DETR
- dino: DINO
Usage
To perform quantization on models, specify the model configuration and quantization configuration. For example, to perform W6A6 quantization for SAM-L with the YOLOX-L detector, use the following command:
python ptq4sam/solver/test_quant.py \
--config ./projects/configs/yolox/yolo_l-sam-vit-l.py \
--q_config exp/config66.yaml --quant-encoder
- yolo_l-sam-vit-l.py: configuration file for ViT-L SAM with the YOLOX-L detector.
- config66.yaml: configuration file for W6A6 quantization (6-bit weights and 6-bit activations; see the sketch after this list).
- --quant-encoder: quantize the encoder of SAM.
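For readers new to the notation, W6A6 means both weights and activations are quantized to 6 bits. The snippet below is a generic uniform fake-quantizer meant only to make that notation concrete; it is not the quantizer configured by config66.yaml (the repo's quantizers follow QDrop-style calibration), and `fake_quantize` is a hypothetical helper used purely for illustration.

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int = 6, symmetric: bool = True) -> torch.Tensor:
    """Uniform fake quantization: map x to an n_bits integer grid and back."""
    if symmetric:
        qmax = 2 ** (n_bits - 1) - 1
        scale = x.abs().max() / qmax
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale
    qmax = 2 ** n_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / qmax
    zero_point = torch.round(-lo / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale

# W6A6: both the weights and the activations pass through a 6-bit quantizer.
w = torch.randn(256, 256)          # toy weight tensor
a = torch.rand(32, 256)            # toy activation tensor
w_q = fake_quantize(w, n_bits=6)
a_q = fake_quantize(a, n_bits=6, symmetric=False)
```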
We recommend using a GPU with more than 40 GB of memory for experiments.
If you want to visualize the prediction results, you can do so by specifying --show-dir.
Bimodal distributions mainly occur in the mask decoder
of SAM-B and SAM-L.
Reference
If you find this repo useful for your research, please consider citing the paper.
@inproceedings{lv2024ptq4sam,
title={PTQ4SAM: Post-Training Quantization for Segment Anything},
author={Lv, Chengtao and Chen, Hong and Guo, Jinyang and Ding, Yifu and Liu, Xianglong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={15941--15951},
year={2024}
}
Acknowledgments
The code of PTQ4SAM is based on Prompt-Segment-Anything and QDrop. We thank the authors for their open-source code.