
Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes

<p align="center"> <img src="https://i.imgur.com/waxVImv.png" alt=""> </p>

Zhi Cai<sup>1,2</sup>, Yingjie Gao<sup>1,2</sup>, Yaoyan Zheng<sup>1,2</sup>, Nan Zhou<sup>1,2</sup> and Di Huang<sup>1,2</sup>

<sup>1</sup>SCSE Beihang University, <sup>2</sup>IRIP Lab Beihang University

paper


šŸ“¢ Latest Updates


Overview

Crowd-SAM is a novel few-shot object detection and segmentation method designed for crowded scenes. Object detection generally requires extensive labels for training, which are time-consuming to annotate, especially in crowded scenes. In this work, we combine SAM with a specifically designed efficient prompt sampler and a mask selection module, PWD-Net, to achieve fast and accurate pedestrian detection. Crowd-SAM achieves 78.4% AP on the CrowdHuman benchmark with only 10 support images, which is comparable to supervised detectors.
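At inference time the flow can be pictured roughly as below. This is only a minimal sketch, not the repository's actual code: `sampler` and `pwd_net` stand in for the efficient prompt sampler and PWD-Net as hypothetical callables, and only `sam_model_registry` / `SamPredictor` are real segment-anything APIs.

```python
# Minimal sketch of the Crowd-SAM inference flow described above (illustrative only).
# `sampler` and `pwd_net` are hypothetical placeholders; only sam_model_registry and
# SamPredictor come from the segment-anything package.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def detect_pedestrians(image, sampler, pwd_net, sam_ckpt="weights/sam_vit_l_0b3195.pth"):
    """Sample dense point prompts, decode masks with frozen SAM, keep the best masks."""
    sam = sam_model_registry["vit_l"](checkpoint=sam_ckpt)
    predictor = SamPredictor(sam)
    predictor.set_image(image)                      # encode the image once

    points = sampler(image)                         # (N, 2) candidate point prompts
    masks, scores = [], []
    for xy in points:
        m, s, _ = predictor.predict(point_coords=xy[None, :],
                                    point_labels=np.ones(1),
                                    multimask_output=True)
        masks.append(m)
        scores.append(s)

    keep = pwd_net(masks, scores)                   # PWD-Net re-scores / selects masks
    return [masks[i] for i in keep]
```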


Installation

We recommend using a virtual environment, e.g. Conda, for installation:

  1. Create virtual environment:

    conda create -n crowdsam python=3.8
    conda activate crowdsam
    
  2. Clone this repository:

    git clone https://github.com/yourusername/crowd-sam.git crowdsam
    cd crowdsam
    pip install -r requirements.txt
    git submodule update --init --recursive
    pip install .
    
  3. Download the DINOv2 (ViT-L) checkpoint and the SAM (ViT-L) checkpoint.

    Place the downloaded weights in the weights directory. If it does not exist, create it with mkdir weights. A quick check that both checkpoints load is sketched below.
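A minimal sketch of that check, assuming the official release file names (adjust the paths to match your downloads):

```python
# Hedged sanity check: verify the downloaded checkpoints load. The file names below
# are the official release names (sam_vit_l_0b3195.pth, dinov2_vitl14_pretrain.pth)
# and may differ from your local copies -- adjust the paths if needed.
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_l"](checkpoint="weights/sam_vit_l_0b3195.pth")
print("SAM ViT-L parameters:", sum(p.numel() for p in sam.parameters()))

state = torch.load("weights/dinov2_vitl14_pretrain.pth", map_location="cpu")
print("DINOv2 ViT-L tensors in checkpoint:", len(state))
```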

Data Preparation

1. CrowdHuman

Download the CrowdHuman dataset from the official website. Note that we only need CrowdHuman_val.zip and annotation_val.odgt. The training data is already prepared in the crowdhuman_train directory; please copy those files into ./dataset/crowdhuman before training. Extract the downloaded zip files into the dataset directory so that it looks like this:

crowdsam/
ā”œā”€ā”€ dataset/
ā”‚   ā””ā”€ā”€ crowdhuman/
ā”‚       ā”œā”€ā”€ annotation_val.odgt
ā”‚       ā”œā”€ā”€ Images
ā””ā”€ā”€ ...

Run the following script to convert the .odgt annotation file into a COCO-style .json file:

python tools/crowdhuman2coco.py -o annotation_val.odgt -v -s val_visible.json -d dataset/crowdhuman
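For reference, the conversion essentially rewrites the line-delimited .odgt records (one JSON object per image, with visible-body boxes stored under "vbox") into a COCO-style dictionary. The sketch below is a simplified illustration of that idea, not the actual tools/crowdhuman2coco.py:

```python
# Simplified illustration of the odgt -> COCO conversion (not the real converter script).
import json

def odgt_to_coco(odgt_path, json_path):
    images, annotations, ann_id = [], [], 0
    with open(odgt_path) as f:
        for img_id, line in enumerate(f):
            record = json.loads(line)                        # one image per line
            images.append({"id": img_id, "file_name": record["ID"] + ".jpg"})
            for gt in record.get("gtboxes", []):
                if gt.get("tag") != "person":                # skip ignored regions
                    continue
                x, y, w, h = gt["vbox"]                      # visible-body box
                annotations.append({"id": ann_id, "image_id": img_id, "category_id": 1,
                                    "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0})
                ann_id += 1
    coco = {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "person"}]}
    with open(json_path, "w") as f:
        json.dump(coco, f)
```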

How to use

  1. To start training the model, run the following command:
python train.py --config_file ./configs/config.yaml

Our model configs are written in YAML and live in the configs directory. Make sure to update config.yaml with the appropriate paths and parameters as needed.
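If you prefer to edit the config programmatically, something like the following works; the key names shown here ("dataset_root", "sam_checkpoint") are illustrative guesses rather than the repository's actual schema, so check configs/config.yaml for the real field names.

```python
# Sketch of programmatically updating the YAML config before training.
# The keys used below are hypothetical -- consult configs/config.yaml for the real ones.
import yaml

with open("configs/config.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["dataset_root"] = "./dataset/crowdhuman"               # hypothetical key
cfg["sam_checkpoint"] = "./weights/sam_vit_l_0b3195.pth"   # hypothetical key

with open("configs/config.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```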

We provide pretrained adapter weights for CrowdHuman here

  2. To evaluate the model, we recommend using the following command for batch evaluation:
python tools/batch_eval.py --config_file ./configs/config.yaml -n num_gpus
  3. To visualize the outputs, use the following command:
python tools/test.py --config_file ./configs/config.yaml --visualize
  4. To run the demo on your own images, use the following command:
python tools/demo.py --config_file ./configs/config.yaml --input target_directory

The evaluation command runs on the prepared CrowdHuman validation set and prints the resulting metrics.
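If you want to score a saved detection file yourself, the standard COCO protocol works against the converted annotations; results.json below is a hypothetical output path, so substitute whatever file your evaluation run actually writes:

```python
# Standard COCO-style evaluation against the converted CrowdHuman annotations.
# "results.json" is a hypothetical path to COCO-format detections.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("dataset/crowdhuman/val_visible.json")   # ground truth from the conversion step
coco_dt = coco_gt.loadRes("results.json")
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                                   # prints the AP / AR table
```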

šŸ“Š Qualitative Results

(qualitative results figure)

<!-- ![demo2](figures/demo_2.jpg) -->

Acknowledgement

We build our project on top of segment-anything and dinov2.

šŸ“œ Citation

You can cite our paper with the following BibTeX:

@inproceedings{cai2024crowd,
  title={Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes},
  author={Cai, Zhi and Gao, Yingjie and Zheng, Yaoyan and Zhou, Nan and Huang, Di},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024}
}