# Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes
<p align="center">Zhi Cai<sup>1,2</sup>, Yingjie Gao<sup>1,2</sup>, Yaoyan Zheng<sup>1,2</sup>, Nan Zhou<sup>1,2</sup> and Di Huang<sup>1,2</sup></p>
<sup>1</sup>SCSE Beihang University, <sup>2</sup>IRIP Lab Beihang University
## Latest Updates
- Aug-1-24: We have open-sourced the code and models.
- Jul-20-24: The Crowd-SAM paper is released (arXiv link).
- Jul-1-24: Crowd-SAM has been accepted to ECCV 2024.
## Overview
Crowd-SAM is a novel few-shot object detection and segmentation method designed for crowded scenes. Object detection generally requires extensive labels for training, which are time-consuming to annotate, especially in crowded scenes. In this work, we combine SAM with a specifically designed efficient prompt sampler and a mask selection network, PWD-Net, to achieve fast and accurate pedestrian detection. Crowd-SAM achieves 78.4% AP on the CrowdHuman benchmark with only 10 support images, which is comparable to supervised detectors.
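To make the pipeline concrete, below is a minimal, purely illustrative sketch of the inference flow. The function names, the dense-grid sampler, and the 0.5 threshold are hypothetical stand-ins, not the repository's actual API.

```python
# Illustrative sketch of the Crowd-SAM inference flow (hypothetical stubs only;
# see the repository code for the real prompt sampler and PWD-Net).
import numpy as np

def dense_point_prompts(height, width, stride=32):
    """Sample a dense grid of point prompts; the paper's efficient prompt
    sampler additionally filters such candidates using DINOv2 features."""
    ys, xs = np.mgrid[stride // 2:height:stride, stride // 2:width:stride]
    return np.stack([xs.ravel(), ys.ravel()], axis=1)

def crowd_sam_inference(image, sam_predictor, pwd_net, threshold=0.5):
    """image -> point prompts -> SAM candidate masks -> PWD-Net mask selection."""
    prompts = dense_point_prompts(*image.shape[:2])
    masks = sam_predictor(image, prompts)   # one candidate mask per prompt
    scores = pwd_net(masks)                 # PWD-Net re-scores masks in crowded regions
    return [m for m, s in zip(masks, scores) if s > threshold]
```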
## Installation
We recommend using a virtual environment, e.g. Conda, for installation:
- Create a virtual environment:

  ```bash
  conda create -n crowdsam python=3.8
  ```
- Clone this repository and install the dependencies:

  ```bash
  git clone https://github.com/yourusername/crowd-sam.git
  cd crowdsam
  pip install -r requirements.txt
  git submodule update --init --recursive
  pip install .
  ```
- Download the DINOv2 (ViT-L) checkpoint and the SAM (ViT-L) checkpoint, and place the downloaded weights in the weights directory. If it does not exist, create it with:

  ```bash
  mkdir weights
  ```
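Optionally, you can sanity-check that the downloaded weights load correctly with a short snippet like the one below. This is only a sketch: the checkpoint file names are the standard release names and may differ from yours, and the torch.hub call needs network access to fetch the DINOv2 model definition.

```python
# Optional sanity check that the downloaded backbone weights load correctly.
# File names below are the standard release names; adjust them if yours differ.
import torch
from segment_anything import sam_model_registry  # available after installing the SAM submodule

# SAM ViT-L
sam = sam_model_registry["vit_l"](checkpoint="weights/sam_vit_l_0b3195.pth")
print(f"SAM ViT-L parameters: {sum(p.numel() for p in sam.parameters()):,}")

# DINOv2 ViT-L/14: fetch the architecture via torch.hub, then load the local weights
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14", pretrained=False)
state_dict = torch.load("weights/dinov2_vitl14_pretrain.pth", map_location="cpu")
dinov2.load_state_dict(state_dict)
print(f"DINOv2 ViT-L/14 parameters: {sum(p.numel() for p in dinov2.parameters()):,}")
```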
## Data Preparation
### 1. CrowdHuman
Download the CrowdHuman dataset from the official website. Note that only CrowdHuman_val.zip and annotation_val.odgt are needed. The training data is already prepared in the crowdhuman_train directory; please copy those files into ./dataset/crowdhuman before training.
Extract the downloaded zip files into the dataset directory so that it looks like this:
```
crowdsam/
├── dataset/
│   └── crowdhuman/
│       ├── annotation_val.odgt
│       └── Images
└── ...
```
Run the following script to convert the odgt annotation file to a COCO-format json file:

```bash
python tools/crowdhuman2coco.py -o annotation_val.odgt -v -s val_visible.json -d dataset/crowdhuman
```
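For reference, each line of an odgt file is a JSON record with an image ID and a gtboxes list holding full/visible/head boxes (fbox/vbox/hbox). The snippet below is a simplified sketch of what such a conversion does; the repository's tools/crowdhuman2coco.py is the authoritative version and additionally records image sizes and other fields.

```python
# Simplified sketch of an odgt -> COCO conversion (see tools/crowdhuman2coco.py
# for the full version, which also records image width/height and extra fields).
import json

def odgt_to_coco(odgt_path, out_path, box_key="vbox"):
    images, annotations, ann_id = [], [], 1
    with open(odgt_path) as f:
        for img_id, line in enumerate(f, start=1):
            record = json.loads(line)  # one JSON object per line
            images.append({"id": img_id, "file_name": record["ID"] + ".jpg"})
            for gt in record.get("gtboxes", []):
                if gt.get("tag") != "person":  # skip ignored / mask regions
                    continue
                x, y, w, h = gt[box_key]       # "vbox" = visible box, "fbox" = full box
                annotations.append({
                    "id": ann_id, "image_id": img_id, "category_id": 1,
                    "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0,
                })
                ann_id += 1
    coco = {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": 1, "name": "person"}],
    }
    with open(out_path, "w") as f:
        json.dump(coco, f)

# Example:
# odgt_to_coco("dataset/crowdhuman/annotation_val.odgt",
#              "dataset/crowdhuman/val_visible.json")
```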
## How to use
- To start training the model, run the following command:

  ```bash
  python train.py --config_file ./configs/config.yaml
  ```

  Our model configs are written in YAML and live in the configs directory. Make sure to update config.yaml with the appropriate paths and parameters as needed (see the sketch after this list for a quick way to check them). We also provide pretrained adapter weights for CrowdHuman here.
- To evaluate the model, we recommend using the following command for batch evaluation:

  ```bash
  python tools/batch_eval.py --config_file ./configs/config.yaml -n num_gpus
  ```
- To visualize the outputs, use the following command:

  ```bash
  python tools/test.py --config_file ./configs/config.yaml --visualize
  ```
- To run the demo on your own images, use the following command:

  ```bash
  python tools/demo.py --config_file ./configs/config.yaml --input target_directory
  ```

  This runs Crowd-SAM on the images in the given directory and outputs the results.
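Before launching training or evaluation, a quick way to verify your config edits is to load and inspect the YAML, as sketched below. This assumes a flat key layout with generic path-like values; adapt it to the keys actually present in configs/config.yaml.

```python
# Quick sanity check of configs/config.yaml before training/evaluation.
# Assumes a flat YAML layout; nested sections would need a recursive walk.
import os
import yaml

with open("configs/config.yaml") as f:
    cfg = yaml.safe_load(f)

print(yaml.dump(cfg, sort_keys=False))  # inspect the resolved configuration

# Warn about path-like string values that do not exist on disk
for key, value in cfg.items():
    if isinstance(value, str) and ("/" in value or value.endswith((".pth", ".json"))):
        if not os.path.exists(value):
            print(f"warning: {key} = {value!r} does not exist")
```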
## Qualitative Results
<!-- ![demo2](figures/demo_2.jpg) -->

## Acknowledgement
We build our project on top of segment-anything and dinov2.
## Citation
You can cite our paper with the following BibTeX entry:
```bibtex
@inproceedings{cai2024crowd,
  title={Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes},
  author={Cai, Zhi and Gao, Yingjie and Zheng, Yaoyan and Zhou, Nan and Huang, Di},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024}
}
```