# Open Scene Understanding (ICCV ACVR 2023)

**Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments** [PDF]
<p align="center"> <img src="https://github.com/RuipingL/OpenSU/blob/main/img/comparison.png" width="600"> </p>

## Environment
```bash
# Clone this repository and navigate into it
git clone https://github.com/RuipingL/OpenSU.git
cd OpenSU

# Create a conda environment, activate it, and install PyTorch via conda
conda create --name OpenSU python=3.9
conda activate OpenSU
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge

# Install requirements via pip
pip install -r requirements.txt

# Install Segment Anything
pip install git+https://github.com/facebookresearch/segment-anything.git
```
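After installation, a quick sanity check (a minimal sketch, not part of the repository) can confirm that PyTorch and Segment Anything are importable and that CUDA is visible:

```python
# Minimal environment check (illustrative; not part of the OpenSU repo)
import torch
import segment_anything  # installed from the facebookresearch repo above

print(torch.__version__)          # expected: 1.8.0
print(torch.cuda.is_available())  # expected: True if cudatoolkit 11.1 matches your driver
```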
## Dataset Preparation
Download the images to the folder `SWiG`, and the JSON files to the folder `SWiG/SWiG_jsons`.
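The paths above imply a layout along these lines (a sketch of our assumption; the exact file names depend on the SWiG release you download):

```
OpenSU/
└── SWiG/            # SWiG images
    └── SWiG_jsons/  # SWiG annotation JSON files
```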
## Model Checkpoints
Download Swin-T to the folder `ckpt/Swin`, Segment Anything and MobileSAM to the folder `ckpt/sam`, and the GSR model to the folder `ckpt`.
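Together with the evaluation commands below, this implies a checkpoint layout like the following (the SAM/MobileSAM file names inside `ckpt/sam` are whatever you downloaded):

```
ckpt/
├── Swin/            # Swin-T backbone weights
├── sam/             # Segment Anything / MobileSAM checkpoints
└── OpenSU_Swin.pth  # GSR model, as referenced by --saved_model below
```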
## Training
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch_size 4 --dataset_file swig --epochs 40 --num_workers 4 --num_glance_enc_layers 3 --num_gaze_s1_dec_layers 3 --num_gaze_s1_enc_layers 3 --num_gaze_s2_dec_layers 3 --dropout 0.15 --hidden_dim 512 --output_dir OpenSU
```
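For a quick single-GPU run, the same flags should also work without the distributed launcher (an assumption based on the evaluation commands below, which call `main.py` directly):

```bash
# Single-GPU training (assumed variant; flags identical to the 4-GPU command)
CUDA_VISIBLE_DEVICES=0 python main.py --batch_size 4 --dataset_file swig --epochs 40 --num_workers 4 --num_glance_enc_layers 3 --num_gaze_s1_dec_layers 3 --num_gaze_s1_enc_layers 3 --num_gaze_s2_dec_layers 3 --dropout 0.15 --hidden_dim 512 --output_dir OpenSU
```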
## Evaluation
```bash
python main.py --saved_model ckpt/OpenSU_Swin.pth --output_dir OpenSU_eva --dev   # Evaluation on development set
python main.py --saved_model ckpt/OpenSU_Swin.pth --output_dir OpenSU_eva --test  # Evaluation on test set
```
## Demo
```bash
python demo.py --image_path img/carting_214.jpg --sam sam        # Use vanilla Segment Anything as the segmentation map generator
python demo.py --image_path img/carting_214.jpg --sam mobilesam  # Use MobileSAM as the segmentation map generator
```
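OpenSU combines grounded situation recognition with SAM, so the grounded role boxes serve as segmentation prompts. Below is a minimal sketch of that step using the official `segment-anything` API; this is not the repo's `demo.py`, and the checkpoint file name and box coordinates are illustrative assumptions:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (variant and file name are assumptions; match ckpt/sam)
sam = sam_model_registry["vit_h"](checkpoint="ckpt/sam/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an RGB uint8 image
image = cv2.cvtColor(cv2.imread("img/carting_214.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Hypothetical bounding box for one grounded role, in XYXY pixel coordinates
box = np.array([100, 50, 400, 300])

# Prompt SAM with the box to get a binary mask for that role
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)  # (1, H, W) boolean mask and its predicted IoU
```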
Output:
```
# Text information
verb: carting
role: agent, noun: dog.n.01
role: item, noun: man.n.01
role: tool, noun: cart.n.01
role: place, noun: outdoors.n.01
the dog cartes the man in a cart at a outdoors.
```
<img src="https://github.com/RuipingL/OpenSU/blob/main/img/carting_214_sam.jpg" width="300">
## Citation
Our system is built upon the framework of CoFormer. If you use OpenSU, please cite:
```bibtex
@inproceedings{liu2023opensu,
  title={Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments},
  author={Liu, Ruiping and Zhang, Jiaming and Peng, Kunyu and Zheng, Junwei and Cao, Ke and Chen, Yufan and Yang, Kailun and Stiefelhagen, Rainer},
  booktitle={2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2023}
}
```