<img src='source/UFO.png'>

A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection

Update

08/09/2022 Video Inpainting script and model coming soon!

22/07/2022 Add demo to Huggingface Spaces with Gradio.

Paper Link: [paper]
Huggingface Demo: Hugging Face Spaces

UFO is a simple and Unified framework for addressing Co-Object Segmentation tasks: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection. Because we live in a dynamic world, humans tend to discover objects by learning from a group of images or several frames of video. In computer vision, much research focuses on co-segmentation (CoS), co-saliency detection (CoSD) and video salient object detection (VSOD) to discover co-occurring objects. However, previous approaches design a separate network for each task, which limits the ease of use of deep learning frameworks. In this paper, we introduce a unified framework to tackle these issues, termed <b>UFO</b> (<b>U</b>nified <b>F</b>ramework for Co-<b>O</b>bject Segmentation). All tasks share the same framework.

Task & Framework

<img src="source/fig1.gif" width="50%"/><img src='source/framework.png' width="50%">

Usage

Requirements

torch >= 1.7.0
torchvision >= 0.7.0
python3

Training

Training on group-based images. We use the COCO2017 train set with the provided group split file dict.npy.

python main.py
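To inspect the group split before training, a minimal sketch is given here. It assumes dict.npy stores a pickled Python dict that maps a group name to a list of image file names; the actual layout of the provided file is not described in this README, so the keys and values below are hypothetical, and `load_group_split` is an illustrative helper rather than part of the repository.

```python
import os
import tempfile

import numpy as np


def load_group_split(path):
    """Load a dict saved with np.save (assumption: the file holds a pickled dict)."""
    # np.save wraps a dict in a 0-d object array; .item() unwraps it again.
    return np.load(path, allow_pickle=True).item()


# Hypothetical toy split, written to a temporary file so the sketch is self-contained.
toy_split = {"group_a": ["0001.jpg", "0002.jpg"], "group_b": ["0003.jpg"]}
tmp_path = os.path.join(tempfile.mkdtemp(), "dict.npy")
np.save(tmp_path, toy_split)

groups = load_group_split(tmp_path)
for name, files in groups.items():
    print(name, len(files))
```

Pointing `load_group_split` at the real dict.npy shows which structure the file actually has; only load pickled files from sources you trust, since `allow_pickle=True` can execute code stored in the file.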

Training on video (w/o flow). We load the weights pre-trained on the static image dataset and use DAVIS and FBMS to train our framework.

python finetune.py --model=models/image_best.pth --use_flow=False

Training on video (w/ flow). As above, we load the weights pre-trained on the static image dataset, then use DAVIS_flow and FBMS_flow to train our network.

python finetune.py --model=models/image_best.pth --use_flow=True
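The commands above pass `--use_flow` as the text `True` or `False`. How finetune.py parses this flag is not shown here. If you write your own wrapper with `argparse`, note that `type=bool` would treat the string `"False"` as true, because any non-empty string is truthy in Python. The sketch below shows one common workaround; `str2bool` and `build_parser` are hypothetical helpers, not functions from this repository.

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to a bool for argparse."""
    text = str(value).strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser that mirrors the two flags used in the commands above.
    parser = argparse.ArgumentParser(description="fine-tuning flags (sketch)")
    parser.add_argument("--model", type=str, default="models/image_best.pth")
    parser.add_argument("--use_flow", type=str2bool, default=False)
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--model=models/image_best.pth", "--use_flow=False"])
print(args.model, args.use_flow)
```

With this converter, `--use_flow=False` yields the boolean `False` rather than a truthy string.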

Inference

Generate the image results [checkpoint]

python test.py --model=models/image_best.pth --data_path=CoSdatasets/MSRC7/ --output_dir=CoS_results/MSRC7 --task=CoS_CoSD

Generate the video results [checkpoint]

python test.py --model=models/video_best.pth --data_path=VSODdatasets/DAVIS/ --output_dir=VSOD_results/wo_optical_flow/DAVIS --task=VSOD

Generate the video results with optical flow [checkpoint]

python test.py --model=models/video_flow_best.pth --data_path=VSODdatasets/DAVIS_flow/ --output_dir=VSOD_results/w_optical_flow/w_optical_flow --use_flow=True --task=VSOD
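When several datasets need to be evaluated, the inference commands above can be generated instead of typed by hand. The following is a sketch only: `build_test_command` is a hypothetical helper, and it assumes each dataset lives in its own folder under a common root, as in the MSRC7 and DAVIS examples. It only builds and prints the command; it does not run test.py.

```python
import shlex


def build_test_command(model, data_root, dataset, output_root, task, use_flow=False):
    """Assemble the argument list for one test.py run (hypothetical helper)."""
    cmd = [
        "python",
        "test.py",
        f"--model={model}",
        f"--data_path={data_root}/{dataset}/",
        f"--output_dir={output_root}/{dataset}",
    ]
    if use_flow:
        # Only added for the optical-flow variant, matching the commands above.
        cmd.append("--use_flow=True")
    cmd.append(f"--task={task}")
    return cmd


# Reproduce the image-level command shown above for the MSRC7 dataset.
image_cmd = build_test_command(
    model="models/image_best.pth",
    data_root="CoSdatasets",
    dataset="MSRC7",
    output_root="CoS_results",
    task="CoS_CoSD",
)
print(shlex.join(image_cmd))
```

The returned list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues if a path contains spaces.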

Evaluation

Result

<img src='source/result1.png'> <img src='source/result2.png'>

<img src="source/drift-straight.gif" width="45%"/> <img src="source/bmx-trees.gif" width="45%"/>

<img src="source/bear_480p.gif" width="45%"/><img src="source/rabbit_480p.gif" width="45%"/>

<img src='source/inpainting.gif'>

Demo

python demo.py --data_path=./demo_mp4/video/kobe.mp4 --output_dir=./demo_mp4/result

https://user-images.githubusercontent.com/50760123/156528285-59b0a056-fb07-4c1e-8e66-cae31dc0e789.mp4

bash demo_bullet_chat.sh

https://user-images.githubusercontent.com/50760123/156924040-c329075f-1d50-41cd-a869-885b2f33d873.mp4

Citation

If you find the code useful, please consider citing our paper using the following BibTeX entry.

@misc{2203.04708,
Author = {Yukun Su and Jingliang Deng and Ruizhou Sun and Guosheng Lin and Qingyao Wu},
Title = {A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection},
Year = {2022},
Eprint = {arXiv:2203.04708},
}

@article{su2023unified,
title = {A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection},
author = {Yukun Su and Jingliang Deng and Ruizhou Sun and Guosheng Lin and Qingyao Wu},
journal = {IEEE Transactions on Multimedia},
year = {2023},
publisher = {IEEE}
}

Acknowledgement

Our project references code from the following repos.