TCE-RVOS
The official implementation of "Temporal Context Enhanced Referring Video Object Segmentation", accepted at WACV 2024.
Temporal Context Enhanced Referring Video Object Segmentation<br> Xiao Hu, Basavaraj Hampiholi, Heiko Neumann, and Jochen Lang
Abstract
The goal of Referring Video Object Segmentation is to extract an object from a video clip based on a given expression. While previous methods have utilized the transformer's multi-modal learning capabilities to aggregate information from different modalities, they have mainly focused on spatial information and paid less attention to temporal information. To enhance the learning of temporal information, we propose TCE-RVOS with a novel frame token fusion (FTF) structure and a novel instance query transformer (IQT). Our technical innovations maximize the potential information gain of videos over single images. Our contributions also include a new classification of two widely used validation datasets for investigation of challenging cases.
Update
- (2023/11/19) Code released.💥
- (2023/10/24) TCE-RVOS is accepted by WACV 2024.🏄
Demo
Videos
Coming Soon
Image frames
The rows are ordered as follows: 1. MTTR, 2. ReferFormer, 3. TCE-RVOS.
- "a white and red parachute blowing in the wind", shown in blue masks.
- "the white toilet is between the white tub and green cabinet", shown in purple masks.
Installation & Data Preparation
Please refer to ReferFormer for installation and data preparation instructions.
Model Zoo
Coming Soon
Acknowledgement
This repo is based on ReferFormer. We also refer to MTTR. We thank the authors for their great work.