TCE-RVOS

The official implementation of "Temporal Context Enhanced Referring Video Object Segmentation", accepted at WACV 2024.

Structure

Temporal Context Enhanced Referring Video Object Segmentation<br> Xiao Hu, Basavaraj Hampiholi, Heiko Neumann, and Jochen Lang

Abstract


The goal of Referring Video Object Segmentation is to extract an object from a video clip based on a given expression. While previous methods have utilized the transformer's multi-modal learning capabilities to aggregate information from different modalities, they have mainly focused on spatial information and paid less attention to temporal information. To enhance the learning of temporal information, we propose TCE-RVOS with a novel frame token fusion (FTF) structure and a novel instance query transformer (IQT). Our technical innovations maximize the potential information gain of videos over single images. Our contributions also include a new classification of two widely used validation datasets for investigation of challenging cases.

Update


Demo


Videos

Coming Soon

Image frames

The rows show, in order: 1. MTTR, 2. ReferFormer, 3. TCE-RVOS.

  1. "a white and red parachute blowing in the wind", shown in blue masks. samp1

  2. "the white toilet is between the white tub and green cabinet", shown in purple masks. samp2

Installation & Data Preparation


Please refer to the ReferFormer repository for installation and data preparation.

Model Zoo


Coming Soon

Acknowledgement


This repo is based on ReferFormer. We also refer to MTTR. Thanks to the authors of both for their great work.