π₯π₯π₯Update 2023.02.19π₯π₯π₯
2022CVPR-Modeling-Motion-with-Multi-Modal-Features-for-Text-Based-Video-Segmentation
This is the code for the CVPR 2022 paper "Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation".
Framework
Usage
- Download A2D-Sentences and JHMDB-Sentences, then convert the raw video data into image frames.
- Use RAFT to generate the optical flow map (visualized in RGB format) from frame t to frame t+1. Since only a few frames are annotated in A2D and JHMDB, optical flow maps are needed only for these frames.
- Organize the data as follows:
your dataset dir/
├── A2D/
│   ├── allframes/
│   ├── allframes_flow/
│   ├── Annotations_visualize/
│   └── a2d_txt/
│       ├── train.txt
│       └── test.txt
└── J-HMDB/
    ├── allframes/
    ├── allframes_flow/
    ├── Annotations_visualize/
    └── jhmdb_txt/
        ├── train.txt
        └── test.txt
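The flow maps in `allframes_flow/` are flow fields rendered as RGB images. As a reference for what "visualize in RGB format" means, here is a minimal sketch of the standard HSV-based encoding (hue = flow direction, brightness = magnitude); this is the common convention, not necessarily RAFT's exact color wheel.

```python
import numpy as np

def flow_to_rgb(flow):
    """Encode an (H, W, 2) optical-flow field as an RGB uint8 image.

    Standard HSV visualization: hue encodes flow direction, value
    encodes magnitude (normalized by the per-image maximum), and
    saturation is fixed at 1. A sketch, not RAFT's exact color wheel.
    """
    u, v = flow[..., 0], flow[..., 1]
    mag = np.sqrt(u ** 2 + v ** 2)
    ang = np.arctan2(v, u)                  # direction in [-pi, pi]
    h = (ang + np.pi) / (2 * np.pi)         # hue in [0, 1]
    val = mag / (mag.max() + 1e-8)          # value in [0, 1]

    # Manual HSV -> RGB conversion with saturation = 1.
    i = np.floor(h * 6).astype(int) % 6
    f = h * 6 - np.floor(h * 6)
    p = np.zeros_like(val)
    q = val * (1 - f)
    t = val * f
    rgb = np.zeros(flow.shape[:2] + (3,))
    sectors = [(val, t, p), (q, val, p), (p, val, t),
               (p, q, val), (t, p, val), (val, p, q)]
    for k, (r, g, b) in enumerate(sectors):
        m = i == k
        rgb[m] = np.stack([r[m], g[m], b[m]], axis=-1)
    return (rgb * 255).astype(np.uint8)
```

Zero-motion pixels come out black, and stronger motion appears brighter, which matches the usual flow visualizations.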
"Annotations_visualize" contains the GT masks for each target object. For convenience, we have uploaded them to BaiduPan (code: lo50).
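To catch path mistakes before training, the expected layout above can be created and verified with a small helper. This is a sketch under the assumption that the dataset root and sub-directory names match the tree shown; `DATASET_ROOT` is a placeholder for your own path.

```python
import os

# Placeholder: replace with your actual dataset root directory.
DATASET_ROOT = "your_dataset_dir"

# Sub-directories expected by the layout shown above.
EXPECTED = [
    "A2D/allframes",
    "A2D/allframes_flow",
    "A2D/Annotations_visualize",
    "A2D/a2d_txt",
    "J-HMDB/allframes",
    "J-HMDB/allframes_flow",
    "J-HMDB/Annotations_visualize",
    "J-HMDB/jhmdb_txt",
]

def check_layout(root):
    """Return the expected sub-directories that are missing under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

def make_layout(root):
    """Create the empty directory skeleton (run before copying data in)."""
    for d in EXPECTED:
        os.makedirs(os.path.join(root, d), exist_ok=True)
```

Running `check_layout(DATASET_ROOT)` after preprocessing should return an empty list; any entries it returns are the directories still missing.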
Citation
Please consider citing our work in your publications if you are interested in our research:
@inproceedings{zhao2022modeling,
  title={Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation},
  author={Zhao, Wangbo and Wang, Kai and Chu, Xiangxiang and Xue, Fuzhao and Wang, Xinchao and You, Yang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11737--11746},
  year={2022}
}