# LiVOS: Light Video Object Segmentation with Gated Linear Matching

PyTorch implementation for the paper LiVOS: Light Video Object Segmentation with Gated Linear Matching, arXiv 2024. <br>

Qin Liu<sup>1</sup>, Jianfeng Wang<sup>2</sup>, Zhengyuan Yang<sup>2</sup>, Linjie Li<sup>2</sup>, Kevin Lin<sup>2</sup>, Marc Niethammer<sup>1</sup>, Lijuan Wang<sup>2</sup> <br> <sup>1</sup>UNC-Chapel Hill, <sup>2</sup>Microsoft

[Paper](https://arxiv.org/abs/2411.02818)

<p align="center"> <img src="./docs/livos_framework.png" alt="LiVOS framework" height="360"/> </p>
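
As we read the title and framework figure, gated linear matching replaces softmax-based memory matching with a linear-attention-style recurrence whose memory stays constant-size no matter how long the video gets. The sketch below shows the generic gated linear-attention recurrence this family of methods builds on; the function name, shapes, scalar per-step gate, and feature-map assumption are illustrative, not the paper's actual implementation.

```python
import torch

def gated_linear_matching(queries, keys, values, gates):
    """Generic gated linear-attention recurrence (illustrative sketch).

    queries, keys: (T, d_k) non-negative features (e.g. after elu + 1);
    values: (T, d_v); gates: (T,) decay factors in (0, 1). The memory S
    is a fixed d_k x d_v matrix, so per-frame cost does not grow with
    the number of past frames, unlike softmax matching.
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    S = torch.zeros(d_k, d_v)  # key-value matching memory
    z = torch.zeros(d_k)       # running key normalizer
    outputs = []
    for t in range(len(keys)):
        g = gates[t]
        S = g * S + torch.outer(keys[t], values[t])  # gated memory write
        z = g * z + keys[t]
        num = S.t() @ queries[t]                        # (d_v,) readout
        den = torch.dot(z, queries[t]).clamp_min(1e-6)  # scalar normalizer
        outputs.append(num / den)
    return torch.stack(outputs)

# Toy usage: 8 frames, 64-d keys/queries, 32-d values, mild decay.
out = gated_linear_matching(torch.rand(8, 64), torch.rand(8, 64),
                            torch.rand(8, 32), torch.full((8,), 0.9))
print(out.shape)  # torch.Size([8, 32])
```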

## Installation

The code is tested with `python=3.10`, `torch=2.4.0`, and `torchvision=0.19.0`.

```bash
git clone https://github.com/uncbiag/LiVOS
cd LiVOS
```

Create a new conda environment and install the required packages:

```bash
conda create -n livos python=3.10
conda activate livos
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```
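
Before running anything heavy, it may help to confirm the environment matches the tested versions; a minimal check:

```python
import torch
import torchvision

# Versions this repo reports being tested with.
print(torch.__version__)          # expect 2.4.0 (possibly with a +cu121 suffix)
print(torchvision.__version__)    # expect 0.19.0
print(torch.cuda.is_available())  # should be True for GPU training/evaluation
```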

## Weights

Download the model weights and store them in the `./weights` directory. The directory will be automatically created if it does not already exist.

```bash
python ./download.py
```
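
Optionally, you can confirm a downloaded checkpoint deserializes cleanly. The file name below matches the weights table in the Evaluation section; whether the checkpoint is a plain state dict or a wrapper dict is an assumption here.

```python
import torch

# Load on CPU purely as an integrity check.
ckpt = torch.load("./weights/livos-nomose-480p.pth", map_location="cpu")
print(type(ckpt).__name__)
if isinstance(ckpt, dict):
    print(f"{len(ckpt)} top-level entries")
```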

## Datasets

| Dataset | Description | Download Link |
| --- | --- | --- |
| DAVIS 2017 | 60 videos (train); 30 videos (val); 30 videos (test) | official site |
| YouTube VOS 2019 | 3471 videos (train); 507 videos (val) | official site |
| MOSE | 1507 videos (train); 311 videos (val) | official site |
| LVOS (v1)* | 50 videos (val); 50 videos (test) | official site |

(*) To prepare LVOS, you need to extract only the first-frame annotations for its validation set:

```bash
python scripts/data/preprocess_lvos.py ../LVOS/valid/Annotations ../LVOS/valid/Annotations_first_only
```
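
If you want to see what this preprocessing amounts to, the operation is conceptually simple: keep only the earliest annotation frame of each validation video. Below is a hedged sketch of that idea; the actual `scripts/data/preprocess_lvos.py` may differ in details such as argument parsing and file handling.

```python
import shutil
from pathlib import Path

def keep_first_annotation_only(src_root: str, dst_root: str) -> None:
    """Copy only the first (lowest-numbered) annotation PNG per video."""
    src, dst = Path(src_root), Path(dst_root)
    for video_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        frames = sorted(video_dir.glob("*.png"))  # names sort by frame index
        if not frames:
            continue
        out_dir = dst / video_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(frames[0], out_dir / frames[0].name)

keep_first_annotation_only("../LVOS/valid/Annotations",
                           "../LVOS/valid/Annotations_first_only")
```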

Prepare the datasets in the following structure:

```
├── LiVOS (codebase)
├── DAVIS
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── YouTube
│   ├── all_frames
│   │   └── valid_all_frames
│   ├── train
│   └── valid
├── LVOS
│   ├── valid
│   │   ├── Annotations
│   │   └── ...
│   └── test
│       ├── Annotations
│       └── ...
└── MOSE
    ├── JPEGImages
    └── Annotations
```
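
A quick way to verify the layout before evaluation is to check the expected directories from the tree above (paths are relative to the LiVOS codebase):

```python
from pathlib import Path

# Directories taken from the structure above; extend as needed.
EXPECTED = [
    "../DAVIS/2017/trainval/Annotations",
    "../DAVIS/2017/test-dev/Annotations",
    "../YouTube/all_frames/valid_all_frames",
    "../YouTube/train",
    "../YouTube/valid",
    "../LVOS/valid/Annotations",
    "../LVOS/test/Annotations",
    "../MOSE/JPEGImages",
    "../MOSE/Annotations",
]

for p in EXPECTED:
    print(("ok     " if Path(p).is_dir() else "MISSING"), p)
```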

## Evaluation

You should get the following results using our provided models:

<table> <thead align="center"> <tr> <th rowspan="2"><span style="font-weight:bold">Training</span><br><span style="font-weight:bold">Dataset</span></th> <th rowspan="2">Model</th> <th colspan="6">J&F</th> </tr> <tr> <td>MOSE</td> <td>DAVIS-17 val</td> <td>DAVIS-17 test</td> <td>YTVOS-19 val</td> <td>LVOS val</td> <td>LVOS test</td> </tr> </thead> <tbody align="center"> <tr> <td rowspan="1">D17+YT19</td> <td align="left"><a href="https://drive.google.com/uc?export=download&id=1tG_BxCTWp_o9YH0vBqZqLC9KBsEGSsaH">livos-nomose-480p (135 MB)</a></td> <td>59.2</td> <td>84.4</td> <td>78.2</td> <td>79.9</td> <td>50.6</td> <td>44.6</td> </tr> <tr> <td rowspan="1">D17+YT19</td> <td align="left"><a href="https://drive.google.com/uc?export=download&id=1ToIDo6PIYF7lQGfO4F7HuHneyatKGWnx">livos-nomose-ft-480p (135 MB)</a></td> <td>58.4</td> <td>85.1</td> <td>81.0</td> <td>81.3</td> <td>51.2</td> <td>50.9</td> </tr> <tr> <td rowspan="1">D17+YT19+MOSE</td> <td align="left"><a href="https://drive.google.com/uc?export=download&id=13FVuxcEwNRfY70PA3O9pOyPO7Gx7Zl5N">livos-wmose-480p (135 MB)</a></td> <td>64.8</td> <td>84.0</td> <td>79.6</td> <td>82.6</td> <td>51.2</td> <td>47.0</td> </tr> </tbody> </table>
1. Run the evaluation:

   ```bash
   python livos/eval.py dataset=[dataset] weights=[path to model file]
   ```

   Example for the DAVIS 2017 validation set (more dataset options in `livos/config/eval_config.yaml`):

   ```bash
   python livos/eval.py dataset=d17-val weights=./weights/livos-nomose-480p.pth
   ```

2. Get quantitative results for the DAVIS 2017 validation set:

   ```bash
   GT_DIR=../DAVIS/2017/trainval/Annotations/480p
   SEG_DIR=./results/d17-val/Annotations
   python ./vos-benchmark/benchmark.py -g ${GT_DIR} -m ${SEG_DIR}
   ```

3. For results on other datasets, repeat the two steps above with the corresponding dataset option and ground-truth directory; a scripted version is sketched below.
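
A hedged sketch of scripting step 3 in Python: the commented-out dataset entry and the `./results/<dataset>/Annotations` output layout are assumptions extrapolated from the DAVIS example above, so verify the actual dataset options in `livos/config/eval_config.yaml` first.

```python
import subprocess

WEIGHTS = "./weights/livos-nomose-480p.pth"

# Maps an eval dataset option to its ground-truth annotation directory.
# Only the d17-val pair is taken from this README; any other entry is a
# placeholder you must check against livos/config/eval_config.yaml.
RUNS = {
    "d17-val": "../DAVIS/2017/trainval/Annotations/480p",
    # "lvos-val": "../LVOS/valid/Annotations_first_only",  # hypothetical
}

for dataset, gt_dir in RUNS.items():
    # Step 1: produce segmentations for this dataset.
    subprocess.run(
        ["python", "livos/eval.py", f"dataset={dataset}", f"weights={WEIGHTS}"],
        check=True,
    )
    # Step 2: score them against the ground truth (assumes eval.py writes
    # results under ./results/<dataset>/Annotations, as in the DAVIS example).
    subprocess.run(
        ["python", "./vos-benchmark/benchmark.py",
         "-g", gt_dir, "-m", f"./results/{dataset}/Annotations"],
        check=True,
    )
```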

## Training

We conducted training on four A6000 48GB GPUs. Without MOSE, training took approximately 90 hours to complete 125,000 iterations.

```bash
OMP_NUM_THREADS=4 torchrun --master_port 25350 --nproc_per_node=4 livos/train.py exp_id=first_try model=base data=base
```

## Citation

```bibtex
@article{liu2024livos,
  title={LiVOS: Light Video Object Segmentation with Gated Linear Matching},
  author={Liu, Qin and Wang, Jianfeng and Yang, Zhengyuan and Li, Linjie and Lin, Kevin and Niethammer, Marc and Wang, Lijuan},
  journal={arXiv preprint arXiv:2411.02818},
  year={2024}
}
```

## Acknowledgement

Our project is built upon Cutie. We appreciate the well-maintained codebase.