From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models

Official PyTorch implementation for our NeurIPS 2023 paper, From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models.

Qualitative results

Video presentation

Installation

To set up the conda environment from the env.yaml file, follow these steps:

  1. Clone this repository:
    git clone https://github.com/BGU-CS-VIL/Training-Free-VOS.git
    
    
  2. Navigate to the repository directory:
    cd Training-Free-VOS
    
    
  3. Create an environment using the env.yaml file:
    conda env create -f env.yaml
    
    
  4. Activate the environment:
    conda activate VOS
    

Downloading DAVIS 2017 Dataset

Follow these steps to download and set up the DAVIS 2017 dataset:

  1. Download the DAVIS 2017 dataset from the following link: DAVIS 2017 TrainVal 480p

  2. Extract the downloaded file under the data folder in your project directory:

    unzip DAVIS-2017-trainval-480p.zip -d ./data/
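
After extraction, a quick check that the dataset landed where expected can save a failed run later. The sketch below is not part of the repository: the helper name `missing_davis_dirs` is hypothetical, and it assumes the archive unpacks into a top-level `DAVIS` folder with the standard DAVIS 2017 sub-folders (`JPEGImages/480p`, `Annotations/480p`, `ImageSets/2017`). Adjust the paths if your copy is laid out differently.

```python
from pathlib import Path

# Assumed sub-folders of the standard DAVIS 2017 TrainVal 480p release.
EXPECTED_SUBDIRS = (
    "JPEGImages/480p",
    "Annotations/480p",
    "ImageSets/2017",
)


def missing_davis_dirs(data_root="./data"):
    """Return the expected DAVIS sub-folders that are absent under data_root."""
    # Assumption: the zip unpacks into a top-level "DAVIS" folder.
    davis_root = Path(data_root) / "DAVIS"
    return [sub for sub in EXPECTED_SUBDIRS if not (davis_root / sub).is_dir()]


if __name__ == "__main__":
    missing = missing_davis_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("DAVIS 2017 layout looks complete.")
```

Run it from the project directory; an empty result means all three assumed folders were found under `./data/DAVIS`.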
    

Extracting Features with XCiT

We use the Cross-Covariance Image Transformer (XCiT) for feature extraction. More information about XCiT is available in the XCiT GitHub Repository.

The pre-extracted features are available for download:

  1. Download the features from the following link: Download Features

  2. Unzip the downloaded file into the features folder in your project directory:

    unzip XCIT-feat.zip -d ./features/
    

Inference

To run inference, use the command below. The arguments listed after it can be modified as needed:

    python main_seg.py

--loc: Location scale factor. Default is 10.
--time: Time factor. Default is 0.33.
--num_models: Number of models. Default is 10.
--vis: Enable saving the segmentation overlay on the image.
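
To make the options above concrete, here is a minimal sketch of a helper that assembles the command line with explicit values. The helper name `build_inference_command` is hypothetical and not part of the repository. The defaults mirror the ones listed above, and treating `--vis` as a switch that takes no value is an assumption based on its description.

```python
import shlex
import sys


def build_inference_command(loc=10, time=0.33, num_models=10, vis=False):
    """Assemble the argument list for main_seg.py from the documented options."""
    cmd = [
        sys.executable, "main_seg.py",
        "--loc", str(loc),
        "--time", str(time),
        "--num_models", str(num_models),
    ]
    if vis:
        # Assumption: --vis is a boolean switch that takes no value.
        cmd.append("--vis")
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(loc=10, time=0.33, num_models=10, vis=True)
    # Print the command instead of running it; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```

Building the command as a list keeps each value a separate argument, which makes it easy to loop over several settings of `--loc` or `--time` when comparing results.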

Citation

We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:

@inproceedings{uziel2023vit,
  title={From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models},
  author={Uziel, Roy and Dinari, Or and Freifeld, Oren},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}