From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models
Official PyTorch implementation of our NeurIPS 2023 paper, From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models.
Table of Contents
- Installation
- Downloading DAVIS 2017 Dataset
- Extracting Features with XCiT
- Inference
- Citation
Installation
To set up the environment from the env.yaml file, follow these steps (a quick sanity check is sketched after the list):
- Clone this repository:
git clone https://github.com/BGU-CS-VIL/Training-Free-VOS.git
- Navigate to the repository directory:
cd Training-Free-VOS
- Create an environment using the env.yaml file:
conda env create -f env.yaml
- Activate the environment:
conda activate VOS
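To verify the environment works, you can run the minimal check below. It assumes only that PyTorch is among the env.yaml dependencies, which is implied by this being a PyTorch implementation; consult env.yaml for the authoritative package list:
# Minimal environment sanity check; assumes PyTorch is installed via env.yaml.
import torch
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # GPU is optional but much faster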
Downloading DAVIS 2017 Dataset
Follow these steps to download and set up the DAVIS 2017 dataset:
- Download the DAVIS 2017 dataset from the following link: DAVIS 2017 TrainVal 480p
- Extract the downloaded file under the data folder in your project directory (a quick layout check is sketched after these steps):
unzip DAVIS-2017-trainval-480p.zip -d ./data/
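After extraction, you can verify the expected layout. This sketch assumes the standard DAVIS 2017 directory structure (DAVIS/JPEGImages/480p and DAVIS/Annotations/480p); adjust the root if your archive unpacks differently:
# Hypothetical layout check for the DAVIS 2017 dataset (paths are assumptions).
import os
root = "./data/DAVIS"
for sub in ("JPEGImages/480p", "Annotations/480p"):
    path = os.path.join(root, sub)
    n = len(os.listdir(path)) if os.path.isdir(path) else 0
    print(path, "->", n, "sequences")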
Extracting Features with XCiT
We use the Cross-Covariance Image Transformer (XCiT) for feature extraction. You can find more information about XCiT here: XCiT GitHub Repository.
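If you prefer to extract features yourself rather than download them, the sketch below shows one way to obtain XCiT patch features using the timm library. This is an illustrative assumption, not the repository's extraction script; the backbone variant, weights, input resolution, and feature layer used in the paper may differ (see the XCiT repository):
# Illustrative XCiT feature extraction via timm (assumed setup, not the official script).
import timm
import torch

model = timm.create_model("xcit_small_12_p16_224", pretrained=True).eval()
frame = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed video frame
with torch.no_grad():
    tokens = model.forward_features(frame)  # token sequence; cls token assumed at index 0
patch_features = tokens[:, 1:]  # per-patch features
print(patch_features.shape)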
The pre-extracted features are available for download:
- Download the features from the following link: Download Features
- Unzip the downloaded file into the features folder in your project directory (a loading sketch follows below):
unzip XCIT-feat.zip -d ./features/
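The format of the pre-extracted features is defined by the archive above; as an illustration only, here is a hypothetical loading sketch that assumes one .npy array per sequence (the file layout and shapes are assumptions; check the unzipped contents and main_seg.py for the real format):
# Hypothetical feature-loading sketch; the actual file layout may differ.
import glob
import numpy as np

for path in sorted(glob.glob("./features/**/*.npy", recursive=True))[:3]:
    feats = np.load(path)
    print(path, feats.shape)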
Inference
To run inference, use the command below; adjust the arguments as needed (a full example invocation follows the list):
python main_seg.py
--loc: Location scale factor. Default: 10.
--time: Time factor. Default: 0.33.
--num_models: Number of models. Default: 10.
--vis: Enable saving the segmentation overlay on the images.
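For example, to make the defaults explicit and also save overlays (flag values taken from the defaults listed above):
python main_seg.py --loc 10 --time 0.33 --num_models 10 --vis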
Citation
We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:
@inproceedings{uziel2023vit,
  title={From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models},
  author={Uziel, Roy and Dinari, Or and Freifeld, Oren},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}