Moving Object Segmentation: All You Need Is SAM (and Flow)

Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman

Visual Geometry Group, Department of Engineering Science, University of Oxford

<a href="https://arxiv.org/abs/2404.12389"> <img src="https://img.shields.io/badge/cs.CV-2404.12389-b31b1b?logo=arxiv&logoColor=red" alt="arXiv"></a> <a href="https://www.robots.ox.ac.uk/~vgg/research/flowsam/"> <img alt="Project page" src="https://img.shields.io/badge/project_page-flowsam-blue"></a> <p align="center"> <img src="resources/teaser.png" width="750"/> </p>

Requirements

pytorch=2.0.0, Pillow, opencv, einops, tensorboardX

Segment Anything can be installed by following the official repository (https://github.com/facebookresearch/segment-anything), or via

pip install git+https://github.com/facebookresearch/segment-anything.git

Datasets

Training datasets

Evaluation datasets

Optical flow estimation

In this work, optical flow is estimated by RAFT, with the code provided in the flow folder.
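The estimated flow fields are saved as RGB flow images (the FlowImages_gap* folders below). As a rough illustration of how a two-channel flow field can be rendered this way (hue encoding direction, saturation encoding magnitude), here is one possible sketch; the repository's exact colour coding may differ:

```python
import numpy as np

def flow_to_rgb(flow):
    """Render a (H, W, 2) optical-flow field as an RGB image
    (illustrative HSV-style visualisation, not the repo's exact code)."""
    u, v = flow[..., 0], flow[..., 1]
    mag = np.sqrt(u ** 2 + v ** 2)
    ang = np.arctan2(v, u)                       # direction in [-pi, pi]
    h = (ang + np.pi) / (2 * np.pi)              # hue in [0, 1]
    s = np.clip(mag / (mag.max() + 1e-8), 0, 1)  # saturation ~ magnitude
    val = np.ones_like(h)                        # full brightness
    # vectorised HSV -> RGB conversion
    i = (np.floor(h * 6).astype(int) % 6)[..., None]
    f = (h * 6 - np.floor(h * 6))[..., None]
    s3, v3 = s[..., None], val[..., None]
    p, q, t = v3 * (1 - s3), v3 * (1 - f * s3), v3 * (1 - (1 - f) * s3)
    rgb = np.select(
        [i == 0, i == 1, i == 2, i == 3, i == 4, i == 5],
        [np.concatenate([v3, t, p], -1), np.concatenate([q, v3, p], -1),
         np.concatenate([p, v3, t], -1), np.concatenate([p, q, v3], -1),
         np.concatenate([t, p, v3], -1), np.concatenate([v3, p, q], -1)])
    return (rgb * 255).astype(np.uint8)
```

Zero motion maps to a white pixel under this convention, and larger displacements become more saturated.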

Path configuration

The data paths can be specified in data/dataset_config.py.
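For orientation only, a dataset entry in such a config file might look like the sketch below; the actual key names and structure in data/dataset_config.py may differ, and every path is a placeholder to be replaced with your own:

```python
# Hypothetical sketch of a dataset-path entry; the real
# data/dataset_config.py in the repository may use different keys.
dataset_config = {
    "dvs16": {
        "image_dir": "/path/to/DAVIS2016/JPEGImages",   # placeholder
        "flow_dir": "/path/to/DAVIS2016/FlowImages_gap1",  # placeholder
        "anno_dir": "/path/to/DAVIS2016/Annotations",   # placeholder
    },
}
```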

Checkpoints and results

Inference

To run FlowI-SAM,

python evaluation.py --model=flowisam --dataset=dvs16 --flow_gaps=1,-1,2,-2 \
                      --max_obj=5 --num_gridside=10 --ckpt_path={} --save_path={}

To run FlowP-SAM,

python evaluation.py --model=flowpsam --dataset=dvs16 --flow_gaps=1,-1,2,-2 \
                      --max_obj=10 --num_gridside=20 --ckpt_path={} --save_path={}

where
--flow_gaps denotes the frame gaps of flow inputs
--max_obj indicates the maximum number of predicted object masks
--num_gridside indicates the number of uniform grid point inputs (e.g., "10" corresponds to 10 x 10 points)
--ckpt_path specifies the model checkpoint path
--save_path specifies the path to save predicted masks (if not specified, no masks will be saved)

To run the code on your own data, or on datasets without GT multi-object segmentation (e.g., SegTrackv2, FBMS-59, MoCA_filter, etc.), arrange the data in the following structure:

{data_name}/
├── JPEGImages/
│   └── {category_name}/
│       ├── 00000.jpg
│       └── ......
├── FlowImages_gap1/
│   └── {category_name}/
│       ├── 00000.png
│       └── ......
└── ...... (More flow images)
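Before running inference on custom data, it can help to sanity-check that the folder follows this layout. The helper below is an illustrative sketch, not part of the repository:

```python
from pathlib import Path

def check_data_layout(root):
    """Return a list of problems with a custom dataset folder, which is
    expected to contain JPEGImages/{category}/ plus one
    FlowImages_gap{n}/{category}/ folder per flow gap.
    Illustrative helper only; not part of the FlowSAM repository."""
    root = Path(root)
    problems = []
    img_root = root / "JPEGImages"
    if not img_root.is_dir():
        return ["missing JPEGImages/"]
    flow_roots = sorted(root.glob("FlowImages_gap*"))
    if not flow_roots:
        problems.append("missing FlowImages_gap*/ folders")
    for cat in sorted(p for p in img_root.iterdir() if p.is_dir()):
        if not list(cat.glob("*.jpg")):
            problems.append(f"no .jpg frames in JPEGImages/{cat.name}")
        for flow_root in flow_roots:
            # each flow-gap folder should mirror the category names
            if not (flow_root / cat.name).is_dir():
                problems.append(f"{flow_root.name}/{cat.name} missing")
    return problems
```

An empty list means the layout matches the structure above.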

To perform sequence-level mask association (in other words, matching the identities of masks throughout the sequence) for multi-object datasets,

python seq_level_postprocess.py --dataset=dvs17m --mask_dir={} --save_path={}

For single-object cases, taking the first mask of each frame usually suffices.
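As a rough sketch of the general idea behind sequence-level association (greedy IoU matching between masks in consecutive frames), one minimal version might look like the following; this is illustrative only and not necessarily the algorithm used in seq_level_postprocess.py:

```python
import numpy as np

def iou(a, b):
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def match_masks(prev_masks, curr_masks, thresh=0.1):
    """Greedily give each current mask the identity of its
    best-overlapping previous mask (illustrative sketch only)."""
    assignment = {}   # curr index -> prev identity
    used = set()
    # consider highest-IoU pairs first
    pairs = sorted(
        ((iou(p, c), i, j)
         for i, p in enumerate(prev_masks)
         for j, c in enumerate(curr_masks)),
        reverse=True)
    for score, i, j in pairs:
        if score < thresh:
            break  # remaining pairs overlap too little
        if i in used or j in assignment:
            continue  # one-to-one matching
        assignment[j] = i
        used.add(i)
    return assignment
```

Masks left unassigned would start new identities; repeating this frame by frame propagates identities through the sequence.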

Evaluation benchmarks

Training

python train.py --model={} --dataset=dvs16 --model_save_path={}

where
--model specifies the model to be trained (flowisam or flowpsam)
--model_save_path indicates the path to save logs and model ckpts

Citation

If you find this repository helpful, please consider citing our work:

@InProceedings{xie2024flowsam,
  title={Moving Object Segmentation: All You Need Is SAM (and Flow)},
  author={Junyu Xie and Charig Yang and Weidi Xie and Andrew Zisserman},
  booktitle={ACCV},
  year={2024}
}

Reference

Segment Anything: https://github.com/facebookresearch/segment-anything