
🎡 PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

PolyphonicFormer is the winning method of the ICCV 2021 SemKITTI-DVPS Challenge.

PolyphonicFormer was accepted to ECCV 2022, Tel Aviv, Israel.

Haobo Yuan*, Xiangtai Li*, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, Dacheng Tao.

[pdf] [supp] [arxiv] [code] [poster]

Demo

Demo GIFs: Demo 1 and Demo 2.

Installation (Optional)

You do not need to set up the environment manually if Docker is available on your machine. We have already pushed a pre-built Docker image to Docker Hub. If you want to build the image yourself, run the following command in scripts/docker_env:

docker build -t polyphonicformer:release . --network=host
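After the build finishes, you can smoke-test the image with an interactive shell (a minimal sketch; for actual runs, tools/docker.sh below sets up the required data and log mounts):

# Hypothetical smoke test: open a shell in the image with GPU access.
docker run --gpus all -it --rm polyphonicformer:release bash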

Please refer to the Dockerfile for environment details if you insist on using Conda.

Datasets Preparation

You can download the Cityscapes-DVPS dataset here and the SemKITTI-DVPS dataset here. Assuming your dataset root is DATALOC, extract the zip files and make sure the folder layout looks like this:

DATALOC
├── cityscapes-dvps
│   └── video_sequence
│       ├── train
│       │   ├── 000000_000000_munster_000105_000004_leftImg8bit.png
│       │   ├── 000000_000000_munster_000105_000004_gtFine_instanceTrainIds.png
│       │   ├── 000000_000000_munster_000105_000004_depth.png
│       │   └── ...
│       └── val
│           └── ...
└── semkitti-dvps
    └── video_sequence
        ├── train
        │   ├── 000000_000000_leftImg8bit.png
        │   ├── 000000_000000_gtFine_class.png
        │   ├── 000000_000000_gtFine_instance.png
        │   ├── 000000_000000_depth_718.8560180664062.png
        │   └── ...
        └── val
            └── ...
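To verify the layout after extraction, a quick per-split file count helps (a minimal sketch assuming DATALOC is exported as an environment variable):

# Count files in each split to confirm extraction (sketch only).
for ds in cityscapes-dvps semkitti-dvps; do
  for split in train val; do
    echo "$ds/$split: $(ls "$DATALOC/$ds/video_sequence/$split" | wc -l) files"
  done
done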

Please note that the Cityscapes-DVPS and SemKITTI-DVPS datasets were created by the authors of ViP-DeepLab.

Docker Container

After you have prepared the datasets, you can create and enter a Docker container:

DATALOC={/path/to/datafolder} LOGLOC={/path/to/logfolder} bash tools/docker.sh
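For example, with placeholder paths (substitute your own):

# Hypothetical paths; point these at your extracted datasets and a log folder.
DATALOC=/data/dvps LOGLOC=/home/$USER/logs bash tools/docker.sh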

DATALOC will be linked to data in the project folder, and LOGLOC will be linked to /opt/logger.

Getting Started

Let's run the code 🏃‍♀️.

Image training

bash tools/dist_train.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py 8 --seed 0 --work-dir /opt/logger/exp001
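The trailing 8 is the number of GPUs, following the usual mmdetection-style dist_train convention. On a smaller machine you would scale it down (and possibly adjust the learning rate with it), e.g.:

# Same run on 4 GPUs; the number after the config is the GPU count.
bash tools/dist_train.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py 4 --seed 0 --work-dir /opt/logger/exp001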

Image testing

bash tools/dist_test.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py https://huggingface.co/HarborYuan/PolyphonicFormer/resolve/main/polyphonic_r50_image.pth 8
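The checkpoint is fetched from the URL at test time; if your machine blocks outbound traffic, downloading it first works as well (the local path below is a placeholder):

# Download the released checkpoint, then test from the local copy.
wget -P /opt/logger/ckpts https://huggingface.co/HarborYuan/PolyphonicFormer/resolve/main/polyphonic_r50_image.pth
bash tools/dist_test.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py /opt/logger/ckpts/polyphonic_r50_image.pth 8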

Video training

bash tools/dist_train.sh configs/polyphonic_video/poly_r50_cityscapes_1x.py 8 --seed 0 --work-dir /opt/logger/vid001 --no-validate
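If a run is interrupted, resuming should work as in standard mmdetection-based codebases (the --resume-from flag and latest.pth naming are assumptions, not verified against this repo's tools):

# Resume from the most recent checkpoint (sketch; flag and filename assumed).
bash tools/dist_train.sh configs/polyphonic_video/poly_r50_cityscapes_1x.py 8 --seed 0 --work-dir /opt/logger/vid001 --no-validate --resume-from /opt/logger/vid001/latest.pth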

Video testing

PYTHONPATH=. python tools/test_video.py configs/polyphonic_video/poly_r50_cityscapes_1x.py https://huggingface.co/HarborYuan/PolyphonicFormer/resolve/main/polyphonic_r50_video.pth --eval-video DVPQ --video-dir ./tmp

To test your own training results, simply replace the online checkpoint with your local one. For example, for video testing:

PYTHONPATH=. python tools/test_video.py configs/polyphonic_video/poly_r50_cityscapes_1x.py /path/to/checkpoint.pth --eval-video DVPQ --video-dir ./tmp

Acknowledgements

The image segmentation model is based on K-Net. The datasets are extracted from ViP-DeepLab. Please cite them if you find them useful:

@article{zhang2021k,
  title={K-Net: Towards Unified Image Segmentation},
  author={Zhang, Wenwei and Pang, Jiangmiao and Chen, Kai and Loy, Chen Change},
  journal={NeurIPS},
  year={2021}
}
@inproceedings{qiao2021vip,
  title={ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation},
  author={Qiao, Siyuan and Zhu, Yukun and Adam, Hartwig and Yuille, Alan and Chen, Liang-Chieh},
  booktitle={CVPR},
  year={2021}
}

Citation

If you find the code useful in your research, please consider citing PolyphonicFormer:

@inproceedings{yuan2022polyphonicformer,
  title={Polyphonicformer: Unified Query Learning for Depth-aware Video Panoptic Segmentation},
  author={Yuan, Haobo and Li, Xiangtai and Yang, Yibo and Cheng, Guangliang and Zhang, Jing and Tong, Yunhai and Zhang, Lefei and Tao, Dacheng},
  booktitle={ECCV},
  year={2022},
}