# Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning
This repo contains unofficial code and configuration files to reproduce LIIR. It is based on MAST.
## Updates
- [2022-06-21] Initial commits
## Results and Models
| config | J&F | model | precomputed results |
| --- | --- | --- | --- |
| APE | 66.9 | github | github |
| APE + compact prior | 69.0 | github | github |
| APE + inter-video reconstruction | 69.9 | github | github |
| APE + inter-video reconstruction + compact prior | 72.2 | github | github |
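The released checkpoints are plain PyTorch files. Below is a minimal loading sketch; the key layout (a `state_dict` entry, a `module.` prefix from DDP training) is an assumption on our part, so check the evaluation script for the exact format:

```python
import torch

# Hypothetical loading sketch; the exact checkpoint layout may differ.
ckpt = torch.load("checkpoints/APE_669.pt", map_location="cpu")
# DDP-trained checkpoints often nest weights under "state_dict" and prefix
# parameter names with "module."; both are assumptions here.
state_dict = ckpt.get("state_dict", ckpt)
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
```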
## Usage
### Requirements
PyTorch == 1.8.0, torchvision == 0.9.0, spatial-correlation-sampler == 0.3.0

We find that the versions of PyTorch and spatial-correlation-sampler affect the results, so please stick to the recommended setting.
We also provide the conda environment we used to help with reproduction [gDrive].
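Since results are sensitive to these versions, a quick sanity check before running anything can save debugging time. A minimal sketch (`spatial_correlation_sampler` is the import name of the PyPI package):

```python
import torch
import torchvision
import spatial_correlation_sampler  # fails early if the extension is missing or broken

# Fail fast if the environment drifts from the pinned versions.
assert torch.__version__.startswith("1.8.0"), torch.__version__
assert torchvision.__version__.startswith("0.9.0"), torchvision.__version__
print("CUDA available:", torch.cuda.is_available())
```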
### Inference
```bash
# APE
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_669.pt

# APE + Spatial Compactness Prior
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_compact_690.pt --compact

# APE + Inter-video training
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_intervideo_699.pt --usemomen

# APE + Inter-video training + Spatial Compactness Prior
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_compact_intervideo_722.pt --usemomen --compact
```
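For orientation, evaluation in this line of work propagates the first-frame masks through learned feature affinities. Below is a minimal single-reference sketch of that recipe, not the repo's actual implementation (which restricts attention locally and uses multiple reference frames):

```python
import torch
import torch.nn.functional as F

def propagate_labels(feat_ref, feat_tgt, labels_ref, temperature=0.07):
    """feat_*: (C, H, W) frame features; labels_ref: (K, H, W) one-hot masks."""
    C, H, W = feat_ref.shape
    ref = F.normalize(feat_ref.flatten(1), dim=0)  # (C, HW), unit-norm per pixel
    tgt = F.normalize(feat_tgt.flatten(1), dim=0)  # (C, HW)
    # Each target pixel attends over all reference pixels.
    affinity = F.softmax(tgt.t() @ ref / temperature, dim=1)  # (HW_tgt, HW_ref)
    labels = labels_ref.flatten(1) @ affinity.t()  # (K, HW_tgt)
    return labels.view(-1, H, W)
```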
### Training
Step 1: run one of the following:
```bash
# Baseline + APE
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222 --lr 1e-3

# Baseline with 1/8-resolution features participating in the reconstruction
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222 --semantic --lr 1e-3

# Baseline + APE + Compactness Prior
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222 --semantic --compact --lr 1e-3
```
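Conceptually, step 1 optimizes intra-video reconstruction in the MAST style: a target frame is rebuilt from a reference frame through the feature affinity, and a photometric loss supervises the features. A simplified full-attention sketch of that objective (the actual code uses restricted local attention, and `temperature` is an assumed hyperparameter):

```python
import torch
import torch.nn.functional as F

def intra_video_loss(feat_ref, feat_tgt, img_ref, img_tgt, temperature=0.07):
    """feat_*: (B, C, H, W) features; img_*: (B, 3, H, W) values to copy,
    assumed already downsampled to the feature resolution."""
    B, C, H, W = feat_ref.shape
    ref = F.normalize(feat_ref.flatten(2), dim=1)  # (B, C, HW)
    tgt = F.normalize(feat_tgt.flatten(2), dim=1)  # (B, C, HW)
    affinity = F.softmax(tgt.transpose(1, 2) @ ref / temperature, dim=2)  # (B, HW, HW)
    # Reconstruct each target pixel as an affinity-weighted mix of reference pixels.
    recon = img_ref.flatten(2) @ affinity.transpose(1, 2)  # (B, 3, HW)
    return F.l1_loss(recon.view(B, 3, H, W), img_tgt)
```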
Step 2: then run:
```bash
# Baseline + APE + Spatial Compactness Prior + Inter-video Reconstruction
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222 \
    --usemomen --compact --lr 1e-4 --epochs 5 --pretrain [Step 1 checkpoints]
```
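We read `--usemomen` as enabling a momentum (EMA) copy of the encoder to produce targets for the inter-video reconstruction; that reading is our assumption. Under it, the update would look like the usual EMA step:

```python
import torch

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m=0.999):
    """EMA update of a momentum encoder; m=0.999 is a typical value,
    not necessarily the one used in this repo."""
    for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
        p_m.data.mul_(m).add_(p.data, alpha=1.0 - m)
```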
## Bag of tricks
We recommend decoupling the first training step by running:
```bash
# Baseline + APE
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222 --lr 1e-3

# Baseline + APE + 1/8-resolution reconstruction + Compactness Prior
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222 \
    --semantic --compact --lr 1e-4 --pretrain [Baseline + APE checkpoints]
```
Freezing BN can also help:
```bash
# Add freeze BN
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222 \
    --freeze_bn --usemomen --compact --lr 1e-4 --epochs 5 --pretrain [checkpoints]
```
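A common implementation of `--freeze_bn` keeps BatchNorm layers in eval mode (frozen running statistics) and stops gradients to their affine parameters; the repo's exact behavior may differ. A sketch:

```python
import torch.nn as nn

def freeze_bn(model):
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module.eval()  # use stored running mean/var instead of batch stats
            for p in module.parameters():
                p.requires_grad = False  # freeze affine weight and bias
```

Note that a later `model.train()` call flips BN back into training mode, so the freeze must be re-applied after every mode switch.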
Notes:
- We use two Tesla A100 GPUs for training. CUDA version: 11.1.
## TODO
- Code release
- Checkpoint release
- Evaluation code for YT-VOS, VIP and JHMDB
- PE Shuffle (we encountered some environment issues)
## Citing LIIR
```bibtex
@inproceedings{li2022locality,
  title={Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning},
  author={Li, Liulei and Zhou, Tianfei and Wang, Wenguan and Lu, Yang and Li, Jianwu and Yang, Yi},
  booktitle={CVPR},
  year={2022}
}
```