Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

This repo contains unofficial code and configuration files to reproduce LIIR. It is based on MAST.

Updates

Results and Models

| config | J&F | model | precomputed results |
| --- | --- | --- | --- |
| APE | 66.9 | github | github |
| APE + compact prior | 69.0 | github | github |
| APE + inter-video reconstruction | 69.9 | github | github |
| APE + inter-video reconstruction + compact prior | 72.2 | github | github |

Usage

Requirement

PyTorch == 1.8.0, torchvision == 0.9.0, and spatial-correlation-sampler == 0.3.0

We find that the versions of PyTorch and spatial-correlation-sampler affect the results, so please stick to the recommended settings.

We also provide the conda environment we used, to help with reproduction [gDrive].
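As a convenience, the pinned versions above can be checked programmatically. This is a small stand-alone helper (not part of the repo) that uses only the standard library:

```python
from importlib import metadata

# Pinned versions from the requirements above.
REQUIRED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "spatial-correlation-sampler": "0.3.0",
}

def check_versions(required):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for pkg, want in required.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None  # package not installed at all
        report[pkg] = (have, have == want)
    return report
```

Running `check_versions(REQUIRED)` before training makes version drift obvious instead of showing up later as a silent accuracy drop.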

Inference

# APE
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_669.pt

# APE + Spatial Compactness Prior
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_compact_690.pt --compact

# APE + Inter-video training
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_intervideo_699.pt --usemomen

# APE + Inter-video training + Spatial Compactness Prior
CUDA_VISIBLE_DEVICES=0 python evaluate_davis.py --resume checkpoints/APE_compact_intervideo_722.pt --usemomen --compact
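The evaluation flags above combine freely. As a hypothetical sketch (the real `evaluate_davis.py` may define more options), the relevant argument parsing looks roughly like:

```python
import argparse

def build_eval_parser():
    # Sketch of the flags used in the inference commands above;
    # names mirror the README, defaults are assumptions.
    p = argparse.ArgumentParser(description="DAVIS evaluation (sketch)")
    p.add_argument("--resume", type=str, required=True,
                   help="path to the checkpoint to evaluate")
    p.add_argument("--compact", action="store_true",
                   help="enable the spatial compactness prior at inference")
    p.add_argument("--usemomen", action="store_true",
                   help="load a model trained with the momentum (inter-video) branch")
    return p

args = build_eval_parser().parse_args(
    ["--resume", "checkpoints/APE_compact_690.pt", "--compact"]
)
```

Note that `--usemomen` must match how the checkpoint was trained, otherwise the state dict will not load cleanly.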

Training

Step 1: run one of the following:

# Baseline + APE
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222  --lr 1e-3

# Baseline with 1/8 resolution participated in the reconstruction
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222  --semantic --lr 1e-3

# Baseline + APE + Compactness Prior
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222  --semantic --compact --lr 1e-3

Step 2: then run:

# Baseline + APE + Spatial Compactness Prior + Inter-video Reconstruction
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222  \
    --usemomen --compact --lr 1e-4 --epochs 5 --pretrain [Step 1 checkpoints]
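The `--usemomen` flag suggests that inter-video reconstruction relies on a momentum (EMA-updated) encoder, as is common in self-supervised learning. This is a generic EMA update sketch on plain floats, not the authors' exact implementation:

```python
def ema_update(online, momentum_params, m=0.999):
    """One EMA step: momentum <- m * momentum + (1 - m) * online.

    `online` and `momentum_params` are parallel lists of parameter
    values; real code would iterate over two networks' parameters.
    """
    return [m * t + (1.0 - m) * o for o, t in zip(online, momentum_params)]
```

After each optimizer step on the online network, the momentum network's weights are refreshed this way, so the cross-video reconstruction targets change slowly and stay stable.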

Bag of tricks

We recommend decoupling the first training step by running:

# Baseline + APE
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222  --lr 1e-3

# Baseline + APE + 1/8 resolution reconstruction + Compactness Prior
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222  \
    --semantic --compact --lr 1e-4 --pretrain [Baseline + APE checkpoints]

Freezing BatchNorm can also help:

# Add Freeze BN
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12222  \
    --freeze_bn --usemomen --compact --lr 1e-4 --epochs 5 --pretrain [checkpoints]
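Freezing BatchNorm usually means fixing the running statistics and stopping gradients through the affine parameters. A minimal sketch of what we assume `--freeze_bn` does (the repo's actual implementation may differ):

```python
import torch.nn as nn

def freeze_bn(model: nn.Module) -> nn.Module:
    """Put every BatchNorm layer in eval mode and freeze its parameters."""
    for mod in model.modules():
        if isinstance(mod, nn.modules.batchnorm._BatchNorm):
            mod.eval()                       # stop updating running mean/var
            for p in mod.parameters():
                p.requires_grad_(False)      # freeze affine weight/bias
    return model

net = freeze_bn(nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)))
```

One caveat: a later call to `model.train()` flips BN layers back to training mode, so implementations typically re-apply the freeze after every `.train()` call.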

TODO

Citing LIIR

@inproceedings{li2022locality,
  title={Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning},
  author={Li, Liulei and Zhou, Tianfei and Wang, Wenguan and Lu, Yang and Li, Jianwu and Yang, Yi},
  booktitle={CVPR},
  year={2022}
}