ECCV2022 Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation

Implementation of "Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation", to appear at ECCV 2022

arXiv link: https://arxiv.org/abs/2203.13409

![fig](misc/figs/fig1.PNG)

Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation,
Theodoros Pissas, Claudio S. Ravasio, Lyndon Da Cruz, Christos Bergeles <br>

arXiv technical report (arXiv 2203.13409)

ECCV 2022 (proceedings)

Log

Data and requirements

  1. Download the datasets
  2. Modify the paths in configs/paths_info.json to match your setup by adding a data_path and a log_path (see the example in paths_info.json and the sketch after this list) <br> a) data_path should be the root directory of the datasets <br> b) log_path is where you want each run to create a directory containing its logs and checkpoints
  3. Create a conda environment with PyTorch 1.7 and CUDA 10.0:
    conda env create -f env_dgx.yml 
    conda activate semseg
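
A minimal sketch of what configs/paths_info.json might contain, assuming the keys are named data_path and log_path as above (check the exact key names against the example file in the repository):

```
{
    "data_path": "/path/to/datasets",
    "log_path": "/path/to/logs"
}
```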
    

Train

To train a model, most settings are specified via JSON configuration files found in configs; each model on each dataset uses its own config. A few settings are passed on the command line, which can also override config settings (see main.py). Below we show commands to start training on 4 GPUs with the settings used in the paper.

Training with ResNet or HRNet backbones requires ImageNet initialization, which is handled by torchvision (ResNet) or downloaded from a URL (HRNet). To train with Swin backbones we use the ImageNet checkpoints provided by the official implementation at https://github.com/microsoft/Swin-Transformer/. These must be downloaded into a directory called pytorch_checkpoints, structured as follows:

```
pytorch_checkpoints/swin_imagenet/swin_tiny_patch4_window7_224.pth
                                 /swin_small_patch4_window7_224.pth
                                 /swin_base_patch4_window7_224.pth
                                 /swin_large_patch4_window7_224_22k.pth
```
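
For reference, a shell sketch for creating this layout; the download URL follows the release pattern used by the official Swin-Transformer repository at the time of writing, so verify it there before use:

```
# create the expected directory layout
mkdir -p pytorch_checkpoints/swin_imagenet

# example: fetch the Swin-Tiny ImageNet checkpoint from the official releases
# (URL assumed from the Swin-Transformer repository; verify before use)
wget -P pytorch_checkpoints/swin_imagenet \
  https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
```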

Example commands to start training (d = CUDA device ids, p = multi-GPU training, bs = batch size, w = workers per GPU):
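
A hypothetical invocation, assuming main.py takes a config via -c and short flags matching the names above (the config filename is a placeholder; use the actual file for your model and dataset from configs):

```
# train on 4 GPUs (-p enables multi-GPU training), batch size 8, 4 workers per GPU;
# configs/<your_config>.json stands in for the real config file
python main.py -c configs/<your_config>.json -d 0,1,2,3 -p -bs 8 -w 4
```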

Licensing and copyright

Please see the LICENSE file for details.

Acknowledgements

This project utilizes timm and the official implementation of Swin Transformer. We thank the authors of those projects for open-sourcing their code and model weights.

Citation

If you find the paper or code useful, please cite the following:

```
@misc{https://doi.org/10.48550/arxiv.2203.13409,
  doi = {10.48550/ARXIV.2203.13409},
  url = {https://arxiv.org/abs/2203.13409},
  author = {Pissas, Theodoros and Ravasio, Claudio S. and Da Cruz, Lyndon and Bergeles, Christos},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}
```