Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation
This is the official PyTorch implementation of our work: "Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation" accepted at WACV 2022.
In this paper, we consider the task of semantic segmentation in autonomous driving applications. Specifically, we address the cross-domain few-shot setting, where training can rely on only a few annotated real-world images together with many annotated synthetic images. In this context, aligning the domains is made more challenging by the pixel-wise class imbalance that is intrinsic to segmentation, which leads to ignoring underrepresented classes and overfitting the well-represented ones. We address this problem with a novel framework called Pixel-By-Pixel Cross-Domain Alignment (PixDA).
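For intuition only, the sketch below shows the general idea of per-pixel adversarial alignment in PyTorch: a fully-convolutional discriminator predicts the domain of every pixel, so the adversarial loss can be weighted pixel by pixel, for example to counter the class imbalance mentioned above. The module and the weighting hook are illustrative assumptions, not the exact PixDA losses from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelDiscriminator(nn.Module):
    # Fully-convolutional discriminator: maps a feature map (B, C, H, W)
    # to per-pixel domain logits (B, 1, H, W), one prediction per pixel.
    def __init__(self, in_channels, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1),
        )

    def forward(self, feats):
        return self.net(feats)

def pixel_adversarial_loss(disc, feats, domain_label, pixel_weights=None):
    # Binary cross-entropy evaluated independently at every pixel.
    # pixel_weights (B, 1, H, W) can down-weight well-represented classes
    # and up-weight rare ones to counter the pixel-wise class imbalance.
    logits = disc(feats)
    target = torch.full_like(logits, float(domain_label))
    loss = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    if pixel_weights is not None:
        loss = loss * pixel_weights
    return loss.mean()

For example, calling pixel_adversarial_loss(disc, target_feats, domain_label=1.0, pixel_weights=w) and backpropagating through the segmentation backbone would encourage target pixels to mimic the source feature distribution, with w emphasizing rare classes.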
[Paper] [Supplementary] [arXiv]
Requirements
To simplify the installation of all dependencies, we provide a requirements.txt file (see the install command after the list below). This repository uses the following libraries:
- Python (3.7.10)
- Cuda (11.3)
- Pytorch (1.8.1)
- Torchvision (0.9.1)
- TensorboardX (2.2)
- Numpy (1.19.2)
- Scipy (1.6.2)
- Matplotlib (3.4.1)
- Pillow (8.2)
- ImageIO (2.9)
- Tqdm (4.60)
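All of the above can be installed from the provided file; a standard pip workflow is assumed here (a conda environment works just as well):

pip install -r requirements.txt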
Data
In this project we use three datasets: GTA5, SYNTHIA, and Cityscapes. Download them from their official websites:
- GTA5: https://download.visinf.tu-darmstadt.de/data/from_games/
- SYNTHIA: https://synthia-dataset.net/
- Cityscapes: https://www.cityscapes-dataset.com/
Training
The most important file is run.py, which starts the training or testing procedure. To run it, simply use the following command:
python run.py --source_dataset <source_dataset> --source_dataroot <source_data_folder> --target_dataset <target_dataset> --target_dataroot <target_data_folder> --name <exp_name> --gpu_ids 0,1 --is_train --num_shots <1/2/3/4/5>
Leaving all the parameters in the config.py file at their default values, you can replicate the experiments of the paper. The pretrained backbone and the source-only models for GTA5 and SYNTHIA can be found here: link. Just unzip and copy the "pretrained_models" folder inside the models folder of the repository.
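For example, assuming the downloaded archive is named pretrained_models.zip (a hypothetical name; adjust it to the actual file), placing the models would look like:

unzip pretrained_models.zip -d models/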
Training Commands
Train 1/2/3/4/5-shot models for the GTA5-to-Cityscapes experiments:
python run.py --source_dataset gta5 --source_dataroot <source_data_folder> --target_dataset cityscapes --target_dataroot <target_data_folder> --name <exp_name> --gpu_ids 0,1 --is_train --num_shots <1/2/3/4/5>
Train 1/2/3/4/5-shot models for the SYNTHIA-to-Cityscapes experiments:
python run.py --source_dataset synthia --source_dataroot <source_data_folder> --target_dataset cityscapes --target_dataroot <target_data_folder> --name <exp_name> --gpu_ids 0,1 --is_train --num_shots <1/2/3/4/5>
Testing:
python run.py --source_dataset gta5 --source_dataroot <source_data_folder> --target_dataset cityscapes --target_dataroot <target_data_folder> --name <exp_name> --gpu_ids 0,1 --num_shots <1/2/3/4/5> --which_iter <model_iter_number_to_resume>
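The paper evaluates segmentation quality with mean Intersection-over-Union (mIoU). As a reference only, here is a minimal NumPy sketch of computing mIoU from predicted and ground-truth label maps; the function name and signature are illustrative and not taken from this repository:

import numpy as np

def mean_iou(preds, labels, num_classes, ignore_index=255):
    # Accumulate a confusion matrix over (prediction, label) pairs,
    # then average the per-class IoU. Rows index ground truth, columns
    # index predictions; pixels marked ignore_index are skipped.
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, label in zip(preds, labels):
        mask = label != ignore_index
        idx = num_classes * label[mask].astype(np.int64) + pred[mask].astype(np.int64)
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)  # avoid division by zero for absent classes
    return iou.mean()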
Cite us
If you use this repository, please consider citing us:
@InProceedings{Tavera_2022_WACV,
    author    = {Tavera, Antonio and Cermelli, Fabio and Masone, Carlo and Caputo, Barbara},
    title     = {Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2022},
    pages     = {1626-1635}
}