DIaM for Generalized Few-Shot Semantic Segmentation

This repository contains the code for our CVPR 2023 paper, A Strong Baseline for Generalized Few-Shot Semantic Segmentation.

Abstract: This paper introduces a generalized few-shot segmentation framework with a straightforward training process and an easy-to-optimize inference phase. In particular, we propose a simple yet effective model based on the well-known InfoMax principle, where the Mutual Information (MI) between the learned feature representations and their corresponding predictions is maximized. In addition, the terms derived from our MI-based formulation are coupled with a knowledge distillation term to retain the knowledge on base classes. With a simple training process, our inference model can be applied on top of any segmentation network trained on base classes. The proposed inference yields substantial improvements on the popular few-shot segmentation benchmarks PASCAL-5i and COCO-20i. Particularly, for novel classes, the improvement gains range from 7% to 26% (PASCAL-5i) and from 3% to 12% (COCO-20i) in the 1-shot and 5-shot scenarios, respectively. Furthermore, we propose a more challenging setting, where performance gaps are further exacerbated.

🎬 Getting Started

:one: Requirements

We used Python 3.9 in our experiments and the list of packages is available in the requirements.txt file. You can install them using pip install -r requirements.txt.

:two: Download data

Pre-processed data from drive

We provide the versions of PASCAL VOC 2012 and MS-COCO 2017 used in this work here. You can download the full .zip and directly extract it in the data/ folder.

From scratch

Alternatively, you can prepare the datasets yourself. Here is the structure of the data folder for you to reproduce:

data
├── coco
│   ├── annotations
│   ├── train
│   ├── train2014
│   ├── val
│   └── val2014
└── pascal
    ├── JPEGImages
    └── SegmentationClassAug

PASCAL: The JPEG images are part of the PASCAL VOC 2012 toolkit, which can be downloaded at PASCAL VOC 2012. The pre-processed ground-truth masks are available at SegmentationClassAug.

COCO: The COCO 2014 train images, validation images, and annotations can be downloaded at COCO. Once this is done, generate the subfolders coco/train and coco/val (ground-truth masks) by running the Python script data/coco/create_masks.py (note that this script uses the pycocotools package):

cd data/coco
python create_masks.py
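
Once the data is in place, the folder layout can be checked quickly. The snippet below is a minimal sketch, not part of the repository: `missing_data_dirs` and `EXPECTED_DIRS` are hypothetical names, and the folder list is simply copied from the tree above. It only checks that the folders exist, not that their contents are complete.

```python
from pathlib import Path

# Sub-folders listed in the tree above, relative to the data/ folder.
EXPECTED_DIRS = [
    "coco/annotations",
    "coco/train",
    "coco/train2014",
    "coco/val",
    "coco/val2014",
    "pascal/JPEGImages",
    "pascal/SegmentationClassAug",
]


def missing_data_dirs(data_root="data"):
    """Return the expected sub-folders that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Run it from the root of the repo so that the default `data` path resolves correctly.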

About the train/val splits

The train/val splits are directly provided in lists/. How they were obtained is explained at https://github.com/Jia-Research-Lab/PFENet.

:three: Download pre-trained models

Pre-trained backbone and models

We provide the pre-trained backbone and models at https://drive.google.com/file/d/1WuKaJbj3Y3QMq4yw_Tyec-KyTchjSVUG/view?usp=share_link. You can download them and directly extract them at the root of this repo. This will create two folders: initmodel/ and model_ckpt/.

🗺 Overview of the repo

Default configuration files can be found in config/. Data is located in data/. lists/ contains the train/val splits for each dataset. All the code is provided in src/. The testing script is located at the root of the repo.

⚙ Training (optional)

This step is optional if you use the pre-trained models. Our contribution lies in the inference phase and our approach is modular, i.e., it can be applied on top of any segmentation model that is trained on the base classes. We use a simple training scheme that minimizes a standard cross-entropy loss over the base classes. To this end, we used the train_base.py script and the base learner models of BAM (see this issue for more info).

🧪 Testing

To test the model, use the test.sh script, whose general syntax is:

bash test.sh {benchmark} {shot} {pi_estimation_strategy} {[gpu_ids]} {log_path}

This script tests successively on all folds of the benchmark and reports the results individually. The overall performance is the average over all the folds. Some example commands are presented below, with their description in the comments.

bash test.sh pascal5i 1 self [0] out.log  # PASCAL-5i benchmark, 1-shot, pi estimated from the model's output
bash test.sh pascal10i 5 self [0] out.log  # PASCAL-10i benchmark, 5-shot, pi estimated from the model's output
bash test.sh coco20i 5 upperbound [0] out.log  # COCO-20i benchmark, 5-shot, the upperbound model mentioned in the paper

If you run out of memory, reduce batch_size_val in the config files.
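
To queue several settings in a row, the command lines can be generated from a small script. This is a minimal sketch, not part of the repository: `build_test_command` is a hypothetical helper, the benchmark and strategy names are only those appearing in the example commands above, and the comma-separated form for several GPU ids (e.g. `[0,1]`) is an assumption.

```python
import subprocess


def build_test_command(benchmark, shot, pi_strategy, gpu_ids=(0,), log_path="out.log"):
    """Build the argument list for test.sh (hypothetical helper, not part of the repo)."""
    # Assumption: several GPU ids are written as a comma-separated list in brackets.
    gpus = "[" + ",".join(str(g) for g in gpu_ids) + "]"
    return ["bash", "test.sh", benchmark, str(shot), pi_strategy, gpus, log_path]


# Settings taken from the example commands above.
RUNS = [
    ("pascal5i", 1, "self"),
    ("pascal10i", 5, "self"),
    ("coco20i", 5, "upperbound"),
]

if __name__ == "__main__":
    for benchmark, shot, pi_strategy in RUNS:
        # One log file per setting so that results are not overwritten.
        log_path = f"{benchmark}_{shot}shot_{pi_strategy}.log"
        cmd = build_test_command(benchmark, shot, pi_strategy, log_path=log_path)
        print(" ".join(cmd))
        # Uncomment to actually launch the run from the root of the repo:
        # subprocess.run(cmd, check=True)
```

As written, the script only prints the commands. Passing the arguments as a list to `subprocess.run` avoids shell quoting issues with the bracketed GPU list.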

📊 Results

To reproduce the results, please first download the pre-trained models from here (also mentioned in the "download pre-trained models" section) and then run the test.sh script with different inputs, as explained above.

<table>
  <tr>
    <th colspan="2"></th>
    <th colspan="3">1-Shot</th>
    <th colspan="3">5-Shot</th>
  </tr>
  <tr> <th>Benchmark</th> <th>Fold</th> <th>Base</th> <th>Novel</th> <th>Mean</th> <th>Base</th> <th>Novel</th> <th>Mean</th> </tr>
  <tr> <td rowspan="5">PASCAL-5i</td> <td>0</td> <td>71.33</td> <td>29.36</td> <td>50.35</td> <td>71.06</td> <td>53.72</td> <td>62.39</td> </tr>
  <tr> <td>1</td> <td>69.54</td> <td>46.72</td> <td>58.13</td> <td>69.63</td> <td>63.33</td> <td>66.48</td> </tr>
  <tr> <td>2</td> <td>69.10</td> <td>27.07</td> <td>48.09</td> <td>69.12</td> <td>54.01</td> <td>61.57</td> </tr>
  <tr> <td>3</td> <td>73.60</td> <td>37.30</td> <td>55.45</td> <td>73.60</td> <td>50.19</td> <td>61.90</td> </tr>
  <tr> <td>mean</td> <td>70.89</td> <td>35.11</td> <td>53.00</td> <td>70.85</td> <td>55.31</td> <td>63.08</td> </tr>
  <tr> <td rowspan="5">COCO-20i</td> <td>0</td> <td>49.01</td> <td>15.89</td> <td>32.45</td> <td>48.90</td> <td>24.86</td> <td>36.88</td> </tr>
  <tr> <td>1</td> <td>46.83</td> <td>19.50</td> <td>33.17</td> <td>47.10</td> <td>33.94</td> <td>40.52</td> </tr>
  <tr> <td>2</td> <td>48.82</td> <td>16.93</td> <td>32.88</td> <td>49.12</td> <td>27.15</td> <td>38.14</td> </tr>
  <tr> <td>3</td> <td>48.45</td> <td>16.57</td> <td>32.51</td> <td>48.37</td> <td>28.95</td> <td>38.66</td> </tr>
  <tr> <td>mean</td> <td>48.28</td> <td>17.22</td> <td>32.75</td> <td>48.37</td> <td>28.73</td> <td>38.55</td> </tr>
  <tr> <td rowspan="3">PASCAL-10i</td> <td>0</td> <td>68.69</td> <td>34.40</td> <td>51.55</td> <td>68.49</td> <td>55.94</td> <td>62.22</td> </tr>
  <tr> <td>1</td> <td>71.83</td> <td>28.17</td> <td>50.00</td> <td>72.00</td> <td>47.84</td> <td>59.92</td> </tr>
  <tr> <td>mean</td> <td>70.26</td> <td>31.29</td> <td>50.77</td> <td>70.25</td> <td>51.89</td> <td>61.07</td> </tr>
</table>
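
The "mean" rows in the table appear to be plain averages over folds, and the Mean column the average of the Base and Novel scores. The snippet below is a minimal sketch that recomputes the PASCAL-5i 1-shot summary from the per-fold values, assuming this is how the summary is defined. `summarize` is a hypothetical helper, not part of the repository, and can be useful for comparing your own per-fold outputs against the reported numbers.

```python
# (base, novel) scores per fold, copied from the PASCAL-5i 1-shot columns of the table.
PASCAL5I_1SHOT = [(71.33, 29.36), (69.54, 46.72), (69.10, 27.07), (73.60, 37.30)]


def summarize(folds):
    """Average base and novel scores over folds, then average the two."""
    base = sum(b for b, _ in folds) / len(folds)
    novel = sum(n for _, n in folds) / len(folds)
    return base, novel, (base + novel) / 2


if __name__ == "__main__":
    base, novel, mean = summarize(PASCAL5I_1SHOT)
    print(f"base={base:.2f} novel={novel:.2f} mean={mean:.2f}")
```

Small differences in the last digit are possible, because the table values are already rounded to two decimals.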

🙏 Acknowledgments

We gratefully thank the authors of RePRI, BAM, PFENet, and PyTorch Semantic Segmentation, whose work inspired parts of our code.

📚 Citation

If you find this project useful, please consider citing:

@inproceedings{hajimiri2023diam,
  title={A Strong Baseline for Generalized Few-Shot Semantic Segmentation},
  author={Hajimiri, Sina and Boudiaf, Malik and Ben Ayed, Ismail and Dolz, Jose},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11269--11278},
  year={2023}
}