Home

Awesome

[2021][MICCAI][PNS-Net]Progressively Normalized Self-Attention Network for Video Polyp Segmentation

<img src="./imgs/VideoPresentation-min.gif" width="100%" />

Authors: Ge-Peng Ji*, Yu-Cheng Chou*, Deng-Ping Fan, Geng Chen, Huazhu Fu, Debesh Jha, & Ling Shao.

This repository provides code for paper "Progressively Normalized Self-Attention Network for Video Polyp Segmentation" published at the MICCAI-2021 conference (arXiv Version & Springer version). If you have any questions about our paper, feel free to contact me. And if you like our PNS-Net or evaluation toolbox for your personal research, please cite this paper (BibTeX).

Features

1.1. 🔥NEWS🔥 :

1.2. Table of Contents

<small><i><a href='http://ecotrust-canada.github.io/markdown-toc/'>Table of contents generated with markdown-toc</a></i></small>

2. Overview

2.1. Introduction

Existing video polyp segmentation (VPS) models typically employ convolutional neural networks (CNNs) to extract features. However, due to their limited receptive fields, CNNs can not fully exploit the global temporal and spatial information in successive video frames, resulting in false-positive segmentation results. In this paper, we propose the novel PNS-Net (Progressively Normalized Self-attention Network), which can efficiently learn representations from polyp videos with real-time speed (~140fps) on a single RTX 2080 GPU and no post-processing.

Our PNS-Net is based solely on a basic normalized self-attention block, dispensing with recurrence and CNNs entirely. Experiments on challenging VPS datasets demonstrate that the proposed PNS-Net achieves state-of-the-art performance. We also conduct extensive experiments to study the effectiveness of the channel split, soft-attention, and progressive learning strategy. We find that our PNS-Net works well under different settings, making it a promising solution to the VPS task.

2.2. Framework Overview

<p align="center"> <img src="imgs/MainFramework.jpg"/> <br /> <em> Figure 1: Overview of the proposed PNS-Net, including the normalized self-attention block (see § 2.1) with a stacked (×R) learning strategy. See § 2 in the paper for details. </em> </p>

2.3. Qualitative Results

<p align="center"> <img src="imgs/Qualitive.png"/> <br /> <em> Figure 2: Qualitative Results. </em> </p>

3. Proposed Baseline

3.1. Training/Testing

The training and testing experiments are conducted using PyTorch with a single GeForce RTX 2080 GPU of 8 GB Memory.

  1. Configuring your environment (Prerequisites):

    Note that PNS-Net is only tested on Ubuntu OS with the following environments. It may work on other operating systems as well but we do not guarantee that it will.

    • Creating a virtual environment in terminal:

    conda create -n PNSNet python=3.6.

    • Installing necessary packages PyTorch 1.1:
    conda create -n PNSNet python=3.6
    conda activate PNSNet
    conda install pytorch=1.1.0 torchvision -c pytorch
    pip install tensorboardX tqdm Pillow==6.2.2
    pip install git+https://github.com/pytorch/tnt.git@master
    
    • Our core design is built on CUDA OP with torchlib. Please ensure the base CUDA toolkit version is 10.x (not at conda env), and then build the NS Block:
    cd ./lib/PNS
    python setup.py build develop
    
  2. Downloading necessary data:

  3. Testing Configuration:

    • After you download all the pre-trained model and testing dataset, just run MyTest_finetune.py to generate the final prediction map in ./res.
    • Just enjoy it!
    • The prediction results of all competitors and our PNS-Net can be found at Google Drive (7.7MB).
  4. Training Configuration:

    • With the training dataset downloaded, you can pre-train the model first then fine-tune with the pre-trained weights, just run MyTrain_Pretrain.py firstly and MyTrain_Finetune.py secondly.

    • Remember to configure the pretrain_state_dict in config.py for different training stages.

3.2 Evaluating your trained model:

One-key evaluation is written in MATLAB code (link), please follow the instructions in ./eval/main_VPS.m and just run it to generate the evaluation results in ./eval-Result/.

NOTE: The different strategies of the sequential model may generate a various number of predictions, such as optical flow based method only generates T-1 frames due to forward/backward frame-difference strategy. Thus, we test the T-2 frames by removing the first and end frames for a fair comparison.

4. Citation

Please cite our paper if you find the work useful:

@inproceedings{ji2021progressively,
  title={Progressively normalized self-attention network for video polyp segmentation},
  author={Ji, Ge-Peng and Chou, Yu-Cheng and Fan, Deng-Ping and Chen, Geng and Fu, Huazhu and Jha, Debesh and Shao, Ling},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={142--152},
  year={2021},
  organization={Springer}
}

@inproceedings{fan2020pranet,
  title={Pranet: Parallel reverse attention network for polyp segmentation},
  author={Fan, Deng-Ping and Ji, Ge-Peng and Zhou, Tao and Chen, Geng and Fu, Huazhu and Shen, Jianbing and Shao, Ling},
  booktitle={International conference on medical image computing and computer-assisted intervention},
  pages={263--273},
  year={2020},
  organization={Springer}
}

5. FAQ


6. Acknowledgements

This code is built on SINetV2 (PyTorch) and PyramidCSA (PyTorch). We thank the authors for sharing the codes.

⬆ back to top