# LGRNet: Local-Global Reciprocal Network for Video Polyp Segmentation

[Paper](https://arxiv.org/abs/2407.05703) | BibTeX
Huihui Xu, Yijun Yang(📈), Angelica Aviles-Rivero, Guang Yang, Jing Qin, and Lei Zhu
📈: The UFUV dataset from the original paper may be made openly accessible in the future; please email the authors for permission.
This is the official implementation of LGRNet (MICCAI'24 Early Accept), which incorporates local Cyclic Neighborhood Propagation (CNP) and global Hilbert Selective Scan. Together with the notion of frame bottleneck queries, LGRNet can efficiently and effectively aggregate the local-global temporal context, achieving state-of-the-art results on the public Video Polyp Segmentation (VPS) benchmark.
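To make the cyclic-neighborhood idea concrete, here is a toy, unoptimized single-head sketch: each token attends over a small spatial neighborhood in the cyclically next frame. All names and simplifications here are illustrative, not the repo's API; the actual CNP uses learned projections and efficient neighborhood attention.

```python
import numpy as np

def cyclic_neighborhood_attention(feats, kernel=3):
    """Toy sketch of Cyclic Neighborhood Propagation (CNP).

    feats: (T, H, W, C) per-frame token features.
    Each token at frame t uses the kernel x kernel spatial neighborhood
    of the *cyclically next* frame (t + 1) % T as attention keys/values.
    Illustrative simplification: single head, no learned projections.
    """
    T, H, W, C = feats.shape
    r = kernel // 2
    out = np.zeros_like(feats)
    for t in range(T):
        nxt = feats[(t + 1) % T]                 # cyclic neighbor frame
        for i in range(H):
            for j in range(W):
                q = feats[t, i, j]               # query token
                ys = slice(max(0, i - r), min(H, i + r + 1))
                xs = slice(max(0, j - r), min(W, j + r + 1))
                keys = nxt[ys, xs].reshape(-1, C)
                attn = np.exp(keys @ q / np.sqrt(C))
                attn /= attn.sum()               # softmax over the neighborhood
                out[t, i, j] = attn @ keys       # aggregate neighbor values
    return out
```

On constant input the softmax is uniform and the output reproduces the input, which is a quick sanity check that the aggregation is a proper weighted average.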
<div align="justify">As an example, for ultrasound video, a single frame is too noisy and insufficient for accurate lesion diagnosis. In practice, doctors need to check neighboring frames (local) and collect all visual clues (global) in the video to predict possible lesion regions and filter out irrelevant surrounding tissues. </div> </br> <div align="center" style="padding: 0 100pt"> <img src="assets/images/pipeline.png"> </div> </br> <div align="justify"> In CNP, each token takes the neighborhood tokens (defined by a kernel) in the cyclic frame as attention keys. This aggregates the local (cyclic) temporal information into each token. In Hilbert Selective Scan, a set of frame bottleneck queries first aggregates spatial information from each frame. Hilbert Selective Scan then efficiently parses the global temporal context based on these bottleneck queries, and the global context is propagated back to the feature maps by a Distribute layer. Based on Mask2Former, the decoder outputs a set of mask predictions with corresponding confidence scores, which also facilitates comprehensive diagnosis.</div>

## Items
- Installation: Please refer to INSTALL.md for more details.
- Data preparation: Please refer to DATA.md for more details.
- Training:

  Change `PORT_NUM` for DDP and make sure `$CURRENT_TASK` is `VIS`:

  ```shell
  export CURRENT_TASK=VIS
  export MASTER_ADDR=127.0.0.1
  export MASTER_PORT=PORT_NUM
  ```

  Make sure `$PT_PATH` and `$DATASET_PATH` were correctly set during installation and data preparation.

  Training on SUN-SEG is conducted on 2 RTX 4090 24GB GPUs:

  ```shell
  CUDA_VISIBLE_DEVICES=0,1 TORCH_NUM_WORKERS=8 python main.py --config_file output/VIS/sunseg/pvt/pvt.py --trainer_mode train_attmpt
  ```
- Logs, checkpoints, and predictions:
| Backbone | Dataset | Dice | mIoU | log | ckpt | predictions |
|---|---|---|---|---|---|---|
| PVTv2-B2 | SUN-SEG-Train | -- | -- | log | ckpt | -- |
| PVTv2-B2 | SUN-SEG-Hard-Testing | 0.876 | 0.805 | log | ckpt | mask predictions |
| PVTv2-B2 | SUN-SEG-Easy-Testing | 0.875 | 0.810 | log | ckpt | mask predictions |
| PVTv2-B2 | SUN-SEG-Hard-Unseen-Testing | 0.865 | 0.792 | log | ckpt | mask predictions |
| PVTv2-B2 | SUN-SEG-Easy-Unseen-Testing | 0.853 | 0.783 | log | ckpt | mask predictions |
| Res2Net-50 | SUN-SEG-Hard-Testing | 0.841 | 0.765 | log | -- | -- |
| Res2Net-50 | SUN-SEG-Easy-Testing | 0.843 | 0.774 | log | -- | -- |
| PVTv2-B2 | CVC612V | 0.933 | 0.877 | log | -- | -- |
| PVTv2-B2 | CVC300TV | 0.916 | 0.852 | log | -- | -- |
| PVTv2-B2 | CVC612T | 0.875 | 0.814 | log | -- | -- |
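For reference, the Dice and mIoU columns above are the standard overlap metrics between predicted and ground-truth binary masks. A minimal sketch of the per-mask computation (this is not the repo's evaluation code, which may differ in averaging and edge-case handling):

```python
import numpy as np

def dice_and_iou(pred, gt, eps=1e-7):
    """Dice coefficient and IoU for one pair of binary masks.
    pred, gt: arrays of the same shape, treated as boolean.
    eps guards against division by zero when both masks are empty."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

pred = np.array([[1, 1, 0, 0]], dtype=bool)
gt   = np.array([[0, 1, 1, 0]], dtype=bool)
print(dice_and_iou(pred, gt))  # roughly (0.5, 0.333)
```

The dataset-level scores in the table are means of these per-frame values over each test split.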
- Evaluation: Evaluate on SUN-SEG-Easy and SUN-SEG-Hard using a single 4090 24GB GPU (change `ckpt_path` to the absolute checkpoint path):

  ```shell
  CUDA_VISIBLE_DEVICES=0 TORCH_NUM_WORKERS=8 python main.py --config_file output/VIS/sunseg/pvt/pvt.py --trainer_mode eval --eval_path ckpt_path
  ```
## Citation

```bibtex
@article{xu2024lgrnet,
  title={LGRNet: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos},
  author={Xu, Huihui and Yang, Yijun and Aviles-Rivero, Angelica I and Yang, Guang and Qin, Jing and Zhu, Lei},
  journal={arXiv preprint arXiv:2407.05703},
  year={2024}
}
```
## Acknowledgments
- Thanks to Gilbert for the implementation of Hilbert curve generation.
- Thanks to GPT-4 for helping develop the idea of the Hilbert filling curve vs. the zigzag curve.
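The Hilbert-vs-zigzag contrast mentioned above is about scan locality: a Hilbert curve visits spatially adjacent cells at every step, while a row-major (zigzag-style) scan jumps across the whole row at each wrap, so tokens that are spatial neighbors can end up far apart in the 1D sequence fed to the selective scan. An illustrative sketch using the standard bit-twiddling Hilbert mapping (not the Gilbert implementation the repo actually uses):

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve covering a 2**order x 2**order
    grid to (x, y) coordinates. Standard iterative construction."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def max_scan_jump(coords):
    """Largest Manhattan distance between consecutive cells in a scan order."""
    return max(abs(x2 - x1) + abs(y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))

n = 8
hilbert = [hilbert_d2xy(3, d) for d in range(n * n)]
rowmajor = [(d % n, d // n) for d in range(n * n)]
print(max_scan_jump(hilbert), max_scan_jump(rowmajor))  # prints: 1 8
```

Every Hilbert step lands on an adjacent cell (jump of 1), whereas the row-major order incurs a jump of `n` at each row wrap; this locality is what makes the Hilbert order a better 1D serialization of 2D feature maps.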