# LGRNet: Local-Global Reciprocal Network for Video Polyp Segmentation

[Paper](https://arxiv.org/abs/2407.05703) | BibTeX
Huihui Xu, Yijun Yang(📈), Angelica Aviles-Rivero, Guang Yang, Jing Qin, and Lei Zhu
📈: The UFUV dataset from the original paper may be made openly accessible in the future; please email the authors for permission.
This is the official implementation of LGRNet (MICCAI'24 Early Accept), which incorporates local Cyclic Neighborhood Propagation (CNP) and global Hilbert Selective Scan. Together with the notion of frame bottleneck queries, LGRNet can efficiently and effectively aggregate the local-global temporal context, achieving state-of-the-art results on the public Video Polyp Segmentation (VPS) benchmark.
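To make the cyclic-neighborhood idea concrete, here is a toy, unoptimized single-head sketch: each token attends over a small spatial neighborhood in the cyclically next frame. All names and simplifications here are illustrative, not the repo's API; the actual CNP uses learned projections and efficient neighborhood attention.

```python
import numpy as np

def cyclic_neighborhood_attention(feats, kernel=3):
    """Toy sketch of Cyclic Neighborhood Propagation (CNP).

    feats: (T, H, W, C) per-frame token features.
    Each token at frame t uses the kernel x kernel spatial neighborhood
    of the *cyclically next* frame (t + 1) % T as attention keys/values.
    Illustrative simplification: single head, no learned projections.
    """
    T, H, W, C = feats.shape
    r = kernel // 2
    out = np.zeros_like(feats)
    for t in range(T):
        nxt = feats[(t + 1) % T]                 # cyclic neighbor frame
        for i in range(H):
            for j in range(W):
                q = feats[t, i, j]               # query token
                ys = slice(max(0, i - r), min(H, i + r + 1))
                xs = slice(max(0, j - r), min(W, j + r + 1))
                keys = nxt[ys, xs].reshape(-1, C)
                attn = np.exp(keys @ q / np.sqrt(C))
                attn /= attn.sum()               # softmax over the neighborhood
                out[t, i, j] = attn @ keys       # aggregate neighbor values
    return out
```

On constant input the softmax is uniform and the output reproduces the input, which is a quick sanity check that the aggregation is a proper weighted average.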
<div align="justify">As an example, for ultrasound video, a single frame is too noisy and insufficient for accurate lesion diagnosis. In practice, doctors need to check neighboring frames (local) and collect all visual clues (global) in the video to predict possible lesion regions and filter out irrelevant surrounding tissues. </div> </br> <div align="center" style="padding: 0 100pt"> <img src="assets/images/pipeline.png"> </div> </br> <div align="justify"> In CNP, each token takes the neighborhood tokens (defined by a kernel) in the cyclic frame as attention keys. This aggregates the local (cyclic) temporal information into each token. In Hilbert Selective Scan, a set of frame bottleneck queries first aggregates spatial information from each frame. Hilbert Selective Scan then efficiently parses the global temporal context based on these bottleneck queries, and the global context is propagated back to the feature maps by a Distribute layer. Based on Mask2Former, the decoder outputs a set of mask predictions with corresponding confidence scores, which also facilitates comprehensive diagnosis.</div>

## Items
- Installation: Please refer to INSTALL.md for more details.
- Data preparation: Please refer to DATA.md for more details.
- Training:

  Change `PORT_NUM` for DDP and make sure `$CURRENT_TASK` is `VIS`:

  ```shell
  export CURRENT_TASK=VIS
  export MASTER_ADDR=127.0.0.1
  export MASTER_PORT=PORT_NUM
  ```

  Make sure `$PT_PATH` and `$DATASET_PATH` were correctly set during installation and data preparation.

  Training on SUN-SEG is conducted on 2 RTX 4090 24GB GPUs:

  ```shell
  CUDA_VISIBLE_DEVICES=0,1 TORCH_NUM_WORKERS=8 python main.py --config_file output/VIS/sunseg/pvt/pvt.py --trainer_mode train_attmpt
  ```
- Logs, checkpoints, and predictions:
| Backbone | Dataset | Dice | mIoU | log | ckpt | predictions |
|---|---|---|---|---|---|---|
| PVTv2-B2 | SUN-SEG-Train | -- | -- | log | ckpt | -- |
| PVTv2-B2 | SUN-SEG-Hard-Testing | 0.876 | 0.805 | log | ckpt | mask predictions |
| PVTv2-B2 | SUN-SEG-Easy-Testing | 0.875 | 0.810 | log | ckpt | mask predictions |
| PVTv2-B2 | SUN-SEG-Hard-Unseen-Testing | 0.865 | 0.792 | log | ckpt | mask predictions |
| PVTv2-B2 | SUN-SEG-Easy-Unseen-Testing | 0.853 | 0.783 | log | ckpt | mask predictions |
| Res2Net-50 | SUN-SEG-Hard-Testing | 0.841 | 0.765 | log | -- | -- |
| Res2Net-50 | SUN-SEG-Easy-Testing | 0.843 | 0.774 | log | -- | -- |
| PVTv2-B2 | CVC612V | 0.933 | 0.877 | log | -- | -- |
| PVTv2-B2 | CVC300TV | 0.916 | 0.852 | log | -- | -- |
| PVTv2-B2 | CVC612T | 0.875 | 0.814 | log | -- | -- |
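For reference, the Dice and mIoU columns above are the standard overlap metrics between predicted and ground-truth binary masks. A minimal sketch of the per-mask computation (this is not the repo's evaluation code, which may differ in averaging and edge-case handling):

```python
import numpy as np

def dice_and_iou(pred, gt, eps=1e-7):
    """Dice coefficient and IoU for one pair of binary masks.
    pred, gt: arrays of the same shape, treated as boolean.
    eps guards against division by zero when both masks are empty."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

pred = np.array([[1, 1, 0, 0]], dtype=bool)
gt   = np.array([[0, 1, 1, 0]], dtype=bool)
print(dice_and_iou(pred, gt))  # roughly (0.5, 0.333)
```

The dataset-level scores in the table are means of these per-frame values over each test split.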
- Evaluation: Evaluate on SUN-SEG-Easy and SUN-SEG-Hard using a single 4090 24GB GPU (change `ckpt_path` to the absolute checkpoint path):

  ```shell
  CUDA_VISIBLE_DEVICES=0 TORCH_NUM_WORKERS=8 python main.py --config_file output/VIS/sunseg/pvt/pvt.py --trainer_mode eval --eval_path ckpt_path
  ```
## Citation

```bibtex
@article{xu2024lgrnet,
  title={LGRNet: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos},
  author={Xu, Huihui and Yang, Yijun and Aviles-Rivero, Angelica I and Yang, Guang and Qin, Jing and Zhu, Lei},
  journal={arXiv preprint arXiv:2407.05703},
  year={2024}
}
```
## Acknowledgments
- Thanks to Gilbert for the implementation of Hilbert curve generation.
- Thanks to GPT-4 for helping develop the idea of the Hilbert filling curve vs. the zigzag curve.
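The Hilbert-vs-zigzag contrast mentioned above is about scan locality: a Hilbert curve visits spatially adjacent cells at every step, while a row-major (zigzag-style) scan jumps across the whole row at each wrap, so tokens that are spatial neighbors can end up far apart in the 1D sequence fed to the selective scan. An illustrative sketch using the standard bit-twiddling Hilbert mapping (not the Gilbert implementation the repo actually uses):

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve covering a 2**order x 2**order
    grid to (x, y) coordinates. Standard iterative construction."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def max_scan_jump(coords):
    """Largest Manhattan distance between consecutive cells in a scan order."""
    return max(abs(x2 - x1) + abs(y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))

n = 8
hilbert = [hilbert_d2xy(3, d) for d in range(n * n)]
rowmajor = [(d % n, d // n) for d in range(n * n)]
print(max_scan_jump(hilbert), max_scan_jump(rowmajor))  # prints: 1 8
```

Every Hilbert step lands on an adjacent cell (jump of 1), whereas the row-major order incurs a jump of `n` at each row wrap; this locality is what makes the Hilbert order a better 1D serialization of 2D feature maps.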