<h1 align="center">Awesome Medical Vision-and-Language Tasks and Methodologies: A Survey</h1>

:fire::fire: This is a collection of medical vision-language tasks and methodologies :fire::fire:
## Overview

<p align="center"> <img src="framework.png"> <br> <em>Overview of medical vision-language models (MVLMs).</em> </p>

## Table of Contents

- [Medical Report Generation](#medical-report-generation)
- [Medical Visual Question Answering](#medical-visual-question-answering)
- [Medical Multi-modal Diagnosis and Prognosis](#medical-multi-modal-diagnosis-and-prognosis)
- [Medical Image Segmentation](#medical-image-segmentation)
- [Medical Image-Text Retrieval](#medical-image-text-retrieval)
## Medical Report Generation
<details>
<summary><b>List of Papers:</b></summary>

- Yang, Shuxin and Wu, Xian and Ge, Shen and Zheng, Zhuozhao and Zhou, S Kevin and Xiao, Li.<br> "Radiology report generation with a learned knowledge base and multi-modal alignment" Medical Image Analysis (2023). [paper] [code]
- Szeskin, Adi and Rochman, Shalom and Weiss, Snir and Lederman, Richard and Sosna, Jacob and Joskowicz, Leo.<br> "Liver lesion changes analysis in longitudinal CECT scans by simultaneous deep learning voxel classification with SimU-Net" Medical Image Analysis (2023). [paper]
- Zhu, Qingqing and Mathai, Tejas Sudharshan and Mukherjee, Pritam and Peng, Yifan and Summers, Ronald M and Lu, Zhiyong.<br> "Utilizing longitudinal chest x-rays and reports to pre-fill radiology reports" MICCAI (2023). [paper] [code]
- Dalla Serra, Francesco and Wang, Chaoyang and Deligianni, Fani and Dalton, Jeffrey and O’Neil, Alison Q.<br> "Finding-aware anatomical tokens for chest X-ray automated reporting" MICCAI (2023). [paper]
- KiUT: Huang, Zhongzhen and Zhang, Xiaofan and Zhang, Shaoting.<br> "Kiut: Knowledge-injected u-transformer for radiology report generation" CVPR (2023). [paper]
- DCL: Li, Mingjie and Lin, Bingqian and Chen, Zicong and Lin, Haokun and Liang, Xiaodan and Chang, Xiaojun.<br> "Dynamic graph enhanced contrastive learning for chest x-ray report generation" CVPR (2023). [paper] [code]
- RGRG: Tanida, Tim and Müller, Philip and Kaissis, Georgios and Rueckert, Daniel.<br> "Interactive and explainable region-guided radiology report generation" CVPR (2023). [paper] [code]
- METransformer: Wang, Zhanyu and Liu, Lingqiao and Wang, Lei and Zhou, Luping.<br> "Metransformer: Radiology report generation by transformer with multiple learnable expert tokens" CVPR (2023). [paper]
- ICT: Zhang, Junsan and Shen, Xiuxuan and Wan, Shaohua and Goudos, Sotirios K and Wu, Jie and Cheng, Ming and Zhang, Weishan.<br> "A novel deep learning model for medical report generation by inter-intra information calibration" JBHI (2023). [paper]
- Zheng, Ervine and Yu, Qi.<br> "Evidential interactive learning for medical image captioning" ICML (2023). [paper]
- PRIOR: Cheng, Pujin and Lin, Li and Lyu, Junyan and Huang, Yijin and Luo, Wenhan and Tang, Xiaoying.<br> "Prior: Prototype representation joint learning from medical images and reports" ICCV (2023). [paper] [code]
- MRM: Zhou, Hong-Yu and Lian, Chenyu and Wang, Liansheng and Yu, Yizhou.<br> "Advancing radiograph representation learning with masked record modeling" ICLR (2023). [paper] [code]
- MMTN: Cao, Yiming and Cui, Lizhen and Zhang, Lei and Yu, Fuqiang and Li, Zhen and Xu, Yonghui.<br> "MMTN: multi-modal memory transformer network for image-report consistent medical report generation" AAAI (2023). [paper]
- ATAG: Yan, Sixing and Cheung, William K and Chiu, Keith and Tong, Terence M and Cheung, Ka Chun and See, Simon.<br> "Attributed abnormality graph embedding for clinically accurate x-ray report generation" TMI (2023). [paper]
- Yang, Shuxin and Wu, Xian and Ge, Shen and Zhou, S Kevin and Xiao, Li.<br> "Knowledge matters: Chest radiology report generation with general and specific knowledge" Medical Image Analysis (2022). [paper] [code]
- VTI: Najdenkoska, Ivona and Zhen, Xiantong and Worring, Marcel and Shao, Ling.<br> "Uncertainty-aware report generation for chest X-rays by variational topic inference" Medical Image Analysis (2022). [paper] [code]
- TranSQ: Kong, Ming and Huang, Zhengxing and Kuang, Kun and Zhu, Qiang and Wu, Fei.<br> "Transq: Transformer-based semantic query for medical report generation" MICCAI (2022). [paper] [code]
- Sun, Jinghan and Wei, Dong and Wang, Liansheng and Zheng, Yefeng.<br> "Lesion guided explainable few weak-shot medical report generation" MICCAI (2022). [paper] [code]
- MCGN: Wang, Zhanyu and Tang, Mingkang and Wang, Lei and Li, Xiu and Zhou, Luping.<br> "A medical semantic-assisted transformer for radiographic report generation" MICCAI (2022). [paper]
- SGF: Li, Jun and Li, Shibo and Hu, Ying and Tao, Huiren.<br> "A self-guided framework for radiology report generation" MICCAI (2022). [paper]
- SGT: Lin, Chen and Zheng, Shuai and Liu, Zhizhe and Li, Youru and Zhu, Zhenfeng and Zhao, Yao.<br> "Sgt: Scene graph-guided transformer for surgical report generation" MICCAI (2022). [paper] [code]
- ITA: Wang, Lin and Ning, Munan and Lu, Donghuan and Wei, Dong and Zheng, Yefeng and Chen, Jie.<br> "An inclusive task-aware framework for radiology report generation" MICCAI (2022). [paper] [code]
- RepsNet: Tanwani, Ajay K and Barral, Joelle and Freedman, Daniel.<br> "Repsnet: Combining vision with language for automated medical reports" MICCAI (2022). [paper]
- CoPlan: Nishino, Toru and Miura, Yasuhide and Taniguchi, Tomoki and Ohkuma, Tomoko and Suzuki, Yuki and Kido, Shoji and Tomiyama, Noriyuki.<br> "Factual accuracy is not enough: Planning consistent description order for radiology report generation" EMNLP (2022). [paper]
- Delbrouck, Jean-Benoit and Chambon, Pierre and Bluethgen, Christian and Tsai, Emily and Almusa, Omar and Langlotz, Curtis P.<br> "Improving the factual correctness of radiology report generation with semantic rewards" EMNLP (2022). [paper] [code]
- CGT: Li, Mingjie and Cai, Wenjia and Verspoor, Karin and Pan, Shirui and Liang, Xiaodan and Chang, Xiaojun.<br> "Cross-modal clinical graph transformer for ophthalmic report generation" CVPR (2022). [paper]
- TransFuser: Huang, Jia-Hong and Wu, Ting-Wei and Yang, C-H Huck and Shi, Zenglin and Lin, I and Tegner, Jesper and Worring, Marcel and others.<br> "Non-local attention improves description generation for retinal images" WACV (2022). [paper]
- XPRONET: Wang, Jun and Bhalerao, Abhir and He, Yulan.<br> "Cross-modal prototype driven network for radiology report generation" ECCV (2022). [paper] [code]
- DCNet (EDC-Net): Singh, Dilbag and Kaur, Manjit and Alanazi, Jazem Mutared and AlZubi, Ahmad Ali and Lee, Heung-No.<br> "Efficient evolving deep ensemble medical image captioning network" JBHI (2022). [paper] [code]
- Yan, Bin and Pei, Mingtao and Zhao, Meng and Shan, Caifeng and Tian, Zhaoxing.<br> "Prior guided transformer for accurate radiology reports generation" JBHI (2022). [paper]
- NSL: Han, Zhongyi and Wei, Benzheng and Xi, Xiaoming and Chen, Bo and Yin, Yilong and Li, Shuo.<br> "Unifying neural learning and symbolic reasoning for spinal medical report generation" Medical Image Analysis (2021). [paper]
- AlignTransformer: You, Di and Liu, Fenglin and Ge, Shen and Xie, Xiaoxia and Zhang, Jing and Wu, Xian.<br> "Aligntransformer: Hierarchical alignment of visual regions and disease tags for medical report generation" MICCAI (2021). [paper]
- VTI: Najdenkoska, Ivona and Zhen, Xiantong and Worring, Marcel and Shao, Ling.<br> "Variational topic inference for chest x-ray report generation" MICCAI (2021). [paper]
- CNN-TRG: Pino, Pablo and Parra, Denis and Besa, Cecilia and Lagos, Claudio.<br> "Clinically correct report generation from chest x-rays using templates" MICCAI (2021). [paper]
- RATCHET: Hou, Benjamin and Kaissis, Georgios and Summers, Ronald M and Kainz, Bernhard.<br> "Ratchet: Medical transformer for chest x-ray diagnosis and reporting" MICCAI (2021). [paper] [code]
- CIDA: Xu, Mengya and Islam, Mobarakol and Lim, Chwee Ming and Ren, Hongliang.<br> "Class-incremental domain adaptation with smoothing and calibration for surgical report generation" MICCAI (2021). [paper] [code]
- Nguyen, Hoang TN and Nie, Dong and Badamdorj, Taivanbat and Liu, Yujie and Zhu, Yingying and Truong, Jason and Cheng, Li.<br> "Automated generation of accurate & fluent medical x-ray reports" EMNLP (2021). [paper] [code]
- $M^2$ TR. PROGRESSIVE: Nooralahzadeh, Farhad and Gonzalez, Nicolas Perez and Frauenfelder, Thomas and Fujimoto, Koji and Krauthammer, Michael.<br> "Progressive transformer-based generation of radiology reports" EMNLP (2021). [paper] [code]
- CMCL: Liu, Fenglin and Ge, Shen and Zou, Yuexian and Wu, Xian.<br> "Competence-based multimodal curriculum learning for medical report generation" ACL (2021). [paper]
- MedWriter: Yang, Xingyi and Ye, Muchao and You, Quanzeng and Ma, Fenglong.<br> "Writing by memorizing: Hierarchical retrieval-based medical report generation" ACL (2021). [paper]
- CA: Liu, Fenglin and Yin, Changchang and Wu, Xian and Ge, Shen and Zou, Yuexian and Zhang, Ping and Sun, Xu.<br> "Contrastive attention for automatic chest x-ray report generation" ACL (2021). [paper]
- CMN: Chen, Zhihong and Shen, Yaling and Song, Yan and Wan, Xiang.<br> "Cross-modal memory networks for radiology report generation" ACL (2021). [paper] [code]
- KGAE: Liu, Fenglin and You, Chenyu and Wu, Xian and Ge, Shen and Sun, Xu and others.<br> "Auto-encoding knowledge graph for unsupervised medical report generation" NeurIPS (2021). [paper]
- CXR-RePaiR: Endo, Mark and Krishnan, Rayan and Krishna, Viswesh and Ng, Andrew Y and Rajpurkar, Pranav.<br> "Retrieval-based chest x-ray report generation using a pre-trained contrastive language-image model" NeurIPS (2021). [paper]
- MEDSKIP: Pahwa, Esha and Mehta, Dwij and Kapadia, Sanjeet and Jain, Devansh and Luthra, Achleshwar.<br> "Medskip: Medical report generation using skip connections and integrated attention" ICCV (2021). [paper]
- Zhou, Yi and Huang, Lei and Zhou, Tao and Fu, Huazhu and Shao, Ling.<br> "Visual-textual attentive semantic consistency for medical report generation" ICCV (2021). [paper]
- PPKED: Liu, Fenglin and Wu, Xian and Ge, Shen and Fan, Wei and Zou, Yuexian.<br> "Exploring and distilling posterior and prior knowledge for radiology report generation" CVPR (2021). [paper]
- Wang, Zhanyu and Zhou, Luping and Wang, Lei and Li, Xiu.<br> "A self-boosting framework for automated radiographic report generation" CVPR (2021). [paper]
- Huang, Jia-Hong and Yang, C-H Huck and Liu, Fangyu and Tian, Meng and Liu, Yi-Chieh and Wu, Ting-Wei and Lin, I and Wang, Kang and Morikawa, Hiromasa and Chang, Hernghua and others.<br> "Deepopht: medical report generation for retinal images via deep models and visual explanation" WACV (2021). [paper]
- TriNet: Yang, Yan and Yu, Jun and Zhang, Jian and Han, Weidong and Jiang, Hanliang and Huang, Qingming.<br> "Joint embedding of deep visual and semantic features for medical image report generation" TMM (2021). [paper] [code]
- TS-MRGen: Nishino, Toru and Ozaki, Ryota and Momoki, Yohei and Taniguchi, Tomoki and Kano, Ryuji and Nakano, Norihisa and Tagawa, Yuki and Taniguchi, Motoki and Ohkuma, Tomoko and Nakamura, Keigo.<br> "Reinforcement learning with imbalanced dataset for data-to-text medical report generation" EMNLP (2020). [paper] [code]
- R2Gen: Chen, Zhihong and Song, Yan and Chang, Tsung-Hui and Wan, Xiang.<br> "Generating radiology reports via memory-driven transformer" EMNLP (2020). [paper] [code]
- Lovelace, Justin and Mortazavi, Bobak.<br> "Learning to generate clinically coherent chest X-ray reports" EMNLP (2020). [paper]
- CVSE: Ni, Jianmo and Hsu, Chun-Nan and Gentili, Amilcare and McAuley, Julian.<br> "Learning visual-semantic embeddings for reporting abnormal findings on chest X-rays" EMNLP (2020). [paper]
- Gasimova, Aydan and Seegoolam, Gavin and Chen, Liang and Bentley, Paul and Rueckert, Daniel.<br> "Spatial semantic-preserving latent space learning for accelerated dwi diagnostic report generation" MICCAI (2020). [paper]
- Syeda-Mahmood, Tanveer and Wong, Ken CL and Gur, Yaniv and Wu, Joy T and Jadhav, Ashutosh and Kashyap, Satyananda and Karargyris, Alexandros and Pillai, Anup and Sharma, Arjun and Syed, Ali Bin and others.<br> "Chest x-ray report generation through fine-grained label learning" MICCAI (2020). [paper]
- Zhang, Yixiao and Wang, Xiaosong and Xu, Ziyue and Yu, Qihang and Yuille, Alan and Xu, Daguang.<br> "When radiology report generation meets knowledge graph" AAAI (2020). [paper]

</details>
## Medical Visual Question Answering
<details>
<summary><b>List of Papers:</b></summary>

- MUMC: Li, Pengfei and Liu, Gang and He, Jinlong and Zhao, Zixu and Zhong, Shenjun.<br> "Masked vision and language pre-training with unimodal and multimodal contrastive losses for medical visual question answering" MICCAI (2023). [paper] [code]
- Van Sonsbeek, Tom and Derakhshani, Mohammad Mahdi and Najdenkoska, Ivona and Snoek, Cees GM and Worring, Marcel.<br> "Open-ended medical visual question answering through prefix tuning of language models" MICCAI (2023). [paper]
- Tascon-Morales, Sergio and Márquez-Neila, Pablo and Sznitman, Raphael.<br> "Localized questions in medical visual question answering" MICCAI (2023). [paper]
- CS-VQLA: Bai, Long and Islam, Mobarakol and Ren, Hongliang.<br> "Revisiting distillation for continual learning on visual question localized-answering in robotic surgery" MICCAI (2023). [paper] [code]
- CAT-ViL: Bai, Long and Islam, Mobarakol and Ren, Hongliang.<br> "CAT-ViL: co-attention gated vision-language embedding for visual question localized-answering in robotic surgery" MICCAI (2023). [paper] [code]
- hi-VQA: Pellegrini, Chantal and Keicher, Matthias and Özsoy, Ege and Navab, Nassir.<br> "Rad-restruct: A novel vqa benchmark and method for structured radiology reporting" MICCAI (2023). [paper] [code]
- DeBCF: Zhan, Chenlu and Peng, Peng and Zhang, Hanrong and Sun, Haiyue and Shang, Chunnan and Chen, Tao and Wang, Hongsen and Wang, Gaoang and Wang, Hongwei.<br> "Debiasing Medical Visual Question Answering via Counterfactual Training" MICCAI (2023). [paper]
- $MF^2$-MVQA: Song, Shanshan and Li, Jiangyun and Wang, Jing and Cai, Yuanxiu and Dong, Wenkai.<br> "$MF^2$-MVQA: A Multi-Stage Feature Fusion Method for Medical Visual Question Answering" ISBI (2023). [paper] [code]
- M2I2: Li, Pengfei and Liu, Gang and Tan, Lin and Liao, Jinying and Zhong, Shenjun.<br> "Self-supervised vision-language pretraining for medical visual question answering" ISBI (2023). [paper] [code]
- Q2ATransformer: Liu, Yunyi and Wang, Zhanyu and Xu, Dong and Zhou, Luping.<br> "Q2atransformer: Improving medical vqa via an answer querying decoder" IPMI (2023). [paper]
- Tascon-Morales, Sergio and Márquez-Neila, Pablo and Sznitman, Raphael.<br> "Consistency-preserving visual question answering in medical imaging" MICCAI (2022). [paper] [code]
- RepsNet: Tanwani, Ajay K and Barral, Joelle and Freedman, Daniel.<br> "Repsnet: Combining vision with language for automated medical reports" MICCAI (2022). [paper]
- Cong, Fuze and Xu, Shibiao and Guo, Li and Tian, Yinbing.<br> "Anomaly matters: An anomaly-oriented model for medical visual question answering" TMI (2022). [paper]
- VQAMix: Gong, Haifan and Chen, Guanqi and Mao, Mingzhi and Li, Zhen and Li, Guanbin.<br> "Vqamix: Conditional triplet mixup for medical visual question answering" TMI (2022). [paper] [code]
- Liu, Bo and Zhan, Li-Ming and Xu, Li and Wu, Xiao-Ming.<br> "Medical visual question answering via conditional reasoning and contrastive learning" TMI (2022). [paper] [code]
- TraP-VQA: Naseem, Usman and Khushi, Matloob and Kim, Jinman.<br> "Vision-language transformer for interpretable pathology visual question answering" JBHI (2022). [paper]
- MMQ: Do, Tuong and Nguyen, Binh X and Tjiputra, Erman and Tran, Minh and Tran, Quang D and Nguyen, Anh.<br> "Multiple meta-model quantifying for medical visual question answering" MICCAI (2021). [paper] [code]
- CPRD: Liu, Bo and Zhan, Li-Ming and Wu, Xiao-Ming.<br> "Contrastive pre-training and representation distillation for medical visual question answering based on radiology images" MICCAI (2021). [paper] [code]
- MMBERT: Khare, Yash and Bagal, Viraj and Mathew, Minesh and Devi, Adithi and Priyakumar, U Deva and Jawahar, CV.<br> "Mmbert: Multimodal bert pretraining for improved medical vqa" ISBI (2021). [paper] [code]
- QC-MLB: Vu, Minh H and Löfstedt, Tommy and Nyholm, Tufve and Sznitman, Raphael.<br> "A question-centric model for visual question answering in medical imaging" TMI (2020). [paper]
- MEVF: Nguyen, Binh D and Do, Thanh-Toan and Nguyen, Binh X and Do, Tuong and Tjiputra, Erman and Tran, Quang D.<br> "Overcoming data limitation in medical visual question answering" MICCAI (2019). [paper] [code]

</details>
## Medical Multi-modal Diagnosis and Prognosis
<details>
<summary><b>List of Papers:</b></summary>

- Xplainer: Pellegrini, Chantal and Keicher, Matthias and Özsoy, Ege and Jiraskova, Petra and Braren, Rickmer and Navab, Nassir.<br> "Xplainer: From x-ray observations to explainable zero-shot diagnosis" MICCAI (2023). [paper] [code]
- Zhong, Yi and Xu, Mengqiu and Liang, Kongming and Chen, Kaixin and Wu, Ming.<br> "Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray Images" MICCAI (2023). [paper] [code]
- CLIP-Lung: Lei, Yiming and Li, Zilong and Shen, Yan and Zhang, Junping and Shan, Hongming.<br> "CLIP-Lung: Textual knowledge-guided lung nodule malignancy prediction" MICCAI (2023). [paper]
- GSDG: Chen, Shouyu and Guo, Xin and Zhu, Jianping and Wang, Yin.<br> "GSDG: Exploring a Global Semantic-Guided Dual-Stream Graph Model for Automated Volume Differential Diagnosis and Prognosis" MICCAI (2023). [paper]
- Ichinose, Akimichi and Hatsutani, Taro and Nakamura, Keigo and Kitamura, Yoshiro and Iizuka, Satoshi and Simo-Serra, Edgar and Kido, Shoji and Tomiyama, Noriyuki.<br> "Visual grounding of whole radiology reports for 3d ct images" MICCAI (2023). [paper]
- Liu, Jiaxiang and Hu, Tianxiang and Zhang, Yan and Gai, Xiaotang and Feng, Yang and Liu, Zuozhu.<br> "A ChatGPT aided explainable framework for zero-shot medical image diagnosis" arXiv (2023). [paper]
- WSI-MTMI: Liu, Jianxin and Ge, Rongjun and Wan, Peng and Zhu, Qi and Zhang, Daoqiang and Shao, Wei.<br> "Multi-task multi-instance learning for jointly diagnosis and prognosis of early-stage breast invasive carcinoma from whole-slide pathological images" IPMI (2023). [paper]
- Song, Xuegang and Zhou, Feng and Frangi, Alejandro F and Cao, Jiuwen and Xiao, Xiaohua and Lei, Yi and Wang, Tianfu and Lei, Baiying.<br> "Multicenter and multichannel pooling GCN for early AD diagnosis based on dual-modality fused brain network" TMI (2022). [paper] [code]
- Mehta, Sachin and Lu, Ximing and Wu, Wenjun and Weaver, Donald and Hajishirzi, Hannaneh and Elmore, Joann G and Shapiro, Linda G.<br> "End-to-end diagnosis of breast biopsy images with transformers" Medical Image Analysis (2022). [paper]
- $M^2F$: Lu, Zilin and Lu, Mengkang and Xia, Yong.<br> "M2F: A Multi-modal and Multi-task Fusion Network for Glioma Diagnosis and Prognosis" MICCAI (2022). [paper]
- BERTHop: Monajatipoor, Masoud and Rouhsedaghat, Mozhdeh and Li, Liunian Harold and Jay Kuo, C-C and Chien, Aichi and Chang, Kai-Wei.<br> "Berthop: An effective vision-and-language model for chest x-ray disease diagnosis" MICCAI (2022). [paper] [code]
- Kim, Daekyung and Nam, Chang-Mo and Park, Haesol and Jang, Mijung and Lee, Kyong Joon.<br> "Weakly supervised branch network with template mask for classifying masses in 3D automated breast ultrasound" WACV (2022). [paper]
- Wu, Yujiao and Wang, Yaxiong and Huang, Xiaoshui and Yang, Fan and Ling, Sai Ho and Su, Steven Weidong.<br> "Multimodal Learning for Non-small Cell Lung Cancer Prognosis" arXiv (2022). [paper]
- Tan, Kaiwen and Huang, Weixian and Liu, Xiaofeng and Hu, Jinlong and Dong, Shoubin.<br> "A multi-modal fusion framework based on multi-task correlation learning for cancer prognosis prediction" Artificial Intelligence in Medicine (2022). [paper]
- Chen, Yifei and Li, Dandan and Zhang, Xin and Jin, Jing and Shen, Yi.<br> "Computer aided diagnosis of thyroid nodules based on the devised small-datasets multi-view ensemble learning" Medical Image Analysis (2021). [paper]
- Gündel, Sebastian and Setio, Arnaud AA and Ghesu, Florin C and Grbic, Sasa and Georgescu, Bogdan and Maier, Andreas and Comaniciu, Dorin.<br> "Robust classification from noisy labels: Integrating additional knowledge for chest radiography abnormality assessment" Medical Image Analysis (2021). [paper]
- Qiu, Di and Lui, Lok Ming.<br> "Modal Uncertainty Estimation for Medical Imaging Based Diagnosis" MICCAI (2021). [paper]
- Bhalodia, Riddhish and Hatamizadeh, Ali and Tam, Leo and Xu, Ziyue and Wang, Xiaosong and Turkbey, Evrim and Xu, Daguang.<br> "Improving pneumonia localization via cross-attention on medical images and reports" MICCAI (2021). [paper]
- Sekuboyina, Anjany and Oñoro-Rubio, Daniel and Kleesiek, Jens and Malone, Brandon.<br> "A relational-learning perspective to multi-label chest X-ray classification" ISBI (2021). [paper]
- Wu, Joy and Gur, Yaniv and Karargyris, Alexandros and Syed, Ali Bin and Boyko, Orest and Moradi, Mehdi and Syeda-Mahmood, Tanveer.<br> "Automatic bounding box annotation of chest x-ray data for localization of abnormalities" ISBI (2020). [paper]
- Chauhan, Geeticka and Liao, Ruizhi and Wells, William and Andreas, Jacob and Wang, Xin and Berkowitz, Seth and Horng, Steven and Szolovits, Peter and Golland, Polina.<br> "Joint modeling of chest radiographs and radiology reports for pulmonary edema assessment" MICCAI (2020). [paper] [code]
- van Sonsbeek, Tom and Worring, Marcel.<br> "Towards automated diagnosis with attentive multi-modal learning using electronic health records and chest x-rays" MICCAI (2020). [paper]

</details>
## Medical Image Segmentation
<details>
<summary><b>List of Papers:</b></summary>

- LViT: Li, Zihan and Li, Yunxiang and Li, Qingde and Wang, Puyang and Guo, Dazhou and Lu, Le and Jin, Dakai and Zhang, You and Hong, Qingqi.<br> "Lvit: language meets vision transformer in medical image segmentation" TMI (2024). [paper] [code]
- SaLIP: Aleem, Sidra and Wang, Fangyijie and Maniparambil, Mayug and Arazo, Eric and Dietlmeier, Julia and Curran, Kathleen and O'Connor, Noel E and Little, Suzanne.<br> "Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation" CVPR (2024). [paper] [code]
- SegICL: Shen, Lingdong and Shang, Fangxin and Yang, Yehui and Huang, Xiaoshuang and Xiang, Shining.<br> "SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging" arXiv (2024). [paper]
- MedCLIP-SAM: Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming.<br> "MedCLIP-SAM: Bridging text and image towards universal medical image segmentation" arXiv (2024). [paper] [code]
- Kunhimon, Shahina and Naseer, Muzammal and Khan, Salman and Khan, Fahad Shahbaz.<br> "Language Guided Domain Generalized Medical Image Segmentation" arXiv (2024). [paper] [code]
- RecLMIS: Huang, Xiaoshuang and Li, Hongxiang and Cao, Meng and Chen, Long and You, Chenyu and An, Dong.<br> "Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation" arXiv (2024). [paper] [code]
- $CPAM^{TG}$: Lee, Go-Eun and Kim, Seon Ho and Cho, Jungchan and Choi, Sang Tae and Choi, Sang-Il.<br> "Text-guided cross-position attention for segmentation: Case of medical image" MICCAI (2023). [paper] [code]
- TPRO: Zhang, Shaoteng and Zhang, Jianpeng and Xie, Yutong and Xia, Yong.<br> "TPRO: Text-Prompting-Based weakly supervised histopathology tissue segmentation" MICCAI (2023). [paper] [code]
- Liu, Jie and Zhang, Yixiao and Chen, Jie-Neng and Xiao, Junfei and Lu, Yongyi and A Landman, Bennett and Yuan, Yixuan and Yuille, Alan and Tang, Yucheng and Zhou, Zongwei.<br> "Clip-driven universal model for organ segmentation and tumor detection" ICCV (2023). [paper] [code]
- Han, Xianjun and Chen, Qianqian and Xie, Zhaoyang and Li, Xuejun and Yang, Hongyu.<br> "Multiscale progressive text prompt network for medical image segmentation" Computers & Graphics (2023). [paper]
- Lu, Yixing and Fan, Zhaoxin and Xu, Min.<br> "Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation" International Conference on Multimedia Modeling (2024). [paper]
- EMIT-Diff: Zhang, Zheyuan and Yao, Lanhong and Wang, Bin and Jha, Debesh and Keles, Elif and Medetalibeyoglu, Alpay and Bagci, Ulas.<br> "Emit-diff: Enhancing medical image segmentation via text-guided diffusion model" arXiv (2023). [paper]
- GTGM: Chen, Yinda and Liu, Che and Huang, Wei and Cheng, Sibo and Arcucci, Rossella and Xiong, Zhiwei.<br> "Generative text-guided 3d vision-language pretraining for unified medical image segmentation" arXiv (2023). [paper]
- Bi-VLGM: Chen, Wenting and Liu, Jie and Yuan, Yixuan.<br> "Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation" arXiv (2023). [paper]
- Segre, Leo and Hirschorn, Or and Ginzburg, Dvir and Raviv, Dan.<br> "Shape-consistent generative adversarial networks for multi-modal medical segmentation maps" ISBI (2022). [paper] [code]
- DTAN: Zhao, Yiyang and Li, Jinjiang and Ren, Lu and Chen, Zheng.<br> "DTAN: Diffusion-based Text Attention Network for medical image segmentation" Computers in Biology and Medicine (2024). [paper]
- TGEDiff: Dong, Zhiwei and Yuan, Genji and Hua, Zhen and Li, Jinjiang.<br> "Diffusion model-based text-guided enhancement network for medical image segmentation" Expert Systems with Applications (2024). [paper]

</details>
## Medical Image-Text Retrieval
<details>
<summary><b>List of Papers:</b></summary>

- "Text-guided visual representation learning for medical image retrieval systems" ICPR (2022). [paper]
- SECMR: "Semantic Extension for Cross-Modal Retrieval of Medical Image-Diagnosis Report" NLPCC (2023). [paper]
- DMACH: "Deep medical cross-modal attention hashing" [paper]
- "Retrieving chest X-rays for differential diagnosis: A deep metric learning approach" IEEE EMBS (2021). [paper]
- X-TRA: "X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation" IPMI (2023). [paper]
- "Category supervised cross-modal hashing retrieval for chest X-ray and radiology reports" Computers & Electrical Engineering (2022). [paper]
- "Multi-Modal Medical Image Matching Based on Multi-Task Learning and Semantic-Enhanced Cross-Modal Retrieval" Traitement du Signal (2023). [paper]
- MMDL: "Multimodal multitask deep learning for X-ray image retrieval" MICCAI (2021). [paper]
- "Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report" ML4H (2023). [paper]
- BIMCV-R: "BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval" arXiv (2024). [paper]
- 3D-MIR: "3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology" arXiv (2023). [paper] [code]

</details>