<h1 align="center">Awesome Medical Vision-and-Language Tasks and Methodologies: A Survey</h1>

:fire::fire: This is a collection of medical vision-language tasks and methodologies :fire::fire:
## Overview

<p align="center"> <img src="framework.png"> <br> <em>Overview of medical vision-language models (MVLMs).</em> </p>

## Table of Contents

- [Medical Report Generation](#medical-report-generation)
- [Medical Visual Question Answering](#medical-visual-question-answering)
- [Medical Multi-modal Diagnosis and Prognosis](#medical-multi-modal-diagnosis-and-prognosis)
- [Medical Image Segmentation](#medical-image-segmentation)
- [Medical Image-Text Retrieval](#medical-image-text-retrieval)
## Medical Report Generation
<details>
<summary><b>List of Papers:</b></summary>

- Yang, Shuxin and Wu, Xian and Ge, Shen and Zheng, Zhuozhao and Zhou, S Kevin and Xiao, Li.<br> "Radiology report generation with a learned knowledge base and multi-modal alignment" Medical Image Analysis (2023). [paper] [code]
- Szeskin, Adi and Rochman, Shalom and Weiss, Snir and Lederman, Richard and Sosna, Jacob and Joskowicz, Leo.<br> "Liver lesion changes analysis in longitudinal CECT scans by simultaneous deep learning voxel classification with SimU-Net" Medical Image Analysis (2023). [paper]
- Zhu, Qingqing and Mathai, Tejas Sudharshan and Mukherjee, Pritam and Peng, Yifan and Summers, Ronald M and Lu, Zhiyong.<br> "Utilizing longitudinal chest x-rays and reports to pre-fill radiology reports" MICCAI (2023). [paper] [code]
- Dalla Serra, Francesco and Wang, Chaoyang and Deligianni, Fani and Dalton, Jeffrey and O’Neil, Alison Q.<br> "Finding-aware anatomical tokens for chest X-ray automated reporting" MICCAI (2023). [paper]
- KiUT: Huang, Zhongzhen and Zhang, Xiaofan and Zhang, Shaoting.<br> "Kiut: Knowledge-injected u-transformer for radiology report generation" CVPR (2023). [paper]
- DCL: Li, Mingjie and Lin, Bingqian and Chen, Zicong and Lin, Haokun and Liang, Xiaodan and Chang, Xiaojun.<br> "Dynamic graph enhanced contrastive learning for chest x-ray report generation" CVPR (2023). [paper] [code]
- RGRG: Tanida, Tim and Müller, Philip and Kaissis, Georgios and Rueckert, Daniel.<br> "Interactive and explainable region-guided radiology report generation" CVPR (2023). [paper] [code]
- METransformer: Wang, Zhanyu and Liu, Lingqiao and Wang, Lei and Zhou, Luping.<br> "Metransformer: Radiology report generation by transformer with multiple learnable expert tokens" CVPR (2023). [paper]
- ICT: Zhang, Junsan and Shen, Xiuxuan and Wan, Shaohua and Goudos, Sotirios K and Wu, Jie and Cheng, Ming and Zhang, Weishan.<br> "A novel deep learning model for medical report generation by inter-intra information calibration" JBHI (2023). [paper]
- Zheng, Ervine and Yu, Qi.<br> "Evidential interactive learning for medical image captioning" ICML (2023). [paper]
- PRIOR: Cheng, Pujin and Lin, Li and Lyu, Junyan and Huang, Yijin and Luo, Wenhan and Tang, Xiaoying.<br> "Prior: Prototype representation joint learning from medical images and reports" ICCV (2023). [paper] [code]
- MRM: Zhou, Hong-Yu and Lian, Chenyu and Wang, Liansheng and Yu, Yizhou.<br> "Advancing radiograph representation learning with masked record modeling" ICLR (2023). [paper] [code]
- MMTN: Cao, Yiming and Cui, Lizhen and Zhang, Lei and Yu, Fuqiang and Li, Zhen and Xu, Yonghui.<br> "MMTN: multi-modal memory transformer network for image-report consistent medical report generation" AAAI (2023). [paper]
- ATAG: Yan, Sixing and Cheung, William K and Chiu, Keith and Tong, Terence M and Cheung, Ka Chun and See, Simon.<br> "Attributed abnormality graph embedding for clinically accurate x-ray report generation" TMI (2023). [paper]
- Yang, Shuxin and Wu, Xian and Ge, Shen and Zhou, S Kevin and Xiao, Li.<br> "Knowledge matters: Chest radiology report generation with general and specific knowledge" Medical Image Analysis (2022). [paper] [code]
- VTI: Najdenkoska, Ivona and Zhen, Xiantong and Worring, Marcel and Shao, Ling.<br> "Uncertainty-aware report generation for chest X-rays by variational topic inference" Medical Image Analysis (2022). [paper] [code]
- TranSQ: Kong, Ming and Huang, Zhengxing and Kuang, Kun and Zhu, Qiang and Wu, Fei.<br> "Transq: Transformer-based semantic query for medical report generation" MICCAI (2022). [paper] [code]
- Sun, Jinghan and Wei, Dong and Wang, Liansheng and Zheng, Yefeng.<br> "Lesion guided explainable few weak-shot medical report generation" MICCAI (2022). [paper] [code]
- MCGN: Wang, Zhanyu and Tang, Mingkang and Wang, Lei and Li, Xiu and Zhou, Luping.<br> "A medical semantic-assisted transformer for radiographic report generation" MICCAI (2022). [paper]
- SGF: Li, Jun and Li, Shibo and Hu, Ying and Tao, Huiren.<br> "A self-guided framework for radiology report generation" MICCAI (2022). [paper]
- SGT: Lin, Chen and Zheng, Shuai and Liu, Zhizhe and Li, Youru and Zhu, Zhenfeng and Zhao, Yao.<br> "Sgt: Scene graph-guided transformer for surgical report generation" MICCAI (2022). [paper] [code]
- ITA: Wang, Lin and Ning, Munan and Lu, Donghuan and Wei, Dong and Zheng, Yefeng and Chen, Jie.<br> "An inclusive task-aware framework for radiology report generation" MICCAI (2022). [paper] [code]
- RepsNet: Tanwani, Ajay K and Barral, Joelle and Freedman, Daniel.<br> "Repsnet: Combining vision with language for automated medical reports" MICCAI (2022). [paper]
- CoPlan: Nishino, Toru and Miura, Yasuhide and Taniguchi, Tomoki and Ohkuma, Tomoko and Suzuki, Yuki and Kido, Shoji and Tomiyama, Noriyuki.<br> "Factual accuracy is not enough: Planning consistent description order for radiology report generation" EMNLP (2022). [paper]
- Delbrouck, Jean-Benoit and Chambon, Pierre and Bluethgen, Christian and Tsai, Emily and Almusa, Omar and Langlotz, Curtis P.<br> "Improving the factual correctness of radiology report generation with semantic rewards" EMNLP (2022). [paper] [code]
- CGT: Li, Mingjie and Cai, Wenjia and Verspoor, Karin and Pan, Shirui and Liang, Xiaodan and Chang, Xiaojun.<br> "Cross-modal clinical graph transformer for ophthalmic report generation" CVPR (2022). [paper]
- TransFuser: Huang, Jia-Hong and Wu, Ting-Wei and Yang, C-H Huck and Shi, Zenglin and Lin, I and Tegner, Jesper and Worring, Marcel and others.<br> "Non-local attention improves description generation for retinal images" WACV (2022). [paper]
- XPRONET: Wang, Jun and Bhalerao, Abhir and He, Yulan.<br> "Cross-modal prototype driven network for radiology report generation" ECCV (2022). [paper] [code]
- DCNet (EDC-Net): Singh, Dilbag and Kaur, Manjit and Alanazi, Jazem Mutared and AlZubi, Ahmad Ali and Lee, Heung-No.<br> "Efficient evolving deep ensemble medical image captioning network" JBHI (2022). [paper] [code]
- Yan, Bin and Pei, Mingtao and Zhao, Meng and Shan, Caifeng and Tian, Zhaoxing.<br> "Prior guided transformer for accurate radiology reports generation" JBHI (2022). [paper]
- NSL: Han, Zhongyi and Wei, Benzheng and Xi, Xiaoming and Chen, Bo and Yin, Yilong and Li, Shuo.<br> "Unifying neural learning and symbolic reasoning for spinal medical report generation" Medical Image Analysis (2021). [paper]
- AlignTransformer: You, Di and Liu, Fenglin and Ge, Shen and Xie, Xiaoxia and Zhang, Jing and Wu, Xian.<br> "Aligntransformer: Hierarchical alignment of visual regions and disease tags for medical report generation" MICCAI (2021). [paper]
- VTI: Najdenkoska, Ivona and Zhen, Xiantong and Worring, Marcel and Shao, Ling.<br> "Variational topic inference for chest x-ray report generation" MICCAI (2021). [paper]
- CNN-TRG: Pino, Pablo and Parra, Denis and Besa, Cecilia and Lagos, Claudio.<br> "Clinically correct report generation from chest x-rays using templates" MICCAI (2021). [paper]
- RATCHET: Hou, Benjamin and Kaissis, Georgios and Summers, Ronald M and Kainz, Bernhard.<br> "Ratchet: Medical transformer for chest x-ray diagnosis and reporting" MICCAI (2021). [paper] [code]
- CIDA: Xu, Mengya and Islam, Mobarakol and Lim, Chwee Ming and Ren, Hongliang.<br> "Class-incremental domain adaptation with smoothing and calibration for surgical report generation" MICCAI (2021). [paper] [code]
- Nguyen, Hoang TN and Nie, Dong and Badamdorj, Taivanbat and Liu, Yujie and Zhu, Yingying and Truong, Jason and Cheng, Li.<br> "Automated generation of accurate & fluent medical x-ray reports" EMNLP (2021). [paper] [code]
- $M^2$ TR. PROGRESSIVE: Nooralahzadeh, Farhad and Gonzalez, Nicolas Perez and Frauenfelder, Thomas and Fujimoto, Koji and Krauthammer, Michael.<br> "Progressive transformer-based generation of radiology reports" EMNLP (2021). [paper] [code]
- CMCL: Liu, Fenglin and Ge, Shen and Zou, Yuexian and Wu, Xian.<br> "Competence-based multimodal curriculum learning for medical report generation" ACL (2021). [paper]
- MedWriter: Yang, Xingyi and Ye, Muchao and You, Quanzeng and Ma, Fenglong.<br> "Writing by memorizing: Hierarchical retrieval-based medical report generation" ACL (2021). [paper]
- CA: Liu, Fenglin and Yin, Changchang and Wu, Xian and Ge, Shen and Zou, Yuexian and Zhang, Ping and Sun, Xu.<br> "Contrastive attention for automatic chest x-ray report generation" ACL (2021). [paper]
- CMN: Chen, Zhihong and Shen, Yaling and Song, Yan and Wan, Xiang.<br> "Cross-modal memory networks for radiology report generation" ACL (2021). [paper] [code]
- KGAE: Liu, Fenglin and You, Chenyu and Wu, Xian and Ge, Shen and Sun, Xu and others.<br> "Auto-encoding knowledge graph for unsupervised medical report generation" NeurIPS (2021). [paper]
- CXR-RePaiR: Endo, Mark and Krishnan, Rayan and Krishna, Viswesh and Ng, Andrew Y and Rajpurkar, Pranav.<br> "Retrieval-based chest x-ray report generation using a pre-trained contrastive language-image model" NeurIPS (2021). [paper]
- MEDSKIP: Pahwa, Esha and Mehta, Dwij and Kapadia, Sanjeet and Jain, Devansh and Luthra, Achleshwar.<br> "Medskip: Medical report generation using skip connections and integrated attention" ICCV (2021). [paper]
- Zhou, Yi and Huang, Lei and Zhou, Tao and Fu, Huazhu and Shao, Ling.<br> "Visual-textual attentive semantic consistency for medical report generation" ICCV (2021). [paper]
- PPKED: Liu, Fenglin and Wu, Xian and Ge, Shen and Fan, Wei and Zou, Yuexian.<br> "Exploring and distilling posterior and prior knowledge for radiology report generation" CVPR (2021). [paper]
- Wang, Zhanyu and Zhou, Luping and Wang, Lei and Li, Xiu.<br> "A self-boosting framework for automated radiographic report generation" CVPR (2021). [paper]
- Huang, Jia-Hong and Yang, C-H Huck and Liu, Fangyu and Tian, Meng and Liu, Yi-Chieh and Wu, Ting-Wei and Lin, I and Wang, Kang and Morikawa, Hiromasa and Chang, Hernghua and others.<br> "Deepopht: medical report generation for retinal images via deep models and visual explanation" WACV (2021). [paper]
- TriNet: Yang, Yan and Yu, Jun and Zhang, Jian and Han, Weidong and Jiang, Hanliang and Huang, Qingming.<br> "Joint embedding of deep visual and semantic features for medical image report generation" TMM (2021). [paper] [code]
- TS-MRGen: Nishino, Toru and Ozaki, Ryota and Momoki, Yohei and Taniguchi, Tomoki and Kano, Ryuji and Nakano, Norihisa and Tagawa, Yuki and Taniguchi, Motoki and Ohkuma, Tomoko and Nakamura, Keigo.<br> "Reinforcement learning with imbalanced dataset for data-to-text medical report generation" EMNLP (2020). [paper] [code]
- R2Gen: Chen, Zhihong and Song, Yan and Chang, Tsung-Hui and Wan, Xiang.<br> "Generating radiology reports via memory-driven transformer" EMNLP (2020). [paper] [code]
- Lovelace, Justin and Mortazavi, Bobak.<br> "Learning to generate clinically coherent chest X-ray reports" EMNLP (2020). [paper]
- CVSE: Ni, Jianmo and Hsu, Chun-Nan and Gentili, Amilcare and McAuley, Julian.<br> "Learning visual-semantic embeddings for reporting abnormal findings on chest X-rays" EMNLP (2020). [paper]
- Gasimova, Aydan and Seegoolam, Gavin and Chen, Liang and Bentley, Paul and Rueckert, Daniel.<br> "Spatial semantic-preserving latent space learning for accelerated dwi diagnostic report generation" MICCAI (2020). [paper]
- Syeda-Mahmood, Tanveer and Wong, Ken CL and Gur, Yaniv and Wu, Joy T and Jadhav, Ashutosh and Kashyap, Satyananda and Karargyris, Alexandros and Pillai, Anup and Sharma, Arjun and Syed, Ali Bin and others.<br> "Chest x-ray report generation through fine-grained label learning" MICCAI (2020). [paper]
- Zhang, Yixiao and Wang, Xiaosong and Xu, Ziyue and Yu, Qihang and Yuille, Alan and Xu, Daguang.<br> "When radiology report generation meets knowledge graph" AAAI (2020). [paper]

</details>
## Medical Visual Question Answering
<details>
<summary><b>List of Papers:</b></summary>

- MUMC: Li, Pengfei and Liu, Gang and He, Jinlong and Zhao, Zixu and Zhong, Shenjun.<br> "Masked vision and language pre-training with unimodal and multimodal contrastive losses for medical visual question answering" MICCAI (2023). [paper] [code]
- Van Sonsbeek, Tom and Derakhshani, Mohammad Mahdi and Najdenkoska, Ivona and Snoek, Cees GM and Worring, Marcel.<br> "Open-ended medical visual question answering through prefix tuning of language models" MICCAI (2023). [paper]
- Tascon-Morales, Sergio and Márquez-Neila, Pablo and Sznitman, Raphael.<br> "Localized questions in medical visual question answering" MICCAI (2023). [paper]
- CS-VQLA: Bai, Long and Islam, Mobarakol and Ren, Hongliang.<br> "Revisiting distillation for continual learning on visual question localized-answering in robotic surgery" MICCAI (2023). [paper] [code]
- CAT-ViL: Bai, Long and Islam, Mobarakol and Ren, Hongliang.<br> "CAT-ViL: co-attention gated vision-language embedding for visual question localized-answering in robotic surgery" MICCAI (2023). [paper] [code]
- hi-VQA: Pellegrini, Chantal and Keicher, Matthias and Özsoy, Ege and Navab, Nassir.<br> "Rad-restruct: A novel vqa benchmark and method for structured radiology reporting" MICCAI (2023). [paper] [code]
- DeBCF: Zhan, Chenlu and Peng, Peng and Zhang, Hanrong and Sun, Haiyue and Shang, Chunnan and Chen, Tao and Wang, Hongsen and Wang, Gaoang and Wang, Hongwei.<br> "Debiasing Medical Visual Question Answering via Counterfactual Training" MICCAI (2023). [paper]
- $MF^2$-MVQA: Song, Shanshan and Li, Jiangyun and Wang, Jing and Cai, Yuanxiu and Dong, Wenkai.<br> "$MF^2$-MVQA: A Multi-Stage Feature Fusion Method for Medical Visual Question Answering" ISBI (2023). [paper] [code]
- M2I2: Li, Pengfei and Liu, Gang and Tan, Lin and Liao, Jinying and Zhong, Shenjun.<br> "Self-supervised vision-language pretraining for medical visual question answering" ISBI (2023). [paper] [code]
- Q2ATransformer: Liu, Yunyi and Wang, Zhanyu and Xu, Dong and Zhou, Luping.<br> "Q2atransformer: Improving medical vqa via an answer querying decoder" IPMI (2023). [paper]
- Tascon-Morales, Sergio and Márquez-Neila, Pablo and Sznitman, Raphael.<br> "Consistency-preserving visual question answering in medical imaging" MICCAI (2022). [paper] [code]
- RepsNet: Tanwani, Ajay K and Barral, Joelle and Freedman, Daniel.<br> "Repsnet: Combining vision with language for automated medical reports" MICCAI (2022). [paper]
- Cong, Fuze and Xu, Shibiao and Guo, Li and Tian, Yinbing.<br> "Anomaly matters: An anomaly-oriented model for medical visual question answering" TMI (2022). [paper]
- VQAMix: Gong, Haifan and Chen, Guanqi and Mao, Mingzhi and Li, Zhen and Li, Guanbin.<br> "Vqamix: Conditional triplet mixup for medical visual question answering" TMI (2022). [paper] [code]
- Liu, Bo and Zhan, Li-Ming and Xu, Li and Wu, Xiao-Ming.<br> "Medical visual question answering via conditional reasoning and contrastive learning" TMI (2022). [paper] [code]
- TraP-VQA: Naseem, Usman and Khushi, Matloob and Kim, Jinman.<br> "Vision-language transformer for interpretable pathology visual question answering" JBHI (2022). [paper]
- MMQ: Do, Tuong and Nguyen, Binh X and Tjiputra, Erman and Tran, Minh and Tran, Quang D and Nguyen, Anh.<br> "Multiple meta-model quantifying for medical visual question answering" MICCAI (2021). [paper] [code]
- CPRD: Liu, Bo and Zhan, Li-Ming and Wu, Xiao-Ming.<br> "Contrastive pre-training and representation distillation for medical visual question answering based on radiology images" MICCAI (2021). [paper] [code]
- MMBERT: Khare, Yash and Bagal, Viraj and Mathew, Minesh and Devi, Adithi and Priyakumar, U Deva and Jawahar, CV.<br> "Mmbert: Multimodal bert pretraining for improved medical vqa" ISBI (2021). [paper] [code]
- QC-MLB: Vu, Minh H and Löfstedt, Tommy and Nyholm, Tufve and Sznitman, Raphael.<br> "A question-centric model for visual question answering in medical imaging" TMI (2020). [paper]
- MEVF: Nguyen, Binh D and Do, Thanh-Toan and Nguyen, Binh X and Do, Tuong and Tjiputra, Erman and Tran, Quang D.<br> "Overcoming data limitation in medical visual question answering" MICCAI (2019). [paper] [code]

</details>
## Medical Multi-modal Diagnosis and Prognosis
<details>
<summary><b>List of Papers:</b></summary>

- Xplainer: Pellegrini, Chantal and Keicher, Matthias and Özsoy, Ege and Jiraskova, Petra and Braren, Rickmer and Navab, Nassir.<br> "Xplainer: From x-ray observations to explainable zero-shot diagnosis" MICCAI (2023). [paper] [code]
- Zhong, Yi and Xu, Mengqiu and Liang, Kongming and Chen, Kaixin and Wu, Ming.<br> "Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray Images" MICCAI (2023). [paper] [code]
- CLIP-Lung: Lei, Yiming and Li, Zilong and Shen, Yan and Zhang, Junping and Shan, Hongming.<br> "CLIP-Lung: Textual knowledge-guided lung nodule malignancy prediction" MICCAI (2023). [paper]
- GSDG: Chen, Shouyu and Guo, Xin and Zhu, Jianping and Wang, Yin.<br> "GSDG: Exploring a Global Semantic-Guided Dual-Stream Graph Model for Automated Volume Differential Diagnosis and Prognosis" MICCAI (2023). [paper]
- Ichinose, Akimichi and Hatsutani, Taro and Nakamura, Keigo and Kitamura, Yoshiro and Iizuka, Satoshi and Simo-Serra, Edgar and Kido, Shoji and Tomiyama, Noriyuki.<br> "Visual grounding of whole radiology reports for 3d ct images" MICCAI (2023). [paper]
- Liu, Jiaxiang and Hu, Tianxiang and Zhang, Yan and Gai, Xiaotang and Feng, Yang and Liu, Zuozhu.<br> "A ChatGPT aided explainable framework for zero-shot medical image diagnosis" arXiv (2023). [paper]
- WSI-MTMI: Liu, Jianxin and Ge, Rongjun and Wan, Peng and Zhu, Qi and Zhang, Daoqiang and Shao, Wei.<br> "Multi-task multi-instance learning for jointly diagnosis and prognosis of early-stage breast invasive carcinoma from whole-slide pathological images" IPMI (2023). [paper]
- Song, Xuegang and Zhou, Feng and Frangi, Alejandro F and Cao, Jiuwen and Xiao, Xiaohua and Lei, Yi and Wang, Tianfu and Lei, Baiying.<br> "Multicenter and multichannel pooling GCN for early AD diagnosis based on dual-modality fused brain network" TMI (2022). [paper] [code]
- Mehta, Sachin and Lu, Ximing and Wu, Wenjun and Weaver, Donald and Hajishirzi, Hannaneh and Elmore, Joann G and Shapiro, Linda G.<br> "End-to-end diagnosis of breast biopsy images with transformers" Medical Image Analysis (2022). [paper]
- $M^2F$: Lu, Zilin and Lu, Mengkang and Xia, Yong.<br> "M2F: A Multi-modal and Multi-task Fusion Network for Glioma Diagnosis and Prognosis" MICCAI (2022). [paper]
- BERTHop: Monajatipoor, Masoud and Rouhsedaghat, Mozhdeh and Li, Liunian Harold and Jay Kuo, C-C and Chien, Aichi and Chang, Kai-Wei.<br> "Berthop: An effective vision-and-language model for chest x-ray disease diagnosis" MICCAI (2022). [paper] [code]
- Kim, Daekyung and Nam, Chang-Mo and Park, Haesol and Jang, Mijung and Lee, Kyong Joon.<br> "Weakly supervised branch network with template mask for classifying masses in 3D automated breast ultrasound" WACV (2022). [paper]
- Wu, Yujiao and Wang, Yaxiong and Huang, Xiaoshui and Yang, Fan and Ling, Sai Ho and Su, Steven Weidong.<br> "Multimodal Learning for Non-small Cell Lung Cancer Prognosis" arXiv (2022). [paper]
- Tan, Kaiwen and Huang, Weixian and Liu, Xiaofeng and Hu, Jinlong and Dong, Shoubin.<br> "A multi-modal fusion framework based on multi-task correlation learning for cancer prognosis prediction" Artificial Intelligence in Medicine (2022). [paper]
- Chen, Yifei and Li, Dandan and Zhang, Xin and Jin, Jing and Shen, Yi.<br> "Computer aided diagnosis of thyroid nodules based on the devised small-datasets multi-view ensemble learning" Medical Image Analysis (2021). [paper]
- Gündel, Sebastian and Setio, Arnaud AA and Ghesu, Florin C and Grbic, Sasa and Georgescu, Bogdan and Maier, Andreas and Comaniciu, Dorin.<br> "Robust classification from noisy labels: Integrating additional knowledge for chest radiography abnormality assessment" Medical Image Analysis (2021). [paper]
- Qiu, Di and Lui, Lok Ming.<br> "Modal Uncertainty Estimation for Medical Imaging Based Diagnosis" MICCAI (2021). [paper]
- Bhalodia, Riddhish and Hatamizadeh, Ali and Tam, Leo and Xu, Ziyue and Wang, Xiaosong and Turkbey, Evrim and Xu, Daguang.<br> "Improving pneumonia localization via cross-attention on medical images and reports" MICCAI (2021). [paper]
- Sekuboyina, Anjany and Oñoro-Rubio, Daniel and Kleesiek, Jens and Malone, Brandon.<br> "A relational-learning perspective to multi-label chest X-ray classification" ISBI (2021). [paper]
- Wu, Joy and Gur, Yaniv and Karargyris, Alexandros and Syed, Ali Bin and Boyko, Orest and Moradi, Mehdi and Syeda-Mahmood, Tanveer.<br> "Automatic bounding box annotation of chest x-ray data for localization of abnormalities" ISBI (2020). [paper]
- Chauhan, Geeticka and Liao, Ruizhi and Wells, William and Andreas, Jacob and Wang, Xin and Berkowitz, Seth and Horng, Steven and Szolovits, Peter and Golland, Polina.<br> "Joint modeling of chest radiographs and radiology reports for pulmonary edema assessment" MICCAI (2020). [paper] [code]
- van Sonsbeek, Tom and Worring, Marcel.<br> "Towards automated diagnosis with attentive multi-modal learning using electronic health records and chest x-rays" MICCAI (2020). [paper]

</details>
## Medical Image Segmentation
<details>
<summary><b>List of Papers:</b></summary>

- LViT: Li, Zihan and Li, Yunxiang and Li, Qingde and Wang, Puyang and Guo, Dazhou and Lu, Le and Jin, Dakai and Zhang, You and Hong, Qingqi.<br> "Lvit: language meets vision transformer in medical image segmentation" TMI (2024). [paper] [code]
- SaLIP: Aleem, Sidra and Wang, Fangyijie and Maniparambil, Mayug and Arazo, Eric and Dietlmeier, Julia and Curran, Kathleen and O'Connor, Noel E and Little, Suzanne.<br> "Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation" CVPR (2024). [paper] [code]
- SegICL: Shen, Lingdong and Shang, Fangxin and Yang, Yehui and Huang, Xiaoshuang and Xiang, Shining.<br> "SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging" arXiv (2024). [paper]
- MedCLIP-SAM: Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming.<br> "MedCLIP-SAM: Bridging text and image towards universal medical image segmentation" arXiv (2024). [paper] [code]
- Kunhimon, Shahina and Naseer, Muzammal and Khan, Salman and Khan, Fahad Shahbaz.<br> "Language Guided Domain Generalized Medical Image Segmentation" arXiv (2024). [paper] [code]
- RecLMIS: Huang, Xiaoshuang and Li, Hongxiang and Cao, Meng and Chen, Long and You, Chenyu and An, Dong.<br> "Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation" arXiv (2024). [paper] [code]
- $CPAM^{TG}$: Lee, Go-Eun and Kim, Seon Ho and Cho, Jungchan and Choi, Sang Tae and Choi, Sang-Il.<br> "Text-guided cross-position attention for segmentation: Case of medical image" MICCAI (2023). [paper] [code]
- TPRO: Zhang, Shaoteng and Zhang, Jianpeng and Xie, Yutong and Xia, Yong.<br> "TPRO: Text-Prompting-Based weakly supervised histopathology tissue segmentation" MICCAI (2023). [paper] [code]
- Liu, Jie and Zhang, Yixiao and Chen, Jie-Neng and Xiao, Junfei and Lu, Yongyi and A Landman, Bennett and Yuan, Yixuan and Yuille, Alan and Tang, Yucheng and Zhou, Zongwei.<br> "Clip-driven universal model for organ segmentation and tumor detection" ICCV (2023). [paper] [code]
- Han, Xianjun and Chen, Qianqian and Xie, Zhaoyang and Li, Xuejun and Yang, Hongyu.<br> "Multiscale progressive text prompt network for medical image segmentation" Computers & Graphics (2023). [paper]
- Lu, Yixing and Fan, Zhaoxin and Xu, Min.<br> "Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation" International Conference on Multimedia Modeling (2024). [paper]
- EMIT-Diff: Zhang, Zheyuan and Yao, Lanhong and Wang, Bin and Jha, Debesh and Keles, Elif and Medetalibeyoglu, Alpay and Bagci, Ulas.<br> "Emit-diff: Enhancing medical image segmentation via text-guided diffusion model" arXiv (2023). [paper]
- GTGM: Chen, Yinda and Liu, Che and Huang, Wei and Cheng, Sibo and Arcucci, Rossella and Xiong, Zhiwei.<br> "Generative text-guided 3d vision-language pretraining for unified medical image segmentation" arXiv (2023). [paper]
- Bi-VLGM: Chen, Wenting and Liu, Jie and Yuan, Yixuan.<br> "Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation" arXiv (2023). [paper]
- Segre, Leo and Hirschorn, Or and Ginzburg, Dvir and Raviv, Dan.<br> "Shape-consistent generative adversarial networks for multi-modal medical segmentation maps" ISBI (2022). [paper] [code]
- DTAN: Zhao, Yiyang and Li, Jinjiang and Ren, Lu and Chen, Zheng.<br> "DTAN: Diffusion-based Text Attention Network for medical image segmentation" Computers in Biology and Medicine (2024). [paper]
- TGEDiff: Dong, Zhiwei and Yuan, Genji and Hua, Zhen and Li, Jinjiang.<br> "Diffusion model-based text-guided enhancement network for medical image segmentation" Expert Systems with Applications (2024). [paper]

</details>
## Medical Image-Text Retrieval
<details>
<summary><b>List of Papers:</b></summary>

- "Text-guided visual representation learning for medical image retrieval systems" ICPR (2022). [paper]
- SECMR: "Semantic Extension for Cross-Modal Retrieval of Medical Image-Diagnosis Report" NLPCC (2023). [paper]
- DMACH: "Deep medical cross-modal attention hashing" [paper]
- "Retrieving chest X-rays for differential diagnosis: A deep metric learning approach" IEEE EMBS (2021). [paper]
- X-TRA: "X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation" IPMI (2023). [paper]
- "Category supervised cross-modal hashing retrieval for chest X-ray and radiology reports" Computers & Electrical Engineering (2022). [paper]
- "Multi-Modal Medical Image Matching Based on Multi-Task Learning and Semantic-Enhanced Cross-Modal Retrieval" Traitement du Signal (2023). [paper]
- MMDL: "Multimodal multitask deep learning for X-ray image retrieval" MICCAI (2021). [paper]
- "Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report" ML4H (2023). [paper]
- BIMCV-R: "BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval" arXiv (2024). [paper]
- 3D-MIR: "3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology" arXiv (2023). [paper] [code]

</details>