<p align="center"> <img src="./assets/project-name.jpeg"/> <br /> </p> <div align="center"> <p align="center"> 🔬 <b>Paper</b> (🔗 <a href="https://arxiv.org/abs/2410.17241">arXiv</a>, 🤗 <a href="https://huggingface.co/papers/2410.17241">Huggingface</a>, 🤖 <a href="https://www.aimodels.fyi/papers/arxiv/frontiers-intelligent-colonoscopy">AIModels.fyi</a>) | 📖 <b>ColonSurvey</b> (🔗 <a href="https://docs.google.com/spreadsheets/d/1V_s99Jv9syzM6FPQAJVQqOFm5aqclmrYzNElY6BI18I/edit?usp=sharing">Online Sheet</a>) | 🏥 <b>ColonINST</b> (🔗 <a href="https://drive.google.com/drive/folders/1ng2DQav-Gfts6hIr3_vCUC-a2gCWzzCO?usp=sharing">Google Drive</a>, 🤗 <a href="https://huggingface.co/datasets/ai4colonoscopy/ColonINST-v1">Huggingface</a>) | 🤖 <b>ColonGPT</b> (🔗 <a href="https://drive.google.com/file/d/1WL0OIPiwiLeApoK8xaMZ1HR30ZrDtoMk/view?usp=sharing">Google Drive</a>, 🤗 <a href="https://huggingface.co/ai4colonoscopy/ColonGPT-v1">Huggingface</a>) | <b>Multimodal benchmark</b> (🔗 <a href="https://drive.google.com/drive/folders/1q3awr-aT50tuhW9Z01C3LKkckfG4Bk70?usp=sharing">Google Drive</a>, 🔗 <a href="https://paperswithcode.com/dataset/coloninst-v1">Papers with Code</a>) </p> <p align="center"> <i>Keywords: Intelligent Colonoscopy, Multimodal Colonoscopy Dataset, Multimodal Language Model, Vision-language Understanding, Endoscopic Image Analysis, Healthcare AI, Abdomen.</i> </p> </div>
<img align="right" src="./assets/teaser-figure.png" width="285px" />

Colonoscopy is currently one of the most sensitive screening methods for colorectal cancer (🔗 Wikipedia). Have you ever wondered how to make colonoscopy smarter? Well, buckle up, let's enter the exciting world of intelligent colonoscopy!

Updates

🔥 Research Highlights

<p align="center"> <img src="./assets/overview_for_github.png" width="800px" /> <br/> <em> Figure 1: Introductory diagram. </em> </p>

📖 ColonSurvey

Our "ColonSurvey" project contributes various useful resources for the community. We investigate 63 colonoscopy datasets and 137 deep learning models focused on colonoscopic scene perception, all sourced from leading conferences or journals since 2015. This is a quick overview of our investigation; for a more detailed discussion, please refer to our paper in PDF format.

<p align="center"> <img src="./assets/colonsurvey.png"/> <br /> <em> Figure 2: The investigation of colonoscopy datasets and models. </em> </p>

To better understand developments in this rapidly changing field and accelerate researchers' progress, we are building a 📖 paper reading list, which includes a number of AI-based scientific studies on colonoscopy imaging from the past 12 years. [UPDATE ON OCT-14-2024] In detail, our online list contains:

<!-- Additionally, we will provide some interesting resources about human colonoscopy. Our investigation includes the latest intelligent colonoscopy techniques across four tasks for colonoscopic scene understanding: classification, detection, segmentation, and vision-language tasks. - We build a [WIKI page](https://github.com/ai4colonoscopy/awesome-intelligent-colonoscopy) for medical colonoscopy serves several important purposes. **Continuous Updates.** -->

Make our community great again. If your work is missing from the Google Sheet, please add it; this project is a good platform to promote your work. You can also notify us by email (📮 gepengai.ji@gmail.com) or open a PR on GitHub. We will handle your request as soon as possible. Thank you for your active feedback.

<!-- **Knowledge Hub.** We build a [hub page](https://github.com/ai4colonoscopy/awesome-intelligent-colonoscopy/tree/main/wikipedia-page) to introduce the basic knowledge related to intelligent colonoscopy research, ensuring that both medical professionals and AI researchers can easily access this research domain. -->

๐Ÿฅ ColonINST (A multimodal instruction tuning dataset)

<p align="center"> <img src="./assets/coloninst-overview.png"/> <br /> <em> Figure 3: Details of our multimodal instruction tuning dataset, ColonINST. (a) Three sequential steps to create the instruction tuning dataset for multimodal research. (b) Numbers of colonoscopy images designated for training, validation, and testing purposes. (c) Data taxonomy of three-level categories. (d) A word cloud of the category distribution by name size. (e) Caption generation pipeline using the VL prompting mode of GPT-4V. (f) Numbers of human-machine dialogues created for four downstream tasks. </em> </p>

Our data contains two parts: colonoscopy images and human-machine dialogues (available at 🤗 Huggingface and 🔗 Google Drive). However, due to privacy concerns, we cannot directly share the original medical images without authorization. Don't worry! We provide a data download list and an easy-to-use script to organise ColonINST. Detailed instructions are in our document (🔗 .docs/guideline-for-ColonINST.md).
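
As a minimal sketch of what a sanity check after downloading could look like, the snippet below scans a dialogue JSON file and reports which referenced images are missing from the local image folder. The record layout (a list of objects with an `image` field holding a relative path) and the function name `find_missing_images` are assumptions for illustration only; the official script and guideline remain the reference.

```python
import json
from pathlib import Path


def find_missing_images(dialogue_file, image_root):
    """Return relative image paths referenced in the dialogues but absent on disk.

    Assumes (hypothetically) that the JSON file holds a list of records,
    each with an "image" field containing a path relative to image_root.
    """
    records = json.loads(Path(dialogue_file).read_text(encoding="utf-8"))
    root = Path(image_root)
    missing = []
    seen = set()
    for record in records:
        rel_path = record.get("image")
        if rel_path is None or rel_path in seen:
            continue  # skip records without an image and paths already checked
        seen.add(rel_path)
        if not (root / rel_path).is_file():
            missing.append(rel_path)
    return sorted(missing)
```

An empty list means every referenced image was found; otherwise the returned paths point to the source datasets that still need to be downloaded.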

🤖 ColonGPT (A colonoscopy-specific multimodal language model)

<p align="center"> <img src="./assets/ColonGPT.gif" width="666px"/> <br /> <em> Figure 4: Details of our multimodal language model, ColonGPT. </em> </p>

ColonGPT is a standard multimodal language model, released on 🔗 Google Drive. It contains four basic components: a language tokenizer, a visual encoder (🤗 SigLIP-SO or 🔗 Google Drive), a multimodal connector, and a language model (🤗 Phi1.5 or 🔗 Google Drive).
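
To make the data flow between these four components concrete, here is a toy sketch in plain Python. It is not ColonGPT's implementation: a single linear projection stands in for the multimodal connector, and all dimensions and names (`project`, `build_input_sequence`) are hypothetical. The actual connector design is described in the paper.

```python
def project(visual_tokens, weight):
    """Toy connector: map each visual token (width d_v) to the language
    model's embedding width (d_l) with a plain matrix product."""
    return [
        [sum(v * w for v, w in zip(token, column)) for column in zip(*weight)]
        for token in visual_tokens
    ]


def build_input_sequence(visual_tokens, text_embeddings, weight):
    """Concatenate projected visual tokens with text token embeddings,
    giving the sequence a language model would consume."""
    return project(visual_tokens, weight) + text_embeddings


# Two visual tokens of width 3 (stand-in for the visual encoder output).
visual = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
# Connector weight of shape (d_v=3, d_l=2); values are made up.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One text token embedding of width 2 (stand-in for tokenizer + embedding lookup).
text = [[0.5, 0.5]]

sequence = build_input_sequence(visual, text, W)
print(len(sequence), len(sequence[0]))  # 3 tokens, each of width 2
```

The point of the sketch is only the shape bookkeeping: the connector changes the width of visual tokens so they can share one sequence with text embeddings.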

✅ Quick start

We provide a code snippet showing how to quickly try ColonGPT with HuggingFace transformers. For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick demo; we recommend installing ColonGPT's source code to explore further.
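
A minimal loading sketch, under assumptions, might look like the following. The repository ID is taken from the Huggingface link at the top of this page; passing `trust_remote_code=True` assumes the merged checkpoint ships its own model code, and the helper name `load_colongpt` is hypothetical. Image preprocessing and generation depend on that custom code and are not shown here, so please follow the repository's documentation for the inference steps.

```python
MODEL_ID = "ai4colonoscopy/ColonGPT-v1"  # repository ID from the Huggingface link


def load_colongpt(model_id=MODEL_ID):
    """Load the tokenizer and model from the Hugging Face Hub.

    Imports are kept inside the function so that defining it does not
    require transformers to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the checkpoint needs its bundled custom code to load.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_colongpt()` downloads the weights on first use, so expect it to take a while and to need network access.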

✅ Try full-version ColonGPT

The complete workflow has been officially released. It provides a streamlined process that lets community users develop, customize, and enhance their own models. Step-by-step instructions are available in our documentation (🔗 .docs/guideline-for-ColonGPT.md) and cover every stage of development for both beginners and advanced practitioners.

💯 Multimodal benchmark

<p align="center"> <img src="./assets/multimodal_benchmark.png"/> <br /> <em> Figure 5: Multimodal benchmark. </em> </p>

We provide a comprehensive benchmark of eight recent multimodal competitors across three multimodal colonoscopy tasks: MiniGPT-V2, LLaVA-v1, LLaVA-v1.5, Bunny-v1.0-3B, Mini-Gemini-2B, MobileVLM-1.7B, LLaVA-Med-v1.0, and LLaVA-Med-v1.5. We provide 🔗 the meta prediction files and 🔗 the evaluation instructions. We believe these resources make it easy to assess newly developed models or to rapidly build proofs of concept for follow-up research.
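
As an illustration of how a prediction file could be scored, here is a minimal sketch of exact-match accuracy for a classification-style task. The record fields (`prediction`, `label`), the normalization, and the toy records are assumptions, not the format of the released prediction files or the official evaluation protocol; consult the evaluation instructions for the actual metrics.

```python
def normalize(text):
    """Lowercase, trim, and drop a trailing period so trivially different
    answers (e.g. "Polyp." vs "polyp") compare equal."""
    return text.strip().lower().rstrip(".").strip()


def exact_match_accuracy(records):
    """Fraction of records whose normalized prediction equals the label."""
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(
        normalize(r["prediction"]) == normalize(r["label"]) for r in records
    )
    return correct / len(records)


# Made-up toy records for illustration only.
records = [
    {"prediction": "Polyp.", "label": "polyp"},
    {"prediction": "adenoma", "label": "hyperplastic lesion"},
    {"prediction": " ulcerative colitis ", "label": "Ulcerative colitis"},
    {"prediction": "polyp", "label": "polyp"},
]
print(exact_match_accuracy(records))  # 3 of 4 match -> 0.75
```

Exact match is a deliberately strict choice; free-form model outputs often need more careful answer extraction before comparison.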

๐Ÿ™ Acknowledgement

We gratefully acknowledge the contributions of the following projects, which served as the foundation and inspiration for our work:

๐Ÿ‘ Citation

Please use the following reference if you find this project useful for your research or applications:

@article{ji2024frontiers,
  author = {Ji, Ge-Peng and Liu, Jingyi and Xu, Peng and Barnes, Nick and Khan, Fahad Shahbaz and Khan, Salman and Fan, Deng-Ping},
  title = {Frontiers in Intelligent Colonoscopy},
  journal = {arXiv preprint arXiv:2410.17241},
  year = {2024}
}

🚨 Ethical and Responsible Use

ColonGPT is designed to assist in medical colonoscopy by leveraging multimodal capabilities, but it comes with no guarantees regarding its predictive accuracy or reliability in clinical practice. Users should be aware that the datasets and pre-trained models used in ColonGPT may contain inherent biases, including socioeconomic factors, which can lead to misclassification or other undesirable behaviors, such as the generation of offensive or inappropriate content.

We urge users and developers to carefully review and validate the performance of pre-trained models, particularly those integrated through the ColonGPT framework, before considering practical applications in a clinical setting. It is crucial that any AI-driven tool used in healthcare undergoes rigorous testing to ensure patient safety and avoid unintended consequences. Our commitment to ethical AI use extends to ongoing efforts to investigate, address, and mitigate the risks of bias and inappropriate behavior in ColonGPT. Continuous improvement of this codebase is a priority to ensure that the system aligns with responsible and equitable healthcare standards.