<div align="center"> <img src="assets/mPLUG_new1.png" width="80%"> </div> <div align="center"> <h2>The Powerful Multi-modal LLM Familyfor OCR-free Document Understanding<h2> <strong>Alibaba Group</strong>
</div> <p align="center"> <a href="https://trendshift.io/repositories/9061" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9061" alt="DocOwl | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> </p>

## 📢 News
- 🔥🔥🔥 [2024.9.28] We have released the training data, inference code, and evaluation code of DocOwl2 on both HuggingFace 🤗 and ModelScope <img src="./assets/modelscope.png" width='20'> (see the minimal loading sketch after this list).
- 🔥🔥🔥 [2024.9.20] Our papers DocOwl 1.5 and TinyChart are accepted by EMNLP 2024.
- 🔥🔥🔥 [2024.9.06] We release the arXiv paper of mPLUG-DocOwl 2, a SOTA 8B Multimodal LLM for OCR-free Multipage Document Understanding; each document image is encoded with just 324 tokens!
- 🔥🔥 [2024.7.16] Our paper PaperOwl is accepted by ACM MM 2024.
- [2024.5.08] We have released the training code of DocOwl1.5 supported by DeepSpeed. You can now finetune a stronger model based on DocOwl1.5!
- [2024.4.26] We release the arXiv paper of TinyChart, a SOTA 3B Multimodal LLM for Chart Understanding with Program-of-Thoughts ability (ChartQA: 83.6 > Gemini-Ultra 80.8 > GPT4V 78.5). The demo of TinyChart is available on HuggingFace 🤗. Codes, models, and data are all released in TinyChart.
- [2024.4.3] We build demos of DocOwl1.5 on both ModelScope <img src="./assets/modelscope.png" width='20'> and HuggingFace 🤗, supported by DocOwl1.5-Omni. The source code for launching a local demo is also released in DocOwl1.5.
- [2024.3.28] We release the training data (DocStruct4M, DocDownstream-1.0, DocReason25K), codes and models (DocOwl1.5-stage1, DocOwl1.5, DocOwl1.5-Chat, DocOwl1.5-Omni) of mPLUG-DocOwl 1.5 on both HuggingFace 🤗 and ModelScope <img src="./assets/modelscope.png" width='20'>.
- [2024.3.20] We release the arXiv paper of mPLUG-DocOwl 1.5, a SOTA 8B Multimodal LLM for OCR-free Document Understanding (DocVQA 82.2, InfoVQA 50.7, ChartQA 70.2, TextVQA 68.6).
- [2024.01.13] Our Scientific Diagram Analysis dataset M-Paper is now available on both HuggingFace 🤗 and ModelScope <img src="./assets/modelscope.png" width='20'>, containing 447K high-resolution diagram images and corresponding paragraph analyses.
- [2023.10.13] The training data and models of mPLUG-DocOwl/UReader have been open-sourced.
- [2023.10.10] Our paper UReader is accepted by EMNLP 2023.
- [2023.07.10] The demo of mPLUG-DocOwl on ModelScope is available.
- [2023.07.07] We release the technical report and evaluation set of mPLUG-DocOwl.
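As mentioned in the [2024.9.28] news item above, the inference code of DocOwl2 is released on HuggingFace. Below is a minimal loading sketch, assuming the checkpoint is published as `mPLUG/DocOwl2` with custom modeling code; the repo id and interface here are assumptions, so consult the official model card and the released inference scripts for the exact API.

```python
# Minimal sketch: load DocOwl2 for OCR-free multi-page document understanding.
# The repo id "mPLUG/DocOwl2" is an assumption -- check the model card.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "mPLUG/DocOwl2"  # assumed HuggingFace repo id
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,    # the checkpoint ships custom modeling code
    torch_dtype=torch.float16,
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Image preprocessing and multi-page chat are handled by the custom code
# bundled with the checkpoint; see the released inference scripts for details.
```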
## 🤖 Models

- mPLUG-DocOwl2 (arXiv 2024) - mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
- mPLUG-DocOwl1.5 (EMNLP 2024) - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
- TinyChart (EMNLP 2024) - TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
- mPLUG-PaperOwl (ACM MM 2024) - mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
- UReader (EMNLP 2023) - UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
- mPLUG-DocOwl (arXiv 2023) - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
## 📺 Online Demo
Note: The HuggingFace demo is not as stable as the ModelScope one because the GPU in HuggingFace ZeroGPU Spaces is dynamically assigned.
### 📖 DocOwl 1.5

- 🤗 HuggingFace Space
- <img src="assets/modelscope.png" width='20'> ModelScope Space
### 📈 TinyChart-3B
- 🤗 HuggingFace Space