# ChartLlama: A Multimodal LLM for Chart Understanding and Generation

<div align="center">

<a href='https://arxiv.org/abs/2311.16483'><img src='https://img.shields.io/badge/arXiv-2311.16483-b31b1b.svg'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href='https://tingxueronghua.github.io/ChartLlama/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href='https://github.com/buaacyw/GaussianEditor/blob/master/LICENSE.txt'><img src='https://img.shields.io/badge/License-MIT-blue'></a> <br><br> Model &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Dataset

Yucheng Han*, Chi Zhang* (Corresponding Author), Xin Chen, Xu Yang, Zhibin Wang <br> Gang Yu, Bin Fu, Hanwang Zhang <br><br> (* equal contributions)

From Tencent and Nanyang Technological University.

<img src="./static/teaser_visualization_final_v3.png">

</div>

## šŸ”† Introduction

šŸ¤—šŸ¤—šŸ¤— We first build an instruction-tuning dataset with our proposed data-generation pipeline, and then train ChartLlama on it, giving it the abilities illustrated below.

Examples of ChartLlama's abilities:

<div align="center"> <img src=./static/qualitative_visualization_04.png> <p>Redraw the chart according to the given chart, and edit the chart following instructions.</p> </div> <div align="center"> <img src=./static/qualitative_visualization_05.png> <p>Draw a new chart based on given raw data and instructions</p> </div>

šŸ“ Changelog


āš™ļø Setup

Refer to LLaVA-1.5 for environment details. Since the code is already included in this repository, you can install it with:

```bash
pip install -e .
```
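Starting from a clean machine, a minimal setup sketch might look like the following; the conda environment name and Python version are illustrative assumptions, not requirements stated by this repository:

```bash
# Illustrative environment setup; "chartllama" and python=3.10 are assumptions.
conda create -n chartllama python=3.10 -y
conda activate chartllama

# Install this repository (which bundles the LLaVA-1.5 code) in editable mode.
pip install -e .
```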

## šŸ’« Inference

You first need to install LLaVA-1.5, then run `model_vqa_lora` for inference. `--model-path` is the path to our LoRA checkpoint, `--question-file` is the JSON file containing all of the questions, `--image-folder` is the folder containing your images, and `--answers-file` is the name of the output file.
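For reference, a plausible question-file entry is sketched below. The `question_id`/`image`/`text` field names are assumed from the JSON Lines convention of LLaVA's evaluation scripts; check this repository's data files for the exact schema:

```bash
# Hypothetical question file: one JSON object per line (JSON Lines),
# with field names assumed from LLaVA's evaluation convention.
cat > /your_path_to/question.json <<'EOF'
{"question_id": 0, "image": "bar_chart_000.png", "text": "Which category has the highest value in this bar chart?"}
{"question_id": 1, "image": "line_chart_001.png", "text": "Convert the chart into a CSV table."}
EOF
```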

Here is an example invocation:

```bash
# $CHUNKS and $IDX split the questions across parallel workers;
# use CHUNKS=1 and IDX=0 to run everything in a single process.
CUDA_VISIBLE_DEVICES=1 python -m llava.eval.model_vqa_lora \
    --model-path /your_path_to/LLaVA/checkpoints/${output_name} \
    --question-file /your_path_to/question.json \
    --image-folder ./playground/data/ \
    --answers-file ./playground/data/ans.jsonl \
    --num-chunks $CHUNKS \
    --chunk-idx $IDX \
    --temperature 0 \
    --conv-mode vicuna_v1 &
```
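The `--num-chunks`/`--chunk-idx` flags and the trailing `&` are meant for splitting the questions across several GPUs. Below is one possible launcher, a sketch rather than one of the released scripts; the GPU count, checkpoint path, and output names are placeholders:

```bash
# One possible multi-GPU launcher (illustrative; CHUNKS, paths, and
# ${output_name} are placeholders). Each worker answers one chunk.
CHUNKS=4
for IDX in $(seq 0 $((CHUNKS-1))); do
    CUDA_VISIBLE_DEVICES=$IDX python -m llava.eval.model_vqa_lora \
        --model-path /your_path_to/LLaVA/checkpoints/${output_name} \
        --question-file /your_path_to/question.json \
        --image-folder ./playground/data/ \
        --answers-file ./playground/data/ans_${IDX}.jsonl \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --temperature 0 \
        --conv-mode vicuna_v1 &
done
wait  # block until every worker has finished

# Merge the per-chunk answers into one file.
cat ./playground/data/ans_*.jsonl > ./playground/data/ans.jsonl
```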

## šŸ“– TO-DO LIST

## šŸ˜‰ Citation

```bibtex
@misc{han2023chartllama,
      title={ChartLlama: A Multimodal LLM for Chart Understanding and Generation},
      author={Yucheng Han and Chi Zhang and Xin Chen and Xu Yang and Zhibin Wang and Gang Yu and Bin Fu and Hanwang Zhang},
      year={2023},
      eprint={2311.16483},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## šŸ“¢ Disclaimer

We develop this repository for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.