# ChartLlama: A Multimodal LLM for Chart Understanding and Generation
<div align="center">
<a href='https://arxiv.org/abs/2311.16483'><img src='https://img.shields.io/badge/arXiv-2311.16483-b31b1b.svg'></a> &nbsp;&nbsp;
<a href='https://tingxueronghua.github.io/ChartLlama/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;
<a href='https://github.com/buaacyw/GaussianEditor/blob/master/LICENSE.txt'><img src='https://img.shields.io/badge/License-MIT-blue'></a>
<br><br>
Yucheng Han*, Chi Zhang* (Corresponding Author), Xin Chen, Xu Yang, Zhibin Wang <br> Gang Yu, Bin Fu, Hanwang Zhang <br><br> (* equal contributions)
From Tencent and Nanyang Technological University.
<img src=./static/teaser_visualization_final_v3.png>
</div>

## 🔆 Introduction
🤗🤗🤗 We first create an instruction-tuning dataset with our proposed data-generation pipeline. We then train ChartLlama on this dataset, giving it the abilities shown in the figure above.
Below are some examples of ChartLlama's abilities.
<div align="center"> <img src=./static/qualitative_visualization_04.png> <p>Redraw the chart according to the given chart, and edit the chart following instructions.</p> </div> <div align="center"> <img src=./static/qualitative_visualization_05.png> <p>Draw a new chart based on given raw data and instructions</p> </div>š Changelog
- [2023.11.27]: 🔥🔥 Released the inference code and model weights.
- [2023.11.27]: Created the git repository.
## ⚙️ Setup
Our setup follows LLaVA-1.5. Since the code has been uploaded to this repository, you can install it directly with:

```bash
pip install -e .
```
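If you are starting from a clean machine, a minimal environment sketch (following the LLaVA-1.5 installation instructions; the environment name and Python version below are only illustrative) would look like this:

```bash
# Minimal environment sketch -- the env name and Python version are illustrative.
conda create -n chartllama python=3.10 -y
conda activate chartllama
pip install --upgrade pip
pip install -e .   # run from the root of this repository
```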
## 💫 Inference
You need to first install LLaVA-1.5, then use `model_vqa_lora` to run inference. `--model-path` is the path to our LoRA checkpoint, `--question-file` is the JSON file containing all questions, `--image-folder` is the folder containing all your images, and `--answers-file` is the name of the output file. A sketch of the expected question-file format is given below.
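The field names in this sketch follow LLaVA's `model_vqa`-style question format (one JSON object per line with `question_id`, `image`, and `text`); the file name, image names, and questions are made up for illustration, so check `model_vqa_lora` in your checkout if its fields differ.

```bash
# Illustrative question file (JSON Lines). Field names follow LLaVA's model_vqa
# format; the image names and questions here are hypothetical examples.
cat > /your_path_to/question.json << 'EOF'
{"question_id": 0, "image": "bar_chart_001.png", "text": "What is the highest value shown in the chart?"}
{"question_id": 1, "image": "line_chart_002.png", "text": "Generate the underlying data table for this chart."}
EOF
```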
Here is an example (`$CHUNKS` and `$IDX` come from the parallel launch loop sketched after it):
```bash
CUDA_VISIBLE_DEVICES=1 python -m llava.eval.model_vqa_lora --model-path /your_path_to/LLaVA/checkpoints/${output_name} \
    --question-file /your_path_to/question.json \
    --image-folder ./playground/data/ \
    --answers-file ./playground/data/ans.jsonl \
    --num-chunks $CHUNKS \
    --chunk-idx $IDX \
    --temperature 0 \
    --conv-mode vicuna_v1 &
```
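`--num-chunks` and `--chunk-idx` shard the question file across several processes, which is why the command above ends with `&`. A minimal launch-and-merge sketch in the style of LLaVA's evaluation scripts is shown below; the GPU list, paths, and `${output_name}` are placeholders to adapt to your setup.

```bash
#!/bin/bash
# Sketch: shard the question file across GPUs, then merge the answer chunks.
# The GPU ids, paths, and ${output_name} are placeholders.
gpu_list=(0 1 2 3)
CHUNKS=${#gpu_list[@]}

for IDX in $(seq 0 $((CHUNKS-1))); do
    CUDA_VISIBLE_DEVICES=${gpu_list[$IDX]} python -m llava.eval.model_vqa_lora \
        --model-path /your_path_to/LLaVA/checkpoints/${output_name} \
        --question-file /your_path_to/question.json \
        --image-folder ./playground/data/ \
        --answers-file ./playground/data/ans_${IDX}.jsonl \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --temperature 0 \
        --conv-mode vicuna_v1 &
done
wait

# Concatenate the per-chunk answers into a single file.
cat ./playground/data/ans_*.jsonl > ./playground/data/ans.jsonl
```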
## 📋 TO-DO LIST
- Create and open source a new chart dataset in Chinese.
- Open source the training scripts and the dataset.
- Open source the evaluation scripts.
- Open source the evaluation dataset.
- Open source the inference script.
- Open source the model.
- Create the git repository.
## 📖 Citation
```bibtex
@misc{han2023chartllama,
      title={ChartLlama: A Multimodal LLM for Chart Understanding and Generation},
      author={Yucheng Han and Chi Zhang and Xin Chen and Xu Yang and Zhibin Wang and Gang Yu and Bin Fu and Hanwang Zhang},
      year={2023},
      eprint={2311.16483},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
## 📢 Disclaimer
We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.