
MLLM-DataEngine: Closing the Loop of Instruction Tuning Data Generation <a href='https://arxiv.org/pdf/2308.13566'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

Shanghai Artificial Intelligence Laboratory

[Paper] [Data(huggingface)] [Data(opendatalab)] [Model]

Introduction

We propose MLLM-DataEngine, a novel closed-loop system that bridges data generation, model training, and evaluation. In each loop iteration, MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results, then generates a targeted incremental dataset for the next training iteration, iteratively enhancing the model's capability.

<img width="100%" src="DataEngine_flowchart.png" alt="overview"></a>

Compared with previous instruction fine-tuning data collection methods, which are separate from benchmarking, MLLM-DataEngine targets model weaknesses more precisely, generates higher-quality data, and improves MLLMs' capabilities more effectively.

<img width="100%" src="showcase.png" alt="overview"></a>

News and Updates

Dataset Format

The data generated by MLLM-DataEngine contains a clear, concise instruction and a corresponding answer. In addition, each instruction-answer pair is reformatted into a multiple-choice question-answering format. The generated data is organized as follows:

[
    {
        "instruction": "Where is the man wearing a black backpack positioned in the picture?",
        "answer": "The man wearing a black backpack is located at the left side of the image, roughly in the middle between top and bottom",
        "short_answer": "Left middle",
        "options": ["Top right", "Bottom right", "Bottom left", "Left middle"],
        "choice_answer": "D",
        "image": "vg/VG_100K_2/2404787.jpg",
        "qtype": 4
    }
]

instruction: a clear, concise instruction

answer: the full answer to the instruction

short_answer: a short-form answer to the instruction

options: four candidate options for the instruction

choice_answer: the letter of the correct option

image: Visual Genome image path

qtype: the question type in SEED-Bench, enumerated below:

{
    "1": "Scene Understanding",
    "2": "Instance Identity",
    "3": "Instance Attributes",
    "4": "Instance Location",
    "5": "Instances Counting",
    "6": "Spatial Relation",
    "7": "Instance Interaction",
    "8": "Visual Reasoning",
    "9": "Text Understanding"
}
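
For illustration, here is a minimal sketch of consuming this format, assuming the generated JSON has been saved locally (the file name data_engine.json and the helper below are hypothetical, not part of the release):

import json
import string

def to_multiple_choice(entry: dict) -> str:
    # Render one generated entry as a lettered multiple-choice prompt.
    lines = [entry["instruction"]]
    for letter, option in zip(string.ascii_uppercase, entry["options"]):
        lines.append(f"{letter}. {option}")
    return "\n".join(lines)

with open("data_engine.json") as f:  # hypothetical file name
    data = json.load(f)

sample = data[0]
print(to_multiple_choice(sample))
print("Correct option:", sample["choice_answer"])  # e.g. "D"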

Main Results

LLaVA-1.5-lora-7b

| Incremental Dataset | Data Amount | SEED | MMB | MME | GQA | VQAv2 | ScienceQA |
|---|---|---|---|---|---|---|---|
| None (baseline) | - | 66.04 | 66.6 | 1475/290 (1765) | 57.27 | 77.56 | 70.67/68.27 |
| MLLM-DataEngine | 220k | 68.57 | 67.18 | 1511/303 (1814) | 58.02 | 78.18 | 73.17/71.15 |

MiniGPT4-v2

| Incremental Dataset | Data Amount | SEED | MMB | OKVQA | VizWiz | VSR |
|---|---|---|---|---|---|---|
| None (baseline) | - | 49.21 | 38.83 | 56.03 | 53.08 | 61.37 |
| MLLM-DataEngine | 270k | 63.83 | 52.92 | 56.87 | 54.39 | 62.43 |

Model Training and Evaluation

| MiniGPT4-v2 | LLaVA-1.5 |
|---|---|
| [doc] | [doc] |

Acknowledgement

Citation

If you use MLLM-DataEngine in your research or applications, please cite it using this BibTeX:

@misc{zhao2023mllmdataengine,
      title={MLLM-DataEngine: An Iterative Refinement Approach for MLLM}, 
      author={Zhiyuan Zhao and Linke Ouyang and Bin Wang and Siyuan Huang and Pan Zhang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
      year={2023},
      eprint={2308.13566},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Contact us

If you have any questions, comments or suggestions, please do not hesitate to contact us at zhaozhiyuan@pjlab.org.cn.

License

Apache License 2.0