Visually Dehallucinative Instruction Generation

(CAP2QA) Visually Dehallucinative Instruction Generation [paper] <br> Sungguk Cha, Jusung Lee, Younghyun Lee and Cheoljong Yang

See also: (IDK) Visually Dehallucinative Instruction Generation: Know What You Don't Know [paper] [github] <br>

CAP2QA

Image-aligned Sentence Level VQA Data

<img src="images/fig1.png"> <br> <img src="images/examples.png">

Details

| Dataset | Avg. #words (Question/Answer) | #Images | #Questions | Scalable | Image-aligned | Recognition | Description | Reasoning |
|---|---|---|---|---|---|---|---|---|
| DAQUAR | 11.5/1.1 (word) | 1,449 | 12,468 | $\times$ | $\checkmark$ | $\checkmark$ | $\times$ | $\times$ |
| VQAv2 | 6.1/1.2 (word) | 200k | 1.1M | $\times$ | $\checkmark$ | $\checkmark$ | $\times$ | $\times$ |
| OKVQA | 8.1/1.3 (word) | 14,031 | 14,055 | $\times$ | $\times$ | $\checkmark$ | $\times$ | $\checkmark$ |
| LLaVA | 10.7/60.7 (sentence) | 80,000 | 221,333 | $\checkmark$ | $\times$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| CAP2QA (Ours) | 7.2/5.4 (sentence) | 122,906 | 873,631 | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |

Prepare the MSCOCO 2017 images before using the dataset. CAP2QA preserves the original MSCOCO train/val splits.
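As a rough sketch of the image preparation step, the snippet below resolves where a MSCOCO 2017 image is expected on disk and reports which ones are missing. It relies on the standard COCO 2017 naming scheme (image id zero-padded to 12 digits, stored under `train2017/` or `val2017/`). The root directory and the helper names `coco_image_path` and `missing_images` are illustrative assumptions, not part of the CAP2QA release, and the snippet does not assume anything about the CAP2QA annotation file format.

```python
from pathlib import Path

# Assumed local layout (hypothetical root directory):
#   <root>/train2017/*.jpg and <root>/val2017/*.jpg
COCO_SPLITS = ("train2017", "val2017")


def coco_image_path(root, split, image_id):
    """Return the expected path of a COCO 2017 image.

    COCO 2017 image files are named with the image id
    zero-padded to 12 digits, e.g. 000000000139.jpg.
    """
    if split not in COCO_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return Path(root) / split / f"{int(image_id):012d}.jpg"


def missing_images(root, split, image_ids):
    """List the ids whose image file is not present under root/split."""
    return [i for i in image_ids if not coco_image_path(root, split, i).is_file()]
```

For example, `coco_image_path("data/coco", "val2017", 139)` points at a file named `000000000139.jpg` inside the `val2017` folder, and `missing_images` can be run over the image ids of one split to check that the download is complete before training.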

Citation

If you find CAP2QA useful for your research and applications, please cite using this BibTeX:

@inproceedings{cha2024visually,
      title={Visually Dehallucinative Instruction Generation}, 
      author={Cha, Sungguk and Lee, Jusung and Lee, Younghyun and Yang, Cheoljong},
      booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
      year={2024},
}

Licenses

The instructions in this work were generated using the COCO-Caption dataset (CC BY-NC-ND license) as the caption source and ChatGPT (see the OpenAI policies, https://openai.com/policies).