# On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
<a href='https://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_On_the_General_Value_of_Evidence_and_Bilingual_Scene-Text_Visual_CVPR_2020_paper.pdf'><img src='https://img.shields.io/badge/Paper-PDF-orange'></a>
Since the host server of the EST-VQA dataset is no longer available, we provide download links for the dataset in this repository.
We also release the test annotations here, so you no longer need to use EvalAI for evaluation.
## Download
- Google Drive: [Images Train] [Images Test] [Annotations Train] [Annotations Test]
- Baidu Netdisk: [Images] (code: dcmn) [Annotations] (code: e4qe)
## Evaluation
You can use `eval.py` to evaluate your model on the EST-VQA dataset. Simply convert your prediction file to the same format as `pred_sample.json` (see the sketch below) and run the following command:

```bash
python eval.py --pred_file PATH_TO_PRED --gt_file PATH_TO_GT
```
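The authoritative prediction schema is whatever `pred_sample.json` in this repository contains. As a minimal sketch, assuming each prediction is a record with `question_id` and `answer` fields (these field names are an assumption, not the confirmed schema; mirror the keys actually used in `pred_sample.json`), a conversion helper might look like this:

```python
import json

def convert_predictions(raw_outputs, out_path):
    # raw_outputs: iterable of (question_id, answer) pairs produced by your model.
    # The "question_id"/"answer" field names are assumptions -- replace them with
    # whatever keys pred_sample.json actually uses.
    preds = [{"question_id": qid, "answer": ans} for qid, ans in raw_outputs]
    with open(out_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese answers human-readable in the file.
        json.dump(preds, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    # Hypothetical model outputs: one English and one Chinese answer.
    convert_predictions([(0, "stop"), (1, "前方学校")], "my_preds.json")
```

The resulting `my_preds.json` is what you would pass to `eval.py` as `--pred_file`.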
## Leaderboard
Some of the results are borrowed from this paper.
| Year | Venue   | Model          | LLM-based | EST-VQA (En) | EST-VQA (CN) | Overall |
|------|---------|----------------|-----------|--------------|--------------|---------|
| 2023 | ICML    | BLIP2-OPT-6.7B | Y         | 40.7         | 0            | -       |
| 2023 | NeurIPS | InstructBLIP   | Y         | 48.6         | 0.1          | -       |
| 2023 | arXiv   | mPLUG-Owl      | Y         | 52.7         | 0            | -       |
| 2023 | arXiv   | LLaVAR         | Y         | 58.2         | 0            | -       |
| 2023 | NeurIPS | LLaVA-1.5-7B   | Y         | 52.3         | 0            | -       |
| 2024 | AAAI    | BLIVA          | Y         | 51.2         | 0.2          | -       |
| 2024 | CVPR    | mPLUG-Owl2     | Y         | 68.6         | 4.9          | -       |
| 2024 | CVPR    | Monkey         | Y         | 71           | 42.6         | -       |
## Citation
If you find EST-VQA useful in your research, please cite it using the following BibTeX:
```bibtex
@inproceedings{wang2020general,
  title={On the general value of evidence, and bilingual scene-text visual question answering},
  author={Wang, Xinyu and Liu, Yuliang and Shen, Chunhua and Ng, Chun Chet and Luo, Canjie and Jin, Lianwen and Chan, Chee Seng and Hengel, Anton van den and Wang, Liangwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10126--10135},
  year={2020}
}
```