@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology (WACV 2025)

by Xin Jiang*, Junwei Zheng*, Ruiping Liu, Jiahang Li, Jiaming Zhang†, Sven Matthiesen, Rainer Stiefelhagen

* denotes equal contribution and † denotes corresponding author

<p align="center"> <a href="https://arxiv.org/pdf/2409.14215"> <img src="https://img.shields.io/badge/arXiv-2409.14215-red" /></a> <a href="https://junweizheng93.github.io/publications/ATBench/ATBench.html"> <img src="https://img.shields.io/badge/Project-page-green" /></a> <a href="https://pytorch.org/"> <img src="https://img.shields.io/badge/Framework-PyTorch-orange.svg" /></a> <a href="https://github.com/jystin/ATBench/blob/main/LICENSE"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a> </p>

News

<!-- <p align="center"> <img src="/images/pipeline.png" width="90%" height="90%"> </p> -->

[Figure: pipeline]

Introduction

[Figure: multi-task results]

ATBench is designed based on a pre-design user study with People with Visual Impairments (PVIs). It covers the five most crucial vision-language tasks: Panoptic Segmentation, Image Captioning, Visual Question Answering (VQA), Depth Estimation, and Optical Character Recognition (OCR). We also propose ATModel, a novel model that addresses all five tasks simultaneously.

More details can be found in our arXiv paper.

Getting Started

Checkpoints and Numbers:

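| Model | PS<br/>(ADE-150)<br/>PQ | DE<br/>(NYU-V2)<br/>RMSE | OCR<br/>(6 datasets avg)<br/>Acc(%) | IC<br/>(VizWiz_Cap)<br/>CIDEr | VQA<br/>(VizWiz_VQA)<br/>Acc(%) | #Params |
|---|---|---|---|---|---|---|
| Unified-IO (S) | - | 0.649 | - | - | 42.4 | 71M |
| Unified-IO (B) | - | 0.469 | - | - | 45.8 | 241M |
| Unified-IO (L) | - | 0.402 | - | - | 47.7 | 776M |
| X-Decoder (T) | 41.6 | - | - | - | - | 164M |
| GIT (T) | - | - | - | 113.1 | 68.0 | 0.7B |
| PaLI (T) | - | - | - | 117.2 | 67.5 | 3.0B |
| ATModel | 38.5 | 0.425 | 80.1 | 52.5 | 53.7 | 62M |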
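
For readers unfamiliar with the metric in the DE column: RMSE is the root of the mean squared difference between predicted and ground-truth depth, so lower is better. The snippet below is a minimal, dependency-free sketch of that definition and is not this repository's evaluation code. The function name `depth_rmse` and the `min_depth` threshold for skipping missing ground-truth pixels are assumptions made for illustration.

```python
import math


def depth_rmse(pred, target, min_depth=1e-3):
    """Root-mean-square error over pixels with valid ground truth.

    pred, target: flat sequences of depth values with equal length.
    min_depth: hypothetical threshold; ground-truth values below it
               are treated as missing and skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Squared error for every pixel whose ground truth is considered valid.
    sq_errors = [(p - t) ** 2 for p, t in zip(pred, target) if t >= min_depth]
    if not sq_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

In practice the same computation would typically be done on PyTorch tensors with a boolean validity mask; plain lists are used here only to keep the sketch self-contained.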

Installation, Dataset, Training and Evaluation Guide:

Acknowledgement

Citation

If you find our work useful in your research, please cite:

@inproceedings{jiang2025atbench,
title={@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology},
author={Jiang, Xin and Zheng, Junwei and Liu, Ruiping and Li, Jiahang and Zhang, Jiaming and Matthiesen, Sven and Stiefelhagen, Rainer},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2025}
}