Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

:bookmark_tabs:Paper :file_folder:Data :orange_book:Notebook :black_nib:BibTex :rocket:Preview :scroll:Poster

Authors: Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

:fire: News

:rocket: A more advanced version is coming!

We are building a new version with a larger data scale, more object categories, and higher-quality images and text. You can preview it at this website; the full version is coming soon.

:mag: SPEC Benchmark

To evaluate how well vision-language models understand fine-grained concepts, we propose a new benchmark, SPEC, which consists of six distinct subsets distributed across the dimensions of Size, Position, Existence, and Count. Each test case consists of an image candidate set, whose images differ only in a certain visual concept, and a text candidate set, whose texts differ only in the corresponding language concept.

<p align="center"> <img src="assets/spec_overview.png" width="720px"/> <br> </p>
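To make the test-case structure concrete, here is a minimal sketch of how one test case could be scored in both directions. It is an illustration, not the evaluation code shipped in this repo: the helper `retrieval_accuracy` is hypothetical, and it assumes a square similarity matrix in which image `i` is the correct match for text `i`.

```python
def retrieval_accuracy(similarity):
    """Score one test case from an n x n image-text similarity matrix.

    Assumption (not taken from the repo): similarity[i][j] is the model's
    score for image i and text j, and matched pairs lie on the diagonal.
    Returns (image_to_text_accuracy, text_to_image_accuracy).
    """
    n = len(similarity)
    # Image-to-text: each image (row) should rank its own text highest.
    i2t_hits = sum(
        max(range(n), key=lambda j: similarity[i][j]) == i for i in range(n)
    )
    # Text-to-image: each text (column) should rank its own image highest.
    t2i_hits = sum(
        max(range(n), key=lambda i: similarity[i][j]) == j for j in range(n)
    )
    return i2t_hits / n, t2i_hits / n


# Toy scores for a case with three candidates (values are made up).
scores = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.2, 0.1, 0.7],
]
i2t_acc, t2i_acc = retrieval_accuracy(scores)
print(f"image-to-text: {i2t_acc:.2f}, text-to-image: {t2i_acc:.2f}")
```

In this toy case two of the three candidates are matched correctly in each direction. Because the candidates differ in a single concept, a model that ignores that concept tends to score all candidates almost equally and falls to chance level.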

:wrench: Usage

install

git clone https://github.com/wjpoom/SPEC.git
cd SPEC/
pip install -e .

prepare data

import zipfile
import os
from huggingface_hub import hf_hub_download

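# directory where the dataset will be stored
data_root = '/path/to/save/data'

# download data.zip from the Hugging Face dataset repo into data_root
hf_hub_download(repo_id='wjpoom/SPEC', repo_type='dataset', filename='data.zip', local_dir=data_root)

# extract the archive in place
with zipfile.ZipFile(os.path.join(data_root, 'data.zip'), 'r') as zip_ref:
    zip_ref.extractall(data_root)

# remove the archive once it has been extracted
os.remove(os.path.join(data_root, 'data.zip'))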
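After extraction, a quick sanity check can confirm that the files landed where expected. The sketch below makes no assumption about the dataset's internal layout: it only walks the directory and counts files by extension. The helper `summarize_dir` is hypothetical and not part of this repo.

```python
import os
from collections import Counter


def summarize_dir(root):
    """Count the files under `root`, grouped by lowercase file extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or '<no extension>'
            counts[ext] += 1
    return counts


if __name__ == '__main__':
    data_root = '/path/to/save/data'  # same placeholder path as above
    for ext, n in sorted(summarize_dir(data_root).items()):
        print(f'{ext}: {n}')
```

An empty summary usually means that `data_root` points to the wrong directory or that the extraction step did not run.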

explore the dataset

reproduce the results

evaluate custom VLMs

:memo: TODO

:clap: Acknowledgement

Part of this repository is built upon ARO; thanks to its authors for the well-organized codebase.

Contact Us

Feel free to contact us if you have any questions or suggestions.

Email (Wujian Peng): wjpeng24@m.fudan.edu.cn

:black_nib: Citation

If you use the code or data in this repo, or find our work helpful, please consider citing:

@inproceedings{spec2024,
  title={Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding},
  author={Peng, Wujian and Xie, Sicheng and You, Zuyao and Lan, Shiyi and Wu, Zuxuan},
  booktitle={CVPR},
  year={2024}
}