Interpretable Visual Reasoning via Induced Symbolic Space
This repository hosts the code for OCCAM (Object-Centric Compositional Attention Model), introduced in the following paper:
Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson and Humphrey Shi, Interpretable Visual Reasoning via Induced Symbolic Space, arXiv preprint arXiv:2011.11603.
Note: Our code will be released soon. Stay tuned.
Introduction
Our proposed OCCAM framework performs pure object-level reasoning and, among methods that do not use human-annotated functional programs, achieves a new state of the art on the CLEVR dataset. The framework makes object-word co-occurrence information available, which enables the induction of concepts and super concepts based on the inclusiveness and the mutual exclusiveness of words’ visual mappings. When operating on the induced concepts instead of visual features, OCCAM achieves comparable performance, demonstrating the accuracy and sufficiency of the induced concepts.
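The induction idea can be illustrated with a small toy sketch. This is not the OCCAM implementation: the data, the function names, and the two rules used here (words whose object sets include each other are merged into one concept; concepts whose object sets are pairwise disjoint are grouped into one super concept) are simplifying assumptions made only for illustration.

```python
# Hypothetical toy data: each word maps to the set of object ids it was
# grounded to. In this toy scene, objects 0-3 are a red cube, a red sphere,
# a blue cube and a blue sphere; "ball" is used as a synonym of "sphere".
word_to_objects = {
    "red": {0, 1},
    "blue": {2, 3},
    "cube": {0, 2},
    "sphere": {1, 3},
    "ball": {1, 3},
}


def includes(a, b, mapping):
    """Return True if every object mapped to word b is also mapped to word a."""
    return mapping[b] <= mapping[a]


def mutually_exclusive(a, b, mapping):
    """Return True if words a and b never map to the same object."""
    return mapping[a].isdisjoint(mapping[b])


def induce_concepts(mapping):
    """Merge words that include each other (identical mappings) into one concept."""
    concepts = []
    for word in sorted(mapping):
        for concept in concepts:
            rep = concept[0]  # first word acts as the concept's representative
            if includes(word, rep, mapping) and includes(rep, word, mapping):
                concept.append(word)
                break
        else:
            concepts.append([word])
    return concepts


def induce_super_concepts(concepts, mapping):
    """Greedily group concepts that are pairwise mutually exclusive (assumed rule)."""
    groups = []
    for concept in concepts:
        for group in groups:
            if all(mutually_exclusive(concept[0], other[0], mapping) for other in group):
                group.append(concept)
                break
        else:
            groups.append([concept])
    return groups


concepts = induce_concepts(word_to_objects)
super_concepts = induce_super_concepts(concepts, word_to_objects)
print(concepts)
print(super_concepts)
```

On this toy data the shape words end up in one group and the color words in another, which mirrors the intuition that an object has exactly one shape and one color. Real co-occurrence statistics are noisy, so a practical version would need soft thresholds rather than exact set tests.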
<p align="center"> <img src=".github/teaser.png" width="65%"> </p>

Results
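The table below compares the accuracy (%) of our object-level compositional reasoning framework with state-of-the-art methods on CLEVR, overall and per question type (comp numb = compare numbers, query attr = query attribute, comp attr = compare attribute). * indicates that the method uses external program annotations.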
method | overall | count | exist | comp<br>numb | query<br>attr | comp<br>attr |
---|---|---|---|---|---|---|
Human | 92.6 | 86.7 | 96.6 | 86.5 | 95.0 | 96.0 |
NMN* | 72.1 | 52.5 | 72.7 | 79.3 | 79.0 | 78.0 |
N2NMN* | 83.7 | 68.5 | 85.7 | 84.9 | 90.0 | 88.7 |
IEP* | 96.9 | 92.7 | 97.1 | 98.7 | 98.1 | 98.9 |
TbD* | 99.1 | 97.6 | 99.4 | 99.2 | 99.5 | 99.6 |
NS-VQA* | 99.8 | 99.7 | 99.9 | 99.9 | 99.8 | 99.8 |
RN | 95.5 | 90.1 | 93.6 | 97.8 | 97.1 | 97.9 |
FiLM | 97.6 | 94.5 | 93.8 | 99.2 | 99.2 | 99.0 |
MAC | 98.9 | 97.2 | 99.4 | 99.5 | 99.3 | 99.5 |
NS-CL | 98.9 | 98.2 | 99.0 | 98.8 | 99.3 | 99.1 |
OCCAM (ours) | 99.4 | 98.1 | 99.8 | 99.0 | 99.9 | 99.9 |
BibTeX
@article{wang2020interpretable,
title={Interpretable Visual Reasoning via Induced Symbolic Space},
author={Wang, Zhonghao and Yu, Mo and Wang, Kai and Xiong, Jinjun and Hwu, Wen-mei and Hasegawa-Johnson, Mark and Shi, Humphrey},
journal={arXiv preprint arXiv:2011.11603},
year={2020}
}