PrAE

This repo contains code for our CVPR 2021 paper:

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Chi Zhang*, Baoxiong Jia*, Song-Chun Zhu, Yixin Zhu
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
(* indicates equal contribution.)

Spatial-temporal reasoning is a challenging task in Artificial Intelligence (AI) due to its demanding but unique nature: a theoretic requirement on representing and reasoning based on spatial-temporal knowledge in mind, and an applied requirement on a high-level cognitive system capable of navigating and acting in space and time. Recent works have focused on an abstract reasoning task of this kind -- Raven’s Progressive Matrices (RPM). Despite the encouraging progress on RPM that achieves human-level performance in terms of accuracy, modern approaches have neither a treatment of human-like reasoning on generalization, nor a potential to generate answers. To fill in this gap, we propose a neuro-symbolic Probabilistic Abduction and Execution (PrAE) learner; central to the PrAE learner is the process of probabilistic abduction and execution on a probabilistic scene representation, akin to the mental manipulation of objects. Specifically, we disentangle perception and reasoning from a monolithic model. The neural visual perception frontend predicts objects' attributes, later aggregated by a scene inference engine to produce a probabilistic scene representation. In the symbolic logical reasoning backend, the PrAE learner uses the representation to abduce the hidden rules. An answer is predicted by executing the rules on the probabilistic representation. The entire system is trained end-to-end in an analysis-by-synthesis manner without any visual attribute annotations. Extensive experiments demonstrate that the PrAE learner improves cross-configuration generalization and is capable of rendering an answer, in contrast to prior works that merely make a categorical choice from candidates.
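
As a concrete illustration of the abduction-execution loop described above, here is a minimal numpy sketch (not the authors' implementation): it treats one attribute per panel as a categorical distribution, abduces a small hypothetical family of "progression" rules from the two complete rows, and executes the abduced rules on the last row to predict the missing panel's attribute distribution. The rule family and the wrap-around arithmetic via np.roll are simplifying assumptions.

```python
# Minimal sketch of probabilistic abduction and execution on ONE attribute.
# Assumptions (not the authors' code): each panel's attribute is a categorical
# distribution over V values; candidate rules are "+d" progressions applied
# row-wise; value arithmetic wraps around via np.roll for brevity.
import numpy as np

V = 10  # number of attribute values (assumed)

def shift(p, d):
    """Distribution of (x + d) when x ~ p, with wrap-around for simplicity."""
    return np.roll(p, d)

def pairwise(p, q, d):
    """P(y = x + d) for independent x ~ p, y ~ q."""
    return float(np.sum(shift(p, d) * q))

def abduce(rows, d):
    """Likelihood that the '+d' rule holds on every complete row."""
    score = 1.0
    for row in rows:  # row: (3, V) distributions for the three panels
        score *= pairwise(row[0], row[1], d) * pairwise(row[1], row[2], d)
    return score

rng = np.random.default_rng(0)
panels = rng.dirichlet(np.ones(V), size=8)   # 8 context panels' distributions
rows, last = [panels[0:3], panels[3:6]], panels[6:8]

deltas = [-2, -1, 0, 1, 2]                   # candidate progression rules
scores = np.array([abduce(rows, d) for d in deltas])
rule_post = scores / scores.sum()            # posterior over rules (abduction)

# Execution: apply each rule to the last row, average under the posterior.
answer = sum(w * shift(last[1], d) for w, d in zip(rule_post, deltas))
print(answer.round(3))                       # predicted attribute distribution
```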

[Figure: model architecture]

Performance

The following table shows the performance of various methods on the RAVEN and I-RAVEN datasets. For details, please check our paper.

Performance on RAVEN / I-RAVEN (each cell reads RAVEN / I-RAVEN accuracy, %):

| Method | Acc | Center | 2x2Grid | 3x3Grid | L-R | U-D | O-IC | O-IG |
|---|---|---|---|---|---|---|---|---|
| WReN | 9.86/14.87 | 8.65/14.25 | 29.60/20.50 | 9.75/15.70 | 4.40/13.75 | 5.00/13.50 | 5.70/14.15 | 5.90/12.25 |
| LSTM | 12.81/12.52 | 12.70/12.55 | 13.80/13.50 | 12.90/11.35 | 12.40/14.30 | 12.10/11.35 | 12.45/11.55 | 13.30/13.05 |
| LEN | 12.29/13.60 | 11.85/14.85 | 41.40/18.20 | 12.95/13.35 | 3.95/12.55 | 3.95/12.75 | 5.55/11.15 | 6.35/12.35 |
| CNN | 14.78/12.69 | 13.80/11.30 | 18.25/14.60 | 14.55/11.95 | 13.35/13.00 | 15.40/13.30 | 14.35/11.80 | 13.75/12.85 |
| MXGNet | 20.78/13.07 | 12.95/13.65 | 37.05/13.95 | 24.80/12.50 | 17.45/12.50 | 16.80/12.05 | 18.05/12.95 | 18.35/13.90 |
| ResNet | 24.79/13.19 | 24.30/14.50 | 25.05/14.30 | 25.80/12.95 | 23.80/12.35 | 27.40/13.55 | 25.05/13.40 | 22.15/11.30 |
| ResNet+DRT | 31.56/13.26 | 31.65/13.20 | 39.55/14.30 | 35.55/13.25 | 25.65/12.15 | 32.05/13.10 | 31.40/13.70 | 25.05/13.15 |
| SRAN | 15.56/29.06 | 18.35/37.55 | 38.80/38.30 | 17.40/29.30 | 9.45/29.55 | 11.35/28.65 | 5.50/21.15 | 8.05/18.95 |
| CoPINet | 52.96/22.84 | 49.45/24.50 | 61.55/31.10 | 52.15/25.35 | 68.10/20.60 | 65.40/19.85 | 39.55/19.00 | 34.55/19.45 |
| PrAE Learner | 65.03/77.02 | 76.50/90.45 | 78.60/85.35 | 28.55/45.60 | 90.05/96.25 | 90.85/97.35 | 48.05/63.45 | 42.60/60.70 |
| Human | 84.41 | 95.45 | 81.82 | 79.55 | 86.36 | 81.81 | 86.36 | 81.81 |

Dependencies

Important: see requirements.txt for the full list of required packages.

Usage

To train the PrAE learner, one first needs to extract rule annotations for the training configuration. We provide a simple script in src/auxiliary for doing this. Set the path properly in its main() function, and your dataset folder will be populated with rule annotations stored as npz files.
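
For reference, here is a hypothetical sketch of what such an extraction could look like. The XML tag and attribute names ("Rule", "name", "attr") and the rule/attribute vocabularies are assumptions about the RAVEN metadata format; consult the actual script in src/auxiliary and your copy of the dataset before relying on them.

```python
# Hypothetical sketch of rule-annotation extraction for RAVEN-style data.
# Assumptions: each sample ships an XML metadata file whose rules section
# lists per-attribute rules; the exact tag/attribute names below may differ
# from your copy of the dataset, so adjust them before use.
import glob
import os
import xml.etree.ElementTree as ET
import numpy as np

RULES = ["Constant", "Progression", "Arithmetic", "Distribute_Three"]
ATTRS = ["Number", "Position", "Type", "Size", "Color"]

def extract(xml_path):
    """Return a binary (attribute x rule) matrix parsed from one XML file."""
    root = ET.parse(xml_path).getroot()
    anno = np.zeros((len(ATTRS), len(RULES)), dtype=np.int8)
    for rule in root.iter("Rule"):              # assumed tag name
        name = rule.get("name")
        for attr in rule.get("attr", "").split("/"):
            if name in RULES and attr in ATTRS:
                anno[ATTRS.index(attr), RULES.index(name)] = 1
    return anno

def main(dataset_dir):
    # Save one rule-annotation npz next to each training sample's XML file.
    for xml_path in glob.glob(os.path.join(dataset_dir, "*_train.xml")):
        np.savez(xml_path.replace(".xml", "_rule.npz"), rule=extract(xml_path))

if __name__ == "__main__":
    main("/path/to/RAVEN")  # set your dataset path here
```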

To train the PrAE learner, run

python src/main.py train --dataset <path to dataset>

The default hyper-parameters should work; see src/main.py for the full list of adjustable arguments.

In the codebase, window sliding and image preprocessing are delegated to the dataset loader; note that the code only supports training on configurations with a single component.
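
For intuition, below is a minimal loader sketch, not the repo's dataset class, showing the kind of window sliding involved: each of the 8 candidate panels is slid into the missing ninth cell to form 8 complete 3x3 matrices. The npz keys ("image" holding 16 panels, "target" holding the answer index) are assumptions, and resizing and other preprocessing are omitted for brevity.

```python
# Hypothetical loader sketch: NOT the repo's dataset class. Assumes each npz
# stores "image" as (16, H, W) uint8 panels (8 context + 8 candidates) and
# "target" as the index of the correct candidate.
import glob
import numpy as np
import torch
from torch.utils.data import Dataset

class SingleComponentRPM(Dataset):
    def __init__(self, root, split="train"):
        self.files = sorted(glob.glob(f"{root}/*_{split}.npz"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        data = np.load(self.files[idx])
        image, target = data["image"], int(data["target"])
        context, candidates = image[:8], image[8:]
        # Window sliding: place each candidate in the ninth cell, yielding
        # eight complete 3x3 matrices with shape (8, 9, H, W).
        matrices = np.stack(
            [np.concatenate([context, c[None]], axis=0) for c in candidates]
        )
        x = torch.from_numpy(matrices).float() / 255.0
        return x, target
```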

One thing we noticed after code cleaning is that curriculum learning is not necessary; in the manuscript, however, we keep our original finding.

To test on a new configuration, run

python src/main.py test --dataset <path to dataset> --config <new config> --model-path <path to a trained model>

Testing on 3x3Grid could raise a CUDA out-of-memory error; in that case, try running on the CPU.
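
If, in your setup, the code selects the GPU only when one is visible (an assumption; check src/main.py), you can hide the GPU from PyTorch to force CPU execution:

CUDA_VISIBLE_DEVICES="" python src/main.py test --dataset <path to dataset> --config <new config> --model-path <path to a trained model>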

Citation

If you find the paper and/or the code helpful, please cite us.

@inproceedings{zhang2021abstract,
    title={Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution},
    author={Zhang, Chi and Jia, Baoxiong and Zhu, Song-Chun and Zhu, Yixin},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2021}
}

Acknowledgement

We'd like to express our gratitude to all the colleagues and anonymous reviewers who helped us improve the paper. The project would have been impossible to finish without the following open-source implementations.