This repo points to data from two papers: Measuring Abstract Reasoning in Neural Networks; Barrett, Hill, Santoro et al. (2018), and Learning to Make Analogies by Contrasting Abstract Relational Structures; Hill, Santoro et al. (2019). The data can be found here.

This data is made available for the purposes of non-commercial research only and is not to be used for any other purpose.

Procedurally Generated Matrices (PGM) data

From the paper Measuring Abstract Reasoning in Neural Networks, Barrett, Hill, Santoro et al. 2018.

UPDATE -- fix re: array shape and readability of the data

Users have noted issues with readability of the data. This is caused by a mis-shaped numpy array upon loading. As noted below, images are of size 160x160x16. When loading the .npz, please reshape the array:

import numpy as np

data = np.load("name.npz")
image = data["image"].reshape(16, 160, 160)  # first axis indexes the 16 panels

The first dimension of image now indexes the array into the correct panels, as is intended. This array should now depict readable images when plotted.
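The reshape can be wrapped in a small helper. This is a minimal sketch, not part of the original release: the name `load_panels` and the size check are assumptions added for illustration, and the demonstration writes a synthetic array to a temporary file in place of a real sample.

```python
import os
import tempfile

import numpy as np

NUM_PANELS, HEIGHT, WIDTH = 16, 160, 160


def load_panels(path):
    """Load one sample and return its image as a (16, 160, 160) array."""
    with np.load(path) as data:
        image = data["image"]
    expected = NUM_PANELS * HEIGHT * WIDTH
    if image.size != expected:
        raise ValueError(f"expected {expected} values, got {image.size}")
    return image.reshape(NUM_PANELS, HEIGHT, WIDTH)


# Synthetic stand-in for a real file (assumption: any array holding
# 16 * 160 * 160 values can be reshaped this way).
fake = np.zeros(NUM_PANELS * HEIGHT * WIDTH, dtype=np.uint8)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")
    np.savez(path, image=fake)
    panels = load_panels(path)

print(panels.shape)
```

With a real file, `panels[i]` can then be passed to any image-plotting routine to view panel `i`.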

Directory and file organisation

The parent data folder contains 8 archived directories, each corresponding to a particular generalisation regime:

Within each folder are 1.42M .npz files encoding the samples. The naming convention is PGM_{split_type}_{train/test/val}_{id}.npz, where split_type is one of the 8 regimes indicated above, train/test/val indicates the training, test, or validation set, and id is a numerical identifier for a particular matrix.
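A filename can be split back into its fields with a regular expression. The sketch below is illustrative only: it assumes the three fields are joined by single underscores, the function name `parse_pgm_name` is made up for this example, and `somesplit` is a placeholder rather than a real split name.

```python
import re

# Assumption: fields are joined by underscores, and id is a run of digits.
PGM_NAME = re.compile(
    r"^PGM_(?P<split_type>.+)_(?P<subset>train|test|val)_(?P<id>\d+)\.npz$"
)


def parse_pgm_name(filename):
    """Return (split_type, subset, id) for a PGM sample filename."""
    match = PGM_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a PGM sample filename: {filename!r}")
    return match["split_type"], match["subset"], int(match["id"])


# Hypothetical filename, used only to exercise the parser.
parsed = parse_pgm_name("PGM_somesplit_train_123.npz")
print(parsed)
```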

A saved array has the following structure:

image: a 160x160x16 integer array with values from 0 to 255. The last dimension denotes the panel number for the matrix, with the first 8 panels being the "context", and the last 8 being the "choices".

meta_matrix: a 4x12 binary array encoding the structure of the matrix (i.e., the triples $[r, o, a]$ contained in the matrix). Each row encodes one triple, and the columns are indexed as follows:

'shape': 0
'line': 1
'color': 2
'number': 3
'position': 4
'size': 5
'type': 6
'progression': 7
'XOR': 8
'OR': 9
'AND': 10
'consistent_union': 11

meta_target: the element-wise OR of all binary-encoded triples [r, o, a] (i.e., of the rows of the meta_matrix).

target: integer value denoting the target for the particular matrix (i.e., the index of the correct answer among the "choice" panels).
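The fields above can be combined to pull out the context panels, the choice panels, and the correct answer. This is a hedged sketch on synthetic data: `split_sample` is a hypothetical helper, and it assumes that target is a 0-based index into the 8 choice panels, which the description does not state explicitly.

```python
import numpy as np


def split_sample(image, target):
    """Split an image array into context, choices, and the correct choice."""
    panels = np.asarray(image).reshape(16, 160, 160)
    context, choices = panels[:8], panels[8:]
    # Assumption: target counts from 0 within the choice panels.
    answer = choices[int(target)]
    return context, choices, answer


# Synthetic image: panel i is filled with the value i, so panels are
# easy to tell apart. Real samples hold pixel values from 0 to 255.
image = np.repeat(np.arange(16, dtype=np.uint8), 160 * 160)
context, choices, answer = split_sample(image, target=3)

print(context.shape, choices.shape, int(answer[0, 0]))
```

Here the answer panel is the fourth choice, i.e. panel 11 of the 16.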

Notation

$R$ denotes the set of relation types (progression, XOR, OR, AND, consistent union), $O$ denotes the object types (shape, line), and $A$ denotes the attribute types (size, colour, position, number). The structure of a matrix, $S$, is the set of triples $S = \{[r, o, a]\}$ that determine the challenge posed by a particular matrix.
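To connect this notation to the saved arrays, the rows of a meta_matrix can be decoded back into named triples. The following is a sketch under stated assumptions: `decode_triples` is a hypothetical helper, the column order is the one listed for meta_matrix, any column that is not a relation or an object (including 'type') is treated as an attribute, and all-zero rows are assumed to be unused slots. The example matrix is synthetic.

```python
import numpy as np

COLUMNS = [
    "shape", "line", "color", "number", "position", "size", "type",
    "progression", "XOR", "OR", "AND", "consistent_union",
]
RELATIONS = {"progression", "XOR", "OR", "AND", "consistent_union"}
OBJECTS = {"shape", "line"}


def decode_triples(meta_matrix):
    """Return one (relations, objects, attributes) tuple per non-empty row."""
    triples = []
    for row in np.asarray(meta_matrix):
        names = [COLUMNS[i] for i in np.flatnonzero(row)]
        if not names:
            continue  # assumption: an all-zero row is an unused slot
        r = [n for n in names if n in RELATIONS]
        o = [n for n in names if n in OBJECTS]
        a = [n for n in names if n not in RELATIONS and n not in OBJECTS]
        triples.append((r, o, a))
    return triples


# Synthetic 4x12 example with two triples.
meta_matrix = np.zeros((4, 12), dtype=np.uint8)
meta_matrix[0, [0, 5, 7]] = 1  # shape, size, progression
meta_matrix[1, [1, 2, 8]] = 1  # line, color, XOR

triples = decode_triples(meta_matrix)
# meta_target as described: OR across the rows of meta_matrix.
meta_target = np.bitwise_or.reduce(meta_matrix, axis=0)

print(triples)
print(meta_target.tolist())
```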

Generalisation split details

The generalisation splits are as follows:

Visual Analogy data

From the paper [Learning to Make Analogies by Contrasting Abstract Relational Structures](https://openreview.net/pdf?id=SylLYsCcFm); Hill, Santoro et al. (2019).

This data can be found in the {analogies} subdirectory, which contains archived directories corresponding to the visual analogy problems described in the paper.

novel.domain.transfer.tar.gz
novel.target.domain.line.type.tar.gz
novel.target.domain.shape.color.tar.gz
interpolation.tar.gz
extrapolation.tar.gz
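Because each archive holds many files, it can be useful to peek at a few member names before extracting everything. The sketch below uses only the Python standard library; `list_npz_members` is a hypothetical helper, and the demonstration builds a tiny synthetic archive instead of reading one of the archives listed above.

```python
import io
import os
import tarfile
import tempfile


def list_npz_members(archive_path, limit=5):
    """Return up to `limit` .npz member names without extracting anything."""
    names = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".npz"):
                names.append(member.name)
                if len(names) >= limit:
                    break
    return names


# Build a small synthetic .tar.gz so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "example.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in ("example/a.npz", "example/b.npz", "example/notes.txt"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    found = list_npz_members(archive_path)

print(found)
```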

Within each of these directories is a large number of .npz files. The filenames are of the form analogy_{split_type}_{train/test/val}_{lbc/normal}_{id}.npz, where split_type is the name of the corresponding directory, lbc/normal determines the nature of the incorrect answer candidates, and id is a unique identifier for the file.
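A directory listing can be filtered by subset and candidate type using this naming scheme. This is an illustrative sketch: it assumes the fields are joined by single underscores, treats id as an opaque string, and uses a hypothetical helper `select_files` with made-up filenames.

```python
import re

# Assumption: fields are joined by underscores; id is kept as a string.
ANALOGY_NAME = re.compile(
    r"^analogy_(?P<split_type>.+)_(?P<subset>train|test|val)"
    r"_(?P<candidates>lbc|normal)_(?P<id>.+)\.npz$"
)


def select_files(filenames, subset, candidates):
    """Keep filenames matching the given subset and candidate type."""
    chosen = []
    for name in filenames:
        match = ANALOGY_NAME.match(name)
        if match and match["subset"] == subset and match["candidates"] == candidates:
            chosen.append(name)
    return chosen


# Hypothetical filenames, used only to exercise the filter.
names = [
    "analogy_interpolation_train_normal_0.npz",
    "analogy_interpolation_train_lbc_1.npz",
    "analogy_interpolation_test_normal_2.npz",
]
selected = select_files(names, subset="train", candidates="normal")
print(selected)
```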