AxisKeys.jl

This package defines a thin wrapper which, alongside any array, stores a vector of "keys" for each dimension. These might be the actual times at which measurements were taken, or strings labelling columns, etc. They are propagated through many operations on arrays (including broadcasting, map, comprehensions, sum, etc.) and altered by a few (sorting, fft, push!).

It works closely with NamedDims.jl, another wrapper which attaches names to dimensions. These names are a tuple of symbols, like those of a NamedTuple, and can be used for specifying which dimensions to sum over, etc. A nested pair of these wrappers can be made as follows:

```julia
using AxisKeys
data = rand(Int8, 2,10,3) .|> abs;
A = KeyedArray(data; channel=[:left, :right], time=range(13, step=2.5, length=10), iter=31:33)
```
<p align="center"> <img src="docs/readmeterminal.png" alt="terminal pretty printing" width="550" align="center"> </p>

The package aims not to be opinionated about what you store in these "key vectors": they can be arbitrary AbstractVectors, and need not be sorted nor have unique elements. Integer "keys" are allowed, and should have no surprising interactions with indices. While it is further from zero-cost than NamedDims.jl, it aims to be light-weight, leaving as much functionality as possible to other packages.
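A minimal sketch of the point about integer keys, assuming AxisKeys.jl is installed: square brackets still index by position, while round-bracket lookup (described under Selections below) goes by key, so the two never get confused.

```julia
using AxisKeys

# Integer keys that are deliberately unsorted: key 1 is stored at index 2.
v = KeyedArray([10, 20, 30]; x = [3, 1, 2])

v[1]   # indexing by position gives the first element, 10
v(1)   # lookup by key gives the element whose key is 1, which is 20
```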

See <a href="#elsewhere">§ elsewhere</a> below for other packages doing similar things.

Selections

Indexing still works directly on the underlying array, and keyword indexing (of a nested pair) works exactly as for a NamedDimsArray. But in addition, it is possible to pick out elements based on the keys, which for clarity I will call lookup. This is written with round brackets:

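| Dimension d | Indexing: i ∈ axes(A,d) | Lookup: key ∈ axiskeys(A,d) |
|-------------|-------------------------|-----------------------------|
| by position | A[1,2,:]                | A(:left, 15.5, :)           |
| by name     | A[iter=1]               | A(iter=31)                  |
| by type     | --                      | B = A(:left)                |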
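To check the table against the array constructed above, this sketch (assuming AxisKeys.jl is installed, and rebuilding A so the block stands alone) compares each indexing expression with its lookup counterpart; every line should evaluate to true.

```julia
using AxisKeys

data = abs.(rand(Int8, 2, 10, 3))
A = KeyedArray(data; channel=[:left, :right], time=range(13, step=2.5, length=10), iter=31:33)

A[1, 2, :] == A(:left, 15.5, :)   # by position: channel index 1 is :left, time index 2 is 15.5
A[iter=1] == A(iter=31)           # by name: iter index 1 holds key 31
A[channel=1] == A(:left)          # by type: only `channel` has Symbol keys
```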

When using dimension names, fixing only some of them will return a slice, such as B = A[channel=1]. You may also give just one key, provided its type matches the key type of exactly one dimension, such as B = A(:left), where the key is a Symbol.
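The rule for a single key can be pictured in plain Julia with a hypothetical helper, dim_for_key, which is not part of AxisKeys: it accepts a key only if exactly one dimension has a matching key element type. This is a sketch of the idea; the package's own check may differ in details such as type promotion.

```julia
# Hypothetical helper, not part of AxisKeys: find the one dimension whose keys can hold `key`.
function dim_for_key(key, allkeys::Tuple)
    ds = [d for d in 1:length(allkeys) if key isa eltype(allkeys[d])]
    length(ds) == 1 || throw(ArgumentError("key $(repr(key)) matches $(length(ds)) dimensions"))
    return only(ds)
end

ks = ([:left, :right], range(13, step=2.5, length=10), 31:33)
dim_for_key(:left, ks)   # 1, the only Symbol-keyed dimension
dim_for_key(15.5, ks)    # 2, the only Float64-keyed dimension
dim_for_key(33, ks)      # 3, the only Int-keyed dimension
```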

Note that indexing is the primary way to access the data. Lookup calls, for example, i = findfirst(isequal(:left), axiskeys(A,1)) to convert keys to indices, and will thus always be slower. If you want lookup to be your primary mode of access, then you may want a dictionary instead, possibly from Dictionaries.jl.
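In plain Julia, the extra step that lookup performs looks roughly like this; the variable names are illustrative, not the package's internals.

```julia
channel_keys = [:left, :right]
data = [1 2 3; 4 5 6]

# Lookup, roughly: first translate the key into an index with a linear search...
i = findfirst(isequal(:right), channel_keys)

# ...then index as usual. The search is extra work that plain indexing never does.
row = data[i, :]
```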

There are also a number of special selectors, which work like this:

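| Selection      | Indexing         | Lookup                | Result |
|----------------|------------------|-----------------------|--------|
| one nearest    | B[time = 3]      | B(time = Near(17.0))  | vector |
| all in a range | B[2:5, :]        | B(Interval(14,25), :) | matrix |
| all matching   | B[3:end, Not(3)] | B(>(17), !=(33))      | matrix |
| mixture        | B[1, Key(33)]    | B(Index[1], 33)       | scalar |
| non-scalar     | B[iter=[1, 3]]   | B(iter=[31, 33])      | matrix |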

Here Interval(14,25) can also be written 14..25; it comes from IntervalSets.jl. Any function can be used to select keys, including anonymous functions: B(time = t -> 0<t<17). You may give just one ::Base.Fix2 function (such as <=(18) or ==(20)), provided its argument type matches the keys of exactly one dimension. An interval or a function always selects via findall, i.e. it does not drop a dimension, even if there is exactly one match.
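The difference between function selectors and Near can be sketched with Base functions alone, using the time keys from the example above. The argmin line is only an approximation of what Near(17.0) does; the package's own handling of ties may differ.

```julia
times = range(13, step=2.5, length=10)   # 13.0, 15.5, 18.0, ..., 35.5

# A function selector keeps every matching position, like findall,
# so the dimension survives even when only one key matches.
findall(t -> 0 < t < 17, times)      # [1, 2]
findall(t -> 14 <= t <= 25, times)   # [2, 3, 4, 5], what B(Interval(14,25), :) selects along time
findall(==(18.0), times)             # [3], still a vector rather than a scalar

# Near(17.0) instead picks the single closest key, which drops the dimension.
argmin(abs.(times .- 17.0))          # 3, the position of key 18.0
```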

While this table shows lookup selectors inside B(...), they can in fact all be used inside B[...], not just Key(k) as shown. They still refer to keys not indices! (This will not select dimension based on type, i.e. A[Key(:left)] is an error.) You may also write Index[end] but not Index[end-1].

By default, lookup returns a view, while indexing returns a copy unless you add @views. This means that you can write into the array with B(time = <=(18)) .= 0. For scalar output, you of course cannot write B(13.0, 33) = 0, as this would be parsed as a function definition; instead you can write B[Key(13.0), Key(33)] = 0, or else B(13.0, 33, :) .= 0, since a trailing colon makes a zero-dimensional view.
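The view-versus-copy distinction is ordinary Julia behaviour. This sketch uses Base's view and findall as stand-ins for lookup and indexing; it illustrates the semantics, not how AxisKeys implements them.

```julia
times = [13.0, 15.5, 18.0]
data = zeros(3, 2)

# Like lookup: a view of the matching rows, so writing through it changes `data`.
sel = view(data, findall(<=(15.5), times), :)
sel .= 1

# Like plain indexing: a copy, so writing into it leaves `data` alone.
cpy = data[findall(>(15.5), times), :]
cpy .= 5

# A view of a single element is zero-dimensional; `.=` writes that one element.
z = view(data, 3, 2)
z .= 2

data   # [1.0 1.0; 1.0 1.0; 0.0 2.0]
```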

Construction

```julia
KeyedArray(rand(Int8, 2,10), ([:a, :b], 10:10:100)) # AbstractArray, Tuple{AbstractVector, ...}
```

A nested pair of wrappers can be constructed with keywords for names, and everything should work the same way in either order:

```julia
KeyedArray(rand(Int8, 2,10), row=[:a, :b], col=10:10:100)     # KeyedArray(NamedDimsArray(...))
NamedDimsArray(rand(Int8, 2,10), row=[:a, :b], col=10:10:100) # NamedDimsArray(KeyedArray(...))
```

Calling AxisKeys.keyless(A) removes the KeyedArray wrapper, if any, and NamedDims.unname(A) similarly removes the names (regardless of which is outermost).

There is another, more "casual" constructor: the function wrapdims. It does a bit more checking of inputs, will adjust the length of ranges of keys if it can, and will fix indexing offsets if needed to match the array. The resulting order of wrappers is controlled by AxisKeys.nameouter(), which returns false by default.

```julia
using AxisKeys, OffsetArrays

wrapdims(rand(Int8, 10), alpha='a':'z')
# Warning: range 'a':1:'z' replaced by 'a':1:'j', to match size(A, 1) == 10

wrapdims(OffsetArray(rand(Int8, 10), -1), iter=10:10:100)
axiskeys(ans, 1)  # 10:10:100 with indices 0:9
```
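Shortening a range while keeping it a range is plain Julia indexing. This is one way to obtain the 'a':1:'j' from the warning above, not necessarily how wrapdims does it.

```julia
keys_in = 'a':'z'
n = 10                    # plays the role of size(A, 1) in the example
keys_out = keys_in[1:n]   # indexing a range with a range gives another range

keys_out == 'a':'j'       # true
```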

Finally, wrapdims will also convert AxisArrays, NamedArrays, as well as NamedTuples.

Functions

The function axes(A) returns (a tuple of vectors of) indices as usual, and axiskeys(A) similarly returns (a tuple of vectors of) keys. If the array has names, then dimnames(A) returns them. Like size(A, d), which for a named array can also be written size(A, name), each of these takes an optional dimension argument to return just one.
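A short sketch of these functions, assuming AxisKeys.jl and NamedDims.jl are both available in the environment (NamedDims.jl supplies dimnames).

```julia
using AxisKeys, NamedDims

A = KeyedArray(rand(2, 3); row = [:a, :b], col = 10:10:30)

axes(A)             # (Base.OneTo(2), Base.OneTo(3)), the usual indices
axiskeys(A)         # ([:a, :b], 10:10:30)
dimnames(A)         # (:row, :col)

# Asking for a single dimension, by number or by name:
axiskeys(A, 2)      # 10:10:30
axiskeys(A, :col)   # 10:10:30
```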

The following things should work:

To allow for this limited mutability, V.keys isa Ref for vectors, while A.keys isa Tuple for matrices & higher. But axiskeys(A) always returns a tuple.
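Why a Ref? A struct's fields cannot be reassigned, yet push! may need to replace the key vector outright, for instance when the keys are an immutable range such as 10:10:20. The hypothetical ToyKeyedVector below only illustrates that design choice; it is not AxisKeys' definition.

```julia
# Hypothetical toy type for illustration; not how AxisKeys defines KeyedArray.
struct ToyKeyedVector{T}
    data::Vector{T}
    keys::Base.RefValue{AbstractVector}   # the struct is immutable, but the Ref's contents are not
end

ToyKeyedVector(data::Vector, keys::AbstractVector) = ToyKeyedVector(data, Ref{AbstractVector}(keys))

function Base.push!(v::ToyKeyedVector, kx::Pair)
    k, x = kx
    push!(v.data, x)
    # An immutable range cannot grow in place, so build a new key vector
    # and swap it into the Ref.
    v.keys[] = vcat(v.keys[], k)
    return v
end

v = ToyKeyedVector([1.0, 2.0], 10:10:20)
push!(v, 30 => 3.0)
v.keys[]   # [10, 20, 30]
```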

Absent

As with NamedDims.jl, the guiding idea is that every operation which could be done on ordinary arrays should still produce the same data, but propagate the extra information (names and keys), and throw an error if this information conflicts.

Both packages allow for wildcards, which never conflict. In NamedDims.jl this is the name :_, here it is a Base.OneTo(n), like the axes of an Array. These can be constructed as M = wrapdims(rand(2,2); _=[:a, :b], cols=nothing), and for instance M .+ M' is not an error.
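The wildcard rule can be sketched with a hypothetical helper, unify_keys, which is not AxisKeys' internal code: for one dimension, it keeps the real keys if the other side is a Base.OneTo, and otherwise requires the keys to agree.

```julia
# Hypothetical helper, not AxisKeys' internal code: combine the keys of one
# dimension from two arrays, treating Base.OneTo as a wildcard.
function unify_keys(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || throw(DimensionMismatch("lengths $(length(a)) and $(length(b)) differ"))
    a isa Base.OneTo && return b   # wildcard on the left: take the real keys
    b isa Base.OneTo && return a   # wildcard on the right
    a == b || throw(ArgumentError("keys conflict: $a vs $b"))
    return a
end

# M has keys ([:a, :b], Base.OneTo(2)), so M' has (Base.OneTo(2), [:a, :b]).
# Dimension by dimension, neither pair conflicts:
unify_keys([:a, :b], Base.OneTo(2))   # [:a, :b]
unify_keys(Base.OneTo(2), [:a, :b])   # [:a, :b]
```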

If you need lookup to be very fast, then you will want to use a package like UniqueVectors.jl or AcceleratedArrays.jl or CategoricalArrays.jl. To apply such a type to all dimensions, you may write D = wrapdims(rand(1000), UniqueVector, rand(Int, 1000)). Then D(n) here will use the fast lookup from UniqueVectors.jl (about 60x faster).
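Hash-based key vectors help because ordinary lookup does a linear search. A plain Dict from key to position, built once, shows the idea behind UniqueVectors.jl; it assumes the keys are unique and is not that package's code.

```julia
ks = collect(7000:-7:7)   # 1000 distinct Int keys, assumed unique

# Linear search, as plain lookup does: the cost grows with length(ks).
findfirst(isequal(700), ks)   # 901

# A hash-based index built once; each lookup is then roughly constant time.
pos = Dict(k => i for (i, k) in enumerate(ks))
pos[700]                      # 901
```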

When a key vector is a Julia AbstractRange, then this package provides some faster overloads for things like findall(<=(42), 10:10:100).
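For an increasing range, the positions satisfying <=(x) form a prefix that can be computed from first(r) and step(r) without testing every element. The hypothetical findall_le below sketches this; it is not the package's overload, and floating-point ranges may need extra care at the boundary.

```julia
# Hypothetical helper: the indices i with r[i] <= x, for an increasing range,
# computed arithmetically instead of testing every element.
function findall_le(x, r::AbstractRange)
    step(r) > 0 || throw(ArgumentError("this sketch only handles increasing ranges"))
    n = floor(Int, (x - first(r)) / step(r)) + 1   # count of elements <= x, before clamping
    return 1:clamp(n, 0, length(r))
end

findall_le(42, 10:10:100)   # 1:4, the same indices as findall(<=(42), 10:10:100)
```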

Elsewhere

This is more or less an attempt to replace AxisArrays with several smaller packages. The complaints are: (1) It's confusing to guess whether to perform indexing or lookup based on whether it is given an integer (index) or not (key). (2) Each "axis" was its own type Axis{:name} which allowed zero-overhead lookup before Julia 1.0. But this is now possible with a simpler design. (They were called axes before Base.axes() was added, hence (3) the confusing terminology.) (4) Broadcasting is not supported, as this changed dramatically in Julia 1.0. (5) There are lots of assorted functions, special categorical vector types, etc. which aren't part of the core, and are poorly documented.

Other older packages (pre-Julia-1.0):

Other new packages (post-1.0):

See also docs/speed.jl for some checks on this package, and comparisons to other ones. And see docs/repl.jl for some usage examples, showing pretty printing.

In ๐Ÿ-land: