Leabra in Go emergent

This is the Go implementation of the Leabra algorithm for biologically based models of cognition, based on the emergent framework.

Leabra and emergent use the Cogent Core GUI framework. See the install instructions there. Once those prerequisites are in place, the simplest way to run a simulation is:

$ core run [platform]

where [platform] is optional (defaults to your local system), and can include android, ios and web!

See the ra25 example for a complete working simulation (intended to be a good starting point for creating your own models), and any of the 26 models in the Comp Cog Neuro sims repository, which also provide good starting points. The emergent wiki install page has a tutorial on how to create your own simulation starting from the ra25 example.

Current Status / News

Design

The network holds its layers via the generic emer.Layer interface, so leabra-specific methods use a type assertion to convert to the concrete *Layer, as in this Network.InitActs method:

func (nt *Network) InitActs() {
	for _, ly := range nt.Layers {
		if ly.IsOff() {
			continue
		}
		ly.(*Layer).InitActs() // ly is the emer.Layer interface -- (*Layer) converts to leabra.Layer
	}
}

Naming Conventions

Several naming conventions have changed relative to the original C++ emergent implementation:

The Leabra Algorithm

Leabra stands for Local, Error-driven and Associative, Biologically Realistic Algorithm, and it implements a balance between error-driven (backpropagation) and associative (Hebbian) learning on top of a biologically based point-neuron activation function with inhibitory competition dynamics (either via inhibitory interneurons or an approximation thereof), which produce k-Winners-Take-All (kWTA) sparse distributed representations. Extensive documentation is available from the online textbook Computational Cognitive Neuroscience, which serves as a second edition to the original book: Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain, O'Reilly and Munakata, 2000, Cambridge, MA: MIT Press.

The name is pronounced like "Libra" and is intended to connote the balance of various different factors in an attempt to approach the "golden middle" ground between biological realism and computational efficiency and the ability to simulate complex cognitive function.

The version of Leabra implemented here corresponds to version 8.5 of C++ emergent (cemer).

The basic activation dynamics of Leabra are based on standard electrophysiological principles of real neurons, and in discrete spiking mode we implement exactly the AdEx (adapting exponential) model of Gerstner and colleagues (see the Scholarpedia article on AdEx). The basic leabra package implements the rate code mode (which runs faster and allows for smaller networks), which provides a very close approximation to the AdEx model behavior, in terms of a graded activation signal matching the actual instantaneous rate of spiking across a population of AdEx neurons. We generally conceive of a single rate-code neuron as representing a microcolumn of roughly 100 spiking pyramidal neurons in the neocortex. Conversion factors from the biological units of AdEx to the normalized units used in Leabra are in this google sheet.

The excitatory synaptic input conductance (Ge in the code, known as net input in artificial neural networks) is computed as an average, not a sum, over connections, based on normalized, sigmoidally transformed weight values, which are subject to scaling on a projection level to alter relative contributions. Automatic scaling is performed to compensate for differences in expected activity level in the different projections. See the section on Input Scaling for details.
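
As a conceptual illustration (not the actual optimized, sender-based code in the leabra package; the function and variable names here are made up), the average-based net input for one receiving neuron looks roughly like this:

// geFmSenders sketches the average-based excitatory input for one receiving
// neuron: sender activations times contrast-enhanced weights, averaged over
// connections, then multiplied by the projection-level scaling factor.
func geFmSenders(sendActs, wts []float32, gScale float32) float32 {
	if len(sendActs) == 0 {
		return 0
	}
	var sum float32
	for i, act := range sendActs {
		sum += act * wts[i] // wts are the sigmoidally contrast-enhanced weights
	}
	return gScale * sum / float32(len(sendActs))
}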

Inhibition is computed using a feed-forward (FF) and feed-back (FB) inhibition function (FFFB) that closely approximates the behavior of inhibitory interneurons in the neocortex. FF is based on a multiplicative factor applied to the average excitatory net input coming into a layer, and FB is based on a multiplicative factor applied to the average activation within the layer. These simple linear functions do an excellent job of controlling the overall activation levels in bidirectionally connected networks, producing behavior very similar to the more abstract computational implementation of kWTA dynamics implemented in previous versions.
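
A minimal sketch of the FFFB computation, assuming typical default gain values (the actual implementation lives in the inhibition parameters and also tracks max as well as average statistics):

// fffbGi sketches feedforward + feedback inhibition for a layer (or pool):
// the FF term tracks the average excitatory input above a small offset, and
// the FB term is a time-integrated function of the average activation.
func fffbGi(avgGe, avgAct, fbPrev float32) (gi, fbNew float32) {
	const (
		giGain = 1.8       // overall inhibition gain
		ffGain = 1.0       // feedforward gain
		fbGain = 1.0       // feedback gain
		ff0    = 0.1       // feedforward offset: no FF inhibition below this Ge
		fbDt   = 1.0 / 1.4 // feedback integration rate (1 / time constant)
	)
	ffi := avgGe - ff0
	if ffi < 0 {
		ffi = 0
	}
	ffi *= ffGain
	fbNew = fbPrev + fbDt*(fbGain*avgAct-fbPrev)
	gi = giGain * (ffi + fbNew)
	return
}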

There is a single learning equation, derived from a very detailed model of spike timing dependent plasticity (STDP) by Urakubo, Honda, Froemke, et al. (2008), that produces a combination of Hebbian associative and error-driven learning. For historical reasons, we call this the XCAL equation (eXtended Contrastive Attractor Learning), and it is functionally very similar to the BCM learning rule developed by Bienenstock, Cooper, and Munro (1982). The essential learning dynamic involves a Hebbian-like co-product of sending neuron activation times receiving neuron activation, which biologically reflects the amount of calcium entering through NMDA channels, and this co-product is then compared against a floating threshold value. To produce the Hebbian learning dynamic, this floating threshold is based on a longer-term running average of the receiving neuron activation (AvgL in the code), which is the key idea of the BCM algorithm. To produce error-driven learning, the floating threshold is based on a faster running average of activation co-products (AvgM), which reflects an expectation or prediction, against which the instantaneous, later outcome is compared.
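
A schematic version of this dual-threshold learning rule, written around the XCAL "checkmark" function (the names and mixing-weight handling here are simplified relative to the actual learn.go code):

// xcal is the "checkmark" function: the weight change is x - th (positive,
// LTP, above the floating threshold th; negative, LTD, just below it),
// reversing back toward zero as the co-product x itself goes to zero.
func xcal(x, th float32) float32 {
	const dRev = 0.1 // reversal point, as a proportion of the threshold
	switch {
	case x < 0.0001: // effectively no activity co-product: no change
		return 0
	case x > th*dRev:
		return x - th
	default:
		return -x * (1 - dRev) / dRev
	}
}

// dwt combines error-driven learning (threshold = medium-term average srM,
// the expectation) with Hebbian/BCM learning (threshold = long-term receiver
// average avgL); mLrn and lLrn are the respective mixing weights.
func dwt(srS, srM, avgL, mLrn, lLrn float32) float32 {
	return mLrn*xcal(srS, srM) + lLrn*xcal(srS, avgL)
}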

Weights are subject to a contrast enhancement function, which compensates for the soft (exponential) weight bounding that keeps weights within the normalized 0-1 range. Contrast enhancement is important for enhancing the selectivity of self-organizing learning, and generally results in faster learning with better overall results. Learning operates on the underlying internal linear weight value. Biologically, we associate the underlying linear weight value with internal synaptic factors such as actin scaffolding, CaMKII phosphorylation level, etc., while the contrast enhancement operates at the level of AMPA receptor expression.
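
The contrast enhancement itself is a sigmoid applied to the linear weight; a sketch of that function (assuming the standard math package is imported; the gain and offset values shown are just typical defaults):

// sigWtFmLinWt converts the underlying linear weight lw (0..1) into the
// contrast-enhanced effective weight used for computing Ge.
// A higher gain sharpens the function around 0.5; off shifts its midpoint.
func sigWtFmLinWt(lw float64) float64 {
	const (
		gain = 6.0
		off  = 1.0
	)
	if lw <= 0 {
		return 0
	}
	if lw >= 1 {
		return 1
	}
	return 1 / (1 + math.Pow(off*(1-lw)/lw, gain))
}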

There are various extensions to the algorithm that implement special neural mechanisms associated with the prefrontal cortex and basal ganglia (PBWM), dopamine systems (PVLV), the hippocampus, and predictive learning and temporal integration dynamics associated with the thalamocortical circuits (DeepLeabra). All of these are (will be) implemented as additional modifications of the core, simple leabra implementation, instead of having everything rolled into one giant hairball as in the original C++ implementation.

Pseudocode as a LaTeX doc for Paper Appendix

You can copy the markdown source of the following section into a file and run pandoc on it to convert to LaTeX (or other formats) for inclusion in a paper. As this README is always kept up to date, it is best to regenerate from the source directly -- it is very easy:

curl "https://raw.githubusercontent.com/emer/leabra/main/README.md" -o appendix.md
pandoc appendix.md -f gfm -t latex -o appendix.tex

You can then edit the resulting .tex file to only include the parts you want, etc.

Leabra Algorithm Equations

The pseudocode for Leabra is given here, showing exactly how the pieces of the algorithm fit together, using the equations and variables from the actual code. Compared to the original C++ emergent implementation, this Go version of emergent is much more readable, while also not being too much slower overall.

There are also other implementations of Leabra available:

This repository contains specialized additions to the core algorithm described here:

Timing

Leabra is organized around the following timing, based on an internally generated alpha-frequency (10 Hz, 100 msec periods) cycle of expectation followed by outcome, supported by neocortical circuitry in the deep layers and the thalamus, as hypothesized in the DeepLeabra extension to standard Leabra:

Variables

The leabra.Neuron struct contains all the neuron (unit) level variables, and the leabra.Layer contains a simple Go slice of these Neuron structs. Optionally, there can be leabra.Pool pools of subsets of neurons, which correspond to hypercolumns and support more local inhibitory dynamics (these used to be called UnitGroups in the C++ version).

The following are more implementation-level variables used in integrating synaptic inputs:

Neurons are connected via synapses parameterized with the following variables, contained in the leabra.Synapse struct. The leabra.Prjn contains all of the synaptic connections for all the neurons across a given layer -- there are no Neuron-level data structures in the Go version.
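
As a rough orientation (this is an illustrative subset, not the full leabra.Synapse definition), the core per-synapse variables look like this:

// Illustrative subset of the per-synapse variables:
type Synapse struct {
	Wt  float32 // effective weight (contrast-enhanced, 0..1) used in computing Ge
	LWt float32 // underlying linear weight that learning operates on
	DWt float32 // weight change computed by the XCAL learning rule
}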

Activation Update Cycle (every 1 msec): Ge, Gi, Act

The leabra.Network Cycle method in leabra/network.go looks like this:

// Cycle runs one cycle of activation updating:
// * Sends Ge increments from sending to receiving layers
// * Average and Max Ge stats
// * Inhibition based on Ge stats and Act Stats (computed at end of Cycle)
// * Activation from Ge, Gi, and Gl
// * Average and Max Act stats
// This basic version doesn't use the time info, but more specialized types do, and we
// want to keep a consistent API for end-user code.
func (nt *Network) Cycle(ltime *Time) {
	nt.SendGDelta(ltime) // also does integ
	nt.AvgMaxGe(ltime)
	nt.InhibFmGeAct(ltime)
	nt.ActFmG(ltime)
	nt.AvgMaxAct(ltime)
}

For every cycle of activation updating, we compute the excitatory input conductance Ge, then compute inhibition Gi based on average Ge and Act (from previous cycle), then compute the Act based on those conductances. The equations below are not shown in computational order but rather conceptual order for greater clarity. All of the relevant parameters are in the leabra.Layer.Act and Inhib fields, which are of type ActParams and InhibParams -- in this Go version, the parameters have been organized functionally, not structurally, into three categories.
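
The rate-code activation itself is based on a "noisy X-over-X-plus-1" (NXX1) function of the excitatory conductance relative to the amount needed to counteract the current inhibition and leak; a noise-free sketch of that core function (the real code convolves it with Gaussian noise and integrates the resulting activation over time):

// xx1Act sketches the rate-code activation function: the excitatory
// conductance ge in excess of the effective firing threshold geThr
// (the excitation needed to balance the current inhibition and leak),
// passed through the saturating x/(x+1) function with a gain factor.
func xx1Act(ge, geThr, gain float32) float32 {
	x := gain * (ge - geThr)
	if x <= 0 {
		return 0
	}
	return x / (x + 1)
}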

Learning

XCAL DWt Function

Learning is based on running-averages of activation variables, parameterized in the leabra.Layer.Learn LearnParams field, mostly implemented in the leabra/learn.go file.
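
A sketch of the cascade of running averages, updated every cycle from the current activation (the time constants shown are just typical defaults; AvgL, the long-term average used for the BCM-like threshold, is updated more slowly, at the trial level, from AvgM):

// avgsFmAct sketches the cascaded averages: AvgSS (super-short) tracks the
// current activation, AvgS (short; the "outcome" / plus-phase signal) tracks
// AvgSS, and AvgM (medium; the "expectation" / minus-phase signal) tracks AvgS.
func avgsFmAct(act float32, avgSS, avgS, avgM *float32) {
	const (
		ssDt = 1.0 / 2.0  // super-short integration rate
		sDt  = 1.0 / 2.0  // short integration rate
		mDt  = 1.0 / 10.0 // medium integration rate
	)
	*avgSS += ssDt * (act - *avgSS)
	*avgS += sDt * (*avgSS - *avgS)
	*avgM += mDt * (*avgS - *avgM)
}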

Input Scaling

The Ge and Gi synaptic conductances computed from a given projection from one layer to the next reflect the number of receptors currently open and capable of passing current, which is a function of the activity of the sending layer and the total number of synapses. We use a set of equations to automatically normalize (rescale) these factors across different projections, so that each projection has roughly an equal influence on the receiving neuron, by default.

The most important factor to be mindful of for this automatic rescaling process is the expected activity level in a given sending layer. This is set initially to Layer.Inhib.ActAvg.Init, and adapted from there by the various other parameters in that Inhib.ActAvg struct. It is a good idea in general to set that Init value to a reasonable estimate of the proportion of activity you expect in the layer, and in very small networks, it is typically much better to just set the Fixed flag and keep this Init value as such, as otherwise the automatically computed averages can fluctuate significantly and thus create corresponding changes in input scaling. The default UseFirst flag tries to avoid the dependence on the Init values but sometimes the first value may not be very representative, so it is better to set Init and turn off UseFirst for more reliable performance.
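
For example, to fix the expected activity for a layer in a small network (ly here is a hypothetical *leabra.Layer variable; the fields are the Inhib.ActAvg parameters described above):

// For a small layer where roughly 15% of units are expected to be active:
ly.Inhib.ActAvg.Init = 0.15  // expected average activity in this layer
ly.Inhib.ActAvg.Fixed = true // use Init directly; don't adapt from running averages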

Furthermore, we add two tunable parameters that further scale the overall conductance received from a given projection (one in a relative way compared to other projections, and the other a simple absolute multiplicative scaling factor). These are some of the most important parameters to configure in the model -- in particular the strength of top-down "back" projections typically must be relatively weak compared to bottom-up forward projections (e.g., a relative scaling factor of 0.1 or 0.2 relative to the forward projections).
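
These correspond to the Rel and Abs fields of each projection's WtScale parameters; for example (backPj and fwdPj are hypothetical *leabra.Prjn variables):

backPj.WtScale.Rel = 0.2 // top-down projection: weak relative contribution
fwdPj.WtScale.Rel = 1.0  // bottom-up forward projection: standard strength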

The scaling contributions of these two factors are:
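
A sketch of how the two factors combine for a given projection (names are illustrative; in the code this is folded into the overall GScale computation):

// relScale returns the effective multiplier for one projection's input,
// given its Abs and Rel factors and the sum of Rel over all projections
// into the same receiving layer.
func relScale(abs, rel, sumRel float32) float32 {
	if sumRel == 0 {
		return 0
	}
	return abs * rel / sumRel
}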

Thus, all the Rel factors contribute in proportion to their relative value compared to the sum of all such factors across all receiving projections into a layer, while Abs just multiplies directly.

In general, you want to adjust the Rel factors, to keep the total Ge and Gi levels relatively constant, while just shifting the relative contributions. In the relatively rare case where the overall Ge levels are too high or too low, you should adjust the Abs values to compensate.

Typically the Ge value should be between .5 and 1, to maintain a reasonably responsive neural response, and avoid numerical integration instabilities and saturation that can arise if the values get too high. You can record the Layer.Pools[0].Inhib.Ge.Avg and .Max values at the epoch level to see how these are looking -- this is especially important in large networks, and those with unusual, complex patterns of connectivity, where things might get out of whack.

Automatic Rescaling

Here are the relevant factors that are used to compute the automatic rescaling, which takes into account the expected activity level of the sending layer and the number of connections in the projection. The actual code is in leabra/layer.go: GScaleFmAvgAct() and leabra/act.go: SLayActScale().
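
Conceptually (this is a simplification, not the exact GScaleFmAvgAct / SLayActScale code), the rescaling normalizes by the number of sending connections expected to be active at any given time, producing the sc factor referred to below:

// scFmAvgAct sketches the activity-based rescaling: savg is the expected
// average activity of the sending layer, and nCon is the number of
// connections in the projection (per receiving neuron). Dividing by the
// expected number of active inputs keeps sparse and dense projections
// roughly comparable in their net drive on the receiving neuron.
func scFmAvgAct(savg, nCon float32) float32 {
	expActive := savg * nCon
	if expActive < 1 {
		expActive = 1
	}
	return 1 / expActive
}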

This sc factor multiplies the GScale factor as computed above.