Home

Awesome

Getting Started

# acquire source code and compile
git clone https://github.com/attractivechaos/kann
cd kann; make  # or "make CBLAS=/path/to/openblas" for faster matrix multiplication
# learn unsigned addition (30000 samples; numbers within 10000)
seq 30000 | awk -v m=10000 '{a=int(m*rand());b=int(m*rand());print a,b,a+b}' \
  | ./examples/rnn-bit -m7 -o add.kan -
# apply the model (output 1138429, the sum of the two numbers)
echo 400958 737471 | ./examples/rnn-bit -Ai add.kan -

Introduction

KANN is a standalone and lightweight library in C for constructing and training small to medium artificial neural networks such as multi-layer perceptrons, convolutional neural networks and recurrent neural networks (including LSTM and GRU). It implements graph-based reverse-mode automatic differentiation and allows to build topologically complex neural networks with recurrence, shared weights and multiple inputs/outputs/costs. In comparison to mainstream deep learning frameworks such as TensorFlow, KANN is not as scalable, but it is close in flexibility, has a much smaller code base and only depends on the standard C library. In comparison to other lightweight frameworks such as tiny-dnn, KANN is still smaller, times faster and much more versatile, supporting RNN, VAE and non-standard neural networks that may fail these lightweight frameworks.

KANN could be potentially useful when you want to experiment small to medium neural networks in C/C++, to deploy no-so-large models without worrying about dependency hell, or to learn the internals of deep learning libraries.

Features

Limitations

Installation

The KANN library is composed of four files: kautodiff.{h,c} and kann.{h,c}. You are encouraged to include these files in your source code tree. No installation is needed. To compile examples:

make

This generates a few executables in the examples directory.

Documentations

Comments in the header files briefly explain the APIs. More documentations can be found in the doc directory. Examples using the library are in the examples directory.

A tour of basic KANN APIs

Working with neural networks usually involves three steps: model construction, training and prediction. We can use layer APIs to build a simple model:

kann_t *ann;
kad_node_t *t;
t = kann_layer_input(784); // for MNIST
t = kad_relu(kann_layer_dense(t, 64)); // a 64-neuron hidden layer with ReLU activation
t = kann_layer_cost(t, 10, KANN_C_CEM); // softmax output + multi-class cross-entropy cost
ann = kann_new(t, 0);                   // compile the network and collate variables

For this simple feedforward model with one input and one output, we can train it with:

int n;     // number of training samples
float **x; // model input, of size n * 784
float **y; // model output, of size n * 10
// fill in x and y here and then call:
kann_train_fnn1(ann, 0.001f, 64, 25, 10, 0.1f, n, x, y);

We can save the model to a file with kann_save() or use it to classify a MNIST image:

float *x;       // of size 784
const float *y; // this will point to an array of size 10
// fill in x here and then call:
y = kann_apply1(ann, x);

Working with complex models requires to use low-level APIs. Please see 01user.md for details.

A complete example

This example learns to count the number of "1" bits in an integer (i.e. popcount):

// to compile and run: gcc -O2 this-prog.c kann.c kautodiff.c -lm && ./a.out
#include <stdlib.h>
#include <stdio.h>
#include "kann.h"

int main(void)
{
	int i, k, max_bit = 20, n_samples = 30000, mask = (1<<max_bit)-1, n_err, max_k;
	float **x, **y, max, *x1;
	kad_node_t *t;
	kann_t *ann;
	// construct an MLP with one hidden layers
	t = kann_layer_input(max_bit);
	t = kad_relu(kann_layer_dense(t, 64));
	t = kann_layer_cost(t, max_bit + 1, KANN_C_CEM); // output uses 1-hot encoding
	ann = kann_new(t, 0);
	// generate training data
	x = (float**)calloc(n_samples, sizeof(float*));
	y = (float**)calloc(n_samples, sizeof(float*));
	for (i = 0; i < n_samples; ++i) {
		int c, a = kad_rand(0) & (mask>>1);
		x[i] = (float*)calloc(max_bit, sizeof(float));
		y[i] = (float*)calloc(max_bit + 1, sizeof(float));
		for (k = c = 0; k < max_bit; ++k)
			x[i][k] = (float)(a>>k&1), c += (a>>k&1);
		y[i][c] = 1.0f; // c is ranged from 0 to max_bit inclusive
	}
	// train
	kann_train_fnn1(ann, 0.001f, 64, 50, 10, 0.1f, n_samples, x, y);
	// predict
	x1 = (float*)calloc(max_bit, sizeof(float));
	for (i = n_err = 0; i < n_samples; ++i) {
		int c, a = kad_rand(0) & (mask>>1); // generating a new number
		const float *y1;
		for (k = c = 0; k < max_bit; ++k)
			x1[k] = (float)(a>>k&1), c += (a>>k&1);
		y1 = kann_apply1(ann, x1);
		for (k = 0, max_k = -1, max = -1.0f; k <= max_bit; ++k) // find the max
			if (max < y1[k]) max = y1[k], max_k = k;
		if (max_k != c) ++n_err;
	}
	fprintf(stderr, "Test error rate: %.2f%%\n", 100.0 * n_err / n_samples);
	kann_delete(ann); // TODO: also to free x, y and x1
	return 0;
}

Benchmarks

TaskFrameworkMachineDeviceRealCPUCommand line
mnist-mlpKANN+SSELinux1 CPU31.3s31.2smlp -m20 -v0
Mac1 CPU27.1s27.1s
KANN+BLASLinux1 CPU18.8s18.8s
Theano+KerasLinux1 CPU33.7s33.2skeras/mlp.py -m20 -v0
4 CPUs32.0s121.3s
Mac1 CPU37.2s35.2s
2 CPUs32.9s62.0s
TensorFlowMac1 CPU33.4s33.4stensorflow/mlp.py -m20
2 CPUs29.2s50.6stensorflow/mlp.py -m20 -t2
Tiny-dnnLinux1 CPU2m19s2m18stiny-dnn/mlp -m20
Tiny-dnn+AVXLinux1 CPU1m34s1m33s
Mac1 CPU2m17s2m16s
mnist-cnnKANN+SSELinux1 CPU57m57s57m53smnist-cnn -v0 -m15
4 CPUs19m09s68m17smnist-cnn -v0 -t4 -m15
Theano+KerasLinux1 CPU37m12s37m09skeras/mlp.py -Cm15 -v0
4 CPUs24m24s97m22s
1 GPU2m57skeras/mlp.py -Cm15 -v0
Tiny-dnn+AVXLinux1 CPU300m40s300m23stiny-dnn/mlp -Cm15
mul100-rnnKANN+SSELinux1 CPU40m05s40m02srnn-bit -l2 -n160 -m25 -Nd0
4 CPUs12m13s44m40srnn-bit -l2 -n160 -t4 -m25 -Nd0
KANN+BLASLinux1 CPU22m58s22m56srnn-bit -l2 -n160 -m25 -Nd0
4 CPUs8m18s31m26srnn-bit -l2 -n160 -t4 -m25 -Nd0
Theano+KerasLinux1 CPU27m30s27m27srnn-bit.py -l2 -n160 -m25
4 CPUs19m52s77m45s