DAU-ConvNet
Official implementation of Displaced Aggregation Units (DAUs) for convolutional networks from the CVPR 2018 paper "Spatially-Adaptive Filter Units for Deep Neural Networks", developed as part of Deep Compositional Networks.
This repository contains a self-contained DAU layer implementation in C++ and CUDA, plus a TensorFlow plugin. The library can be used to implement DAU layers in any deep learning framework. For more details on DAUs, see the ViCoS research page.
Available implementations:
- TensorFlow
- Caffe
See below for more details on each implementation.
Citation
Please cite our CVPR 2018 paper when using DAU code:
@inproceedings{Tabernik2018,
    title = {{Spatially-Adaptive Filter Units for Deep Neural Networks}},
    author = {Tabernik, Domen and Kristan, Matej and Leonardis, Ale{\v{s}}},
    booktitle = {2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year = {2018},
    pages = {9388--9396}
}
Acknowledgment
We thank Vitjan Zavrtanik (VitjanZ) for the TensorFlow C++/Python wrapper.
Caffe
A Caffe implementation based on this library is available in the DAU-ConvNet-caffe repository.
Pretrained Caffe models from the CVPR 2018 paper are available:
- AlexNet-DAU-ConvNet (default) (56.9% top-1 accuracy, 0.7 million DAU units)
- AlexNet-DAU-ConvNet-small (56.4% top-1 accuracy, 0.3 million DAU units)
- AlexNet-DAU-ConvNet-large (57.3% top-1 accuracy, 1.5 million DAU units)
TensorFlow
We provide a TensorFlow plugin and Python wrappers that can directly replace the tf.contrib.layers.conv2d function. Note that our C++/CUDA code natively supports only the NCHW input format, so please update your TensorFlow models to use this format.
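As a minimal sketch of the drop-in replacement (the input shape and names here are illustrative, not from the repository):

import tensorflow as tf
from dau_conv import dau_conv2d

# Illustrative NCHW input: batch of 16, 3 channels, 32x32 pixels.
inputs = tf.placeholder(tf.float32, [16, 3, 32, 32])

# Before: net = tf.contrib.layers.conv2d(inputs, 96, [3, 3], data_format='NCHW')
# After: a DAU layer with a 2x2 grid of units per filter.
net = dau_conv2d(inputs, 96, dau_units=(2, 2), max_kernel_size=9,
                 data_format='NCHW', scope='dau_conv1')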
Requirements and dependency libraries for TensorFlow plugin:
- Python (tested on Python 2.7 and Python 3.5)
- TensorFlow 1.6 or newer
- Numpy
- OpenBlas
- (optional) scipy, matplotlib, and python-tk for running the unit tests in dau_conv_test.py
Installation from pre-compiled binaries (pip)
If you are using TensorFlow from pip, install the pre-compiled binaries (.whl) from the releases page (a mirror server is also available at http://box.vicos.si/skokec/dau-convnet):
# install dependency library (OpenBLAS)
sudo apt-get install libopenblas-dev wget
# install dau-conv package
export TF_VERSION=1.13.1
sudo pip install https://github.com/skokec/DAU-ConvNet/releases/download/v1.0/dau_conv-1.0_TF${TF_VERSION}-cp35-cp35m-manylinux1_x86_64.whl
Note that the pip packages were compiled against specific versions of TensorFlow from pip, which must be installed beforehand.
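To sanity-check the installed wheel, a simple import of the wrapper function used throughout this README should succeed without errors:

# Quick post-install check: importing the wrapper loads the native .so as well.
from dau_conv import dau_conv2d
print("dau_conv loaded OK:", dau_conv2d.__name__)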
Docker
Pre-compiled docker images for TensorFlow are also available on Docker Hub; they are built using plugins/tensorflow/docker/Dockerfile.
Images are built for specific Python and TensorFlow versions. For instance, start a container for Python 3.5 and TensorFlow r1.13.1 using:
sudo nvidia-docker run -i -d -t skokec/tf-dau-convnet:1.0-py3.5-tf1.13.1 /bin/bash
Build and installation
Requirements and dependency libraries to compile DAU-ConvNet:
- Ubuntu 16.04 (not tested on other OSes or versions)
- C++11
- CMake 2.8 or newer (tested on version 3.5)
- CUDA SDK Toolkit (tested on version 8.0 and 9.0)
- BLAS (ATLAS or OpenBLAS)
- cuBLAS
On Ubuntu 16.04 with pre-installed CUDA and cuBLAS (e.g., using the nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 or nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04 docker images), install the dependencies first:
apt-get update
apt-get install cmake python python-pip libopenblas-dev
pip install "tensorflow-gpu>=1.6"
# Note: during installation the tensorflow package is sufficient, but at runtime tensorflow-gpu is required.
Then clone the repository and build from source:
git clone https://github.com/skokec/DAU-ConvNet
cd DAU-ConvNet
git submodule update --init --recursive
mkdir build
cd build
cmake -DBLAS=Open -DBUILD_TENSORFLOW_PLUGIN=on ..
make -j # creates the .whl file in build/plugin/tensorflow/wheelhouse
make install # installs the .whl package (with .so files) into the python dist-packages folder
Unit test
To validate the installation using the unit tests, also install scipy, matplotlib, and python-tk, and then run dau_conv_test.py:
apt-get install python-tk
pip install scipy matplotlib
python DAU-ConvNet/plugins/tensorflow/tests/dau_conv_test.py DAUConvTest.test_DAUConvQuick
Common issues
I got undefined symbol: _ZN9perftools8gputools4cuda17AsCUDAStreamValueEPNS0_6StreamE when running the code.

Please make sure that your TensorFlow is compiled with GPU/CUDA support. In pip, the tensorflow and tensorflow-gpu packages provide the same libtensorflow_framework.so in the same folder, but only tensorflow-gpu contains the .so compiled against CUDA. If tensorflow is installed after tensorflow-gpu, the .so with CUDA support will be overridden by the one without it. Make sure to install tensorflow-gpu last, or not to install tensorflow at all.
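A quick way to verify which TensorFlow build is active is to query CUDA support directly from Python:

import tensorflow as tf

# Both should print True for a working tensorflow-gpu installation.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())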
Usage
There are two ways to use our DAU convolution: the dau_conv.DAUConv2d class (based on base.Layer) or the dau_conv.dau_conv2d wrapper function. See below for an example using the dau_conv2d method.

Method dau_conv.dau_conv2d:
dau_conv2d(inputs,
           filters, # number of output filters
           dau_units, # number of DAU units per image axis, e.g., (2,2) for 4 DAUs per filter
           max_kernel_size, # maximal kernel size that limits the offset of DAUs (must be <= 65; see limitations below)
           stride=1, # only stride=1 supported
           mu_learning_rate_factor=500, # additional factor for gradients of mu1 and mu2
           data_format=None,
           activation_fn=tf.nn.relu,
           normalizer_fn=None,
           normalizer_params=None,
           weights_initializer=tf.random_normal_initializer(stddev=0.1),
           weights_regularizer=None,
           mu1_initializer=None, # see below for default initialization values
           mu1_regularizer=None,
           mu2_initializer=None, # see below for default initialization values
           mu2_regularizer=None,
           sigma_initializer=None,
           sigma_regularizer=None,
           biases_initializer=tf.zeros_initializer(),
           biases_regularizer=None,
           reuse=None,
           variables_collections=None,
           outputs_collections=None,
           trainable=True,
           scope=None)
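For illustration, a typical call that overrides only a few of these defaults could look like this (input shape and names are illustrative):

import tensorflow as tf
from dau_conv import dau_conv2d

# Sketch: NCHW input, 64 output filters, 2x2 DAU grid, offsets within a 17x17 kernel.
inputs = tf.placeholder(tf.float32, [32, 16, 64, 64]) # N, C, H, W
net = dau_conv2d(inputs, filters=64, dau_units=(2, 2), max_kernel_size=17,
                 mu_learning_rate_factor=500,
                 activation_fn=tf.nn.relu,
                 data_format='NCHW',
                 scope='dau_example')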
Class dau_conv.DAUConv2d:
DAUConv2d(filters, # number of output filters
          dau_units, # number of DAU units per image axis, e.g., (2,2) for 4 DAUs total per one filter
          max_kernel_size, # maximal kernel size that limits the offset of DAUs (must be <= 65; see limitations below)
          strides=1, # only stride=1 supported
          data_format='channels_first', # supports only 'channels_first' (i.e., NCHW)
          activation=None,
          use_bias=True,
          weight_initializer=tf.random_normal_initializer(stddev=0.1),
          mu1_initializer=None, # see below for default initialization values
          mu2_initializer=None, # see below for default initialization values
          sigma_initializer=None,
          bias_initializer=tf.zeros_initializer(),
          weight_regularizer=None,
          mu1_regularizer=None,
          mu2_regularizer=None,
          sigma_regularizer=None,
          bias_regularizer=None,
          activity_regularizer=None,
          weight_constraint=None,
          mu1_constraint=None,
          mu2_constraint=None,
          sigma_constraint=None,
          bias_constraint=None,
          trainable=True,
          mu_learning_rate_factor=500, # additional factor for gradients of mu1 and mu2
          unit_testing=False, # for compatibility between CPU and GPU versions (where gradients of the last edge need to be ignored) during unit testing
          name=None)
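Since DAUConv2d is based on base.Layer, it can also be used in the object-oriented style of tf.layers; a minimal sketch (assuming the layer instance is callable on an NCHW tensor, as base.Layer instances are):

import tensorflow as tf
from dau_conv import DAUConv2d

# Construct the layer once, then apply it to an NCHW tensor.
layer = DAUConv2d(filters=64, dau_units=(2, 2), max_kernel_size=9,
                  data_format='channels_first', activation=tf.nn.relu)
inputs = tf.placeholder(tf.float32, [16, 32, 64, 64]) # N, C, H, W
outputs = layer(inputs) # base.Layer instances are callable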
Mean initialization (mu1 and mu2 initializers)
Mean values (i.e., the learned offsets) of DAU units are always relative to (0,0) at the center of the kernel. The default initialization (when passing None) arranges the units uniformly over the available space using the dau_conv.DAUGridMean initializer class:
if self.mu1_initializer is None:
    self.mu1_initializer = DAUGridMean(dau_units=self.dau_units,
                                       max_value=np.floor(self.max_kernel_size[1]/2.0)-1,
                                       dau_unit_axis=2)
if self.mu2_initializer is None:
    self.mu2_initializer = DAUGridMean(dau_units=self.dau_units,
                                       max_value=np.floor(self.max_kernel_size[0]/2.0)-1,
                                       dau_unit_axis=1)
Other TensorFlow initializer classes can be used as well. For instance, distributing the units uniformly around the center of the kernel is accomplished by:
dau_conv2d(...
           mu1_initializer=tf.random_uniform_initializer(minval=-np.floor(max_kernel_size/2.0),
                                                         maxval=np.floor(max_kernel_size/2.0), dtype=tf.float32),
           mu2_initializer=tf.random_uniform_initializer(minval=-np.floor(max_kernel_size/2.0),
                                                         maxval=np.floor(max_kernel_size/2.0), dtype=tf.float32),
           ...)
Initializer dau_conv.DAUGridMean class:
dau_conv.DAUGridMean(dau_units, # number of DAU units per image axis, e.g., (2,2) for 4 DAUs total
                     max_value, # max offset
                     dau_unit_axis=2) # axis of DAU units in the input tensor: 2 => mu1, 1 => mu2 (default=2)
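For example, to reproduce the default placement explicitly for max_kernel_size=9 (where max_value = floor(9/2) - 1 = 3), the initializers can be passed directly; the values below are illustrative:

import tensorflow as tf
from dau_conv import dau_conv2d, DAUGridMean

inputs = tf.placeholder(tf.float32, [32, 16, 64, 64]) # N, C, H, W
# Explicit 2x2 grid initialization with offsets of at most 3 px from the kernel center.
net = dau_conv2d(inputs, 96, dau_units=(2, 2), max_kernel_size=9,
                 mu1_initializer=DAUGridMean(dau_units=(2, 2), max_value=3, dau_unit_axis=2),
                 mu2_initializer=DAUGridMean(dau_units=(2, 2), max_value=3, dau_unit_axis=1),
                 data_format='NCHW', scope='dau_grid_example')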
Limitations and restrictions
The current implementation is limited to the following settings:
- data_format='NCHW': only the 'NCHW' format is available in our C++/CUDA implementation
- the number of output channels must be a multiple of 16 or 32 (depending on the batch size)
- stride=1: striding is not implemented yet
- max_kernel_size <= 65: due to pre-defined CUDA kernels, max offsets are restricted to specific values:
  - max_kernel_size <= 9 and max_kernel_size <= 17: most optimal kernel implementations
  - max_kernel_size <= 33 and max_kernel_size <= 65: less optimal implementations with an additional computational penalty due to larger memory utilization
  - NOTE: the CUDA kernel is selected based on the actual offset values at each call, so even large kernel sizes can be fast if all offset values (in each layer) are smaller than 8 pixels.
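These restrictions can be checked up front; the helper below is a hypothetical convenience, not part of the library:

def check_dau_settings(filters, max_kernel_size, stride=1, data_format='NCHW'):
    # Hypothetical helper mirroring the restrictions listed above.
    if data_format != 'NCHW':
        raise ValueError("only the 'NCHW' format is implemented")
    if stride != 1:
        raise ValueError("striding is not implemented yet")
    if max_kernel_size > 65:
        raise ValueError("max_kernel_size is limited to 65")
    if filters % 16 != 0:
        raise ValueError("number of output channels must be a multiple of 16 (or 32)")

check_dau_settings(filters=96, max_kernel_size=9) # settings used in the example below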
Example of code usage
A CIFAR-10 example is available here.
Below is an example of three DAU convolutional layers and one fully connected layer, using batch norm and L2 regularization on the weights:
import tensorflow as tf
from tensorflow.contrib.framework import arg_scope
from dau_conv import dau_conv2d
with arg_scope([dau_conv2d, tf.contrib.layers.fully_connected],
               weights_regularizer=tf.contrib.layers.l2_regularizer(0.0005),
               weights_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
               biases_initializer=None,
               normalizer_fn=tf.layers.batch_normalization,
               normalizer_params=dict(center=True,
                                      scale=True,
                                      momentum=0.9999,
                                      epsilon=0.001,
                                      axis=1, # NOTE: use axis=1 for NCHW format !!
                                      training=in_training)):

    inputs = ...

    # convert from NHWC to NCHW format
    inputs = tf.transpose(inputs, [0, 3, 1, 2])

    net = dau_conv2d(inputs, 96, dau_units=(2, 2), max_kernel_size=9,
                     mu_learning_rate_factor=500, data_format='NCHW', scope='dau_conv1')
    net = tf.contrib.layers.max_pool2d(net, [2, 2], scope='pool1', data_format="NCHW")

    net = dau_conv2d(net, 96, dau_units=(2, 2), max_kernel_size=9,
                     mu_learning_rate_factor=500, data_format='NCHW', scope='dau_conv2')
    net = tf.contrib.layers.max_pool2d(net, [2, 2], scope='pool2', data_format="NCHW")

    net = dau_conv2d(net, 192, dau_units=(2, 2), max_kernel_size=9,
                     mu_learning_rate_factor=500, data_format='NCHW', scope='dau_conv3')
    net = tf.contrib.layers.max_pool2d(net, [2, 2], scope='pool3', data_format="NCHW")

    net = tf.reshape(net, [net.shape[0], -1])

    net = tf.contrib.layers.fully_connected(net, NUM_CLASSES, scope='fc4',
                                            activation_fn=None,
                                            normalizer_fn=None,
                                            biases_initializer=tf.constant_initializer(0))