

Work In Progress version 2.0.

There are many breaking changes, as described in the RFC: https://github.com/apache/incubator-mxnet/issues/16167. With this change we are introducing a NumPy-compatible coding experience into MXNet.


<div align="center"> <a href="https://mxnet.apache.org/"><img src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet_logo_2.png"></a><br> </div>

Apache MXNet (incubating) for Deep Learning

Apache MXNet (incubating) is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines.


MxNet.Sharp is a C# binding covering all of the Imperative, Symbolic and Gluon APIs with an easy-to-use interface. The Gluon library in Apache MXNet provides a clear, concise, and simple API for deep learning. It makes it easy to prototype, build, and train deep learning models without sacrificing training speed.

High Level Arch


A New NumPy Interface for MxNet

The MXNet community is pleased to announce a new NumPy interface for MXNet that allows developers to retain the familiar syntax of NumPy, while leveraging performance gains from accelerated computing on GPUs and asynchronous execution on CPUs and GPUs, in addition to automatic differentiation for differentiable NumPy ops through MxNet.Autograd.

The new NumPy interface from MXNet, MxNet.Numpy, is intended to be a drop-in replacement for NumPy. As such, MxNet.Numpy supports many familiar numpy.ndarray operations needed for developing machine learning or deep learning models, and more operations are continually being added.

Work List

MxNet.Numpy Vs NumPy Performance

Let's consider a simple test to see the performance difference. More scenarios, including GPU tests, will be added over time.

Scenario 1

MxNet.Sharp (C#):

using MxNet;
using MxNet.Numpy;
using System;

namespace PerfTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Time the creation of two 3000x3000 random matrices and their product.
            DateTime start = DateTime.Now;
            var x = np.random.uniform(size: new Shape(3000, 3000));
            var y = np.random.uniform(size: new Shape(3000, 3000));
            var d = np.dot(x, y);
            Console.WriteLine("Duration: " + (DateTime.Now - start).TotalMilliseconds / 1000);
        }
    }
}

NumPy (Python):

import numpy as np
import time

# Same workload as the C# version: two 3000x3000 random matrices and their product.
start_time = time.time()
x = np.random.uniform(0, 1, (3000, 3000))
y = np.random.uniform(0, 1, (3000, 3000))
d = np.dot(x, y)
print("--- %s sec ---" % (time.time() - start_time))

Scenario 2

MxNet.Sharp (C#):

using MxNet;
using MxNet.Numpy;
using System;

namespace PerfTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Time an element-wise expression over two 30000x10000 random matrices.
            DateTime start = DateTime.Now;
            var x = np.random.uniform(size: new Shape(30000, 10000));
            var y = np.random.uniform(size: new Shape(30000, 10000));
            var d = 0.5f * np.sqrt(x) + np.sin(y) * np.log(x) - np.exp(y);
            Console.WriteLine("Duration: " + (DateTime.Now - start).TotalMilliseconds / 1000);
        }
    }
}

NumPy (Python):

import numpy as np
import time

# Same workload as the C# version: an element-wise expression over 30000x10000 matrices.
start_time = time.time()
x = np.random.uniform(0, 1, (30000, 10000))
y = np.random.uniform(0, 1, (30000, 10000))
d = 0.5 * np.sqrt(x) + np.sin(y) * np.log(x) - np.exp(y)
print("--- %s sec ---" % (time.time() - start_time))

| Scenario | MxNet CPU | NumPy |
| --- | --- | --- |
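
Each scenario above takes a single wall-clock reading, which can be noisy. The following is a minimal sketch, not part of MxNet.Sharp, of a helper that repeats a measurement and keeps the shortest run. The names `time_best` and `workload` are hypothetical, and the stand-in workload uses only the Python standard library; in practice it would be replaced by the NumPy expression from a scenario. Because MXNet schedules operations asynchronously, a timing on the MXNet side is only meaningful if the measured code also waits for the result to be computed; that step is not shown here.

```python
import time


def time_best(fn, repeats=5):
    """Call fn() `repeats` times and return the shortest duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def workload():
    # Hypothetical stand-in; swap in the expression being benchmarked.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print("--- %.4f sec ---" % time_best(workload))
```

Taking the minimum rather than the mean reduces the influence of one-off delays such as first-call initialization or other processes competing for the CPU.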


Install the package: Install-Package MxNet.Sharp


Then add the MxNet runtime redistributable package for your platform from the lists below.

Important: Make sure your installed CUDA version matches the CUDA version in the NuGet package.

Check your CUDA version with the following command:

nvcc --version

You can either upgrade your CUDA install or install the MXNet package that supports your CUDA version.
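
As a rough illustration of the matching step, the sketch below extracts the CUDA release from the text printed by `nvcc --version` and derives a Win-x64 package ID. It is not part of MxNet.Sharp: the function names are hypothetical, and it assumes package IDs follow the pattern `MxNet-CU<version>[MKL].Runtime.Redist` for the CUDA versions this README lists (10.1, 10, 9.2 and 8.0).

```python
import re

# CUDA releases this README lists, mapped to the tag used in package names.
SUPPORTED = {"10.1": "CU101", "10.0": "CU100", "9.2": "CU92", "8.0": "CU80"}


def cuda_release(nvcc_output):
    """Return 'major.minor' parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def package_id(nvcc_output, mkl=False):
    """Return the assumed Win-x64 package ID, or None if the release is not listed."""
    tag = SUPPORTED.get(cuda_release(nvcc_output))
    if tag is None:
        return None
    # Assumption: IDs follow MxNet-CU<version>[MKL].Runtime.Redist.
    return f"MxNet-{tag}{'MKL' if mkl else ''}.Runtime.Redist"


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.1, V10.1.243"
    print("Install-Package", package_id(sample))
```

A return value of `None` signals that no listed package matches the installed CUDA release, in which case the CUDA install would need to change or the CPU package could be used instead.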

MxNet Version Build: https://github.com/apache/incubator-mxnet/releases/tag/1.5.0

Win-x64 Packages

| Package | Description | Install command |
| --- | --- | --- |
| MxNet-CPU | MxNet CPU version | Install-Package MxNet.Runtime.Redist |
| MxNet-MKL | MxNet CPU with MKL | Install-Package MxNet-MKL.Runtime.Redist |
| MxNet-CU101 | MxNet for CUDA 10.1 and cuDNN 7 | Install-Package MxNet-CU101.Runtime.Redist |
| MxNet-CU101MKL | MxNet with MKL for CUDA 10.1 and cuDNN 7 | Install-Package MxNet-CU101MKL.Runtime.Redist |
| MxNet-CU100 | MxNet for CUDA 10 and cuDNN 7 | Install-Package MxNet-CU100.Runtime.Redist |
| MxNet-CU100MKL | MxNet with MKL for CUDA 10 and cuDNN 7 | Install-Package MxNet-CU100MKL.Runtime.Redist |
| MxNet-CU92 | MxNet for CUDA 9.2 and cuDNN 7 | Install-Package MxNet-CU92.Runtime.Redist |
| MxNet-CU92MKL | MxNet with MKL for CUDA 9.2 and cuDNN 7 | Install-Package MxNet-CU92MKL.Runtime.Redist |
| MxNet-CU80 | MxNet for CUDA 8.0 and cuDNN 7 | Install-Package MxNet-CU80.Runtime.Redist |
| MxNet-CU80MKL | MxNet with MKL for CUDA 8.0 and cuDNN 7 | Install-Package MxNet-CU80MKL.Runtime.Redist |

Linux-x64 Packages

| Package | Description | Install command |
| --- | --- | --- |
| MxNet-CPU | MxNet CPU version | Install-Package MxNet.Linux.Runtime.Redist |
| MxNet-MKL | MxNet CPU with MKL | Install-Package MxNet-MKL.Linux.Runtime.Redist |
| MxNet-CU101 | MxNet for CUDA 10.1 and cuDNN 7 | Not yet published |
| MxNet-CU101MKL | MxNet with MKL for CUDA 10.1 and cuDNN 7 | Not yet published |
| MxNet-CU100 | MxNet for CUDA 10 and cuDNN 7 | Not yet published |
| MxNet-CU100MKL | MxNet with MKL for CUDA 10 and cuDNN 7 | Not yet published |
| MxNet-CU92 | MxNet for CUDA 9.2 and cuDNN 7 | Not yet published |
| MxNet-CU92MKL | MxNet with MKL for CUDA 9.2 and cuDNN 7 | Not yet published |
| MxNet-CU80 | MxNet for CUDA 8.0 and cuDNN 7 | Not yet published |
| MxNet-CU80MKL | MxNet with MKL for CUDA 8.0 and cuDNN 7 | Not yet published |

OSX-x64 Packages

| Package | Description | Install command |
| --- | --- | --- |
| MxNet-CPU | MxNet CPU version | Not yet published |
| MxNet-MKL | MxNet CPU with MKL | Not yet published |
| MxNet-CU101 | MxNet for CUDA 10.1 and cuDNN 7 | Not yet published |
| MxNet-CU101MKL | MxNet with MKL for CUDA 10.1 and cuDNN 7 | Not yet published |
| MxNet-CU100 | MxNet for CUDA 10 and cuDNN 7 | Not yet published |
| MxNet-CU100MKL | MxNet with MKL for CUDA 10 and cuDNN 7 | Not yet published |
| MxNet-CU92 | MxNet for CUDA 9.2 and cuDNN 7 | Not yet published |
| MxNet-CU92MKL | MxNet with MKL for CUDA 9.2 and cuDNN 7 | Not yet published |
| MxNet-CU80 | MxNet for CUDA 8.0 and cuDNN 7 | Not yet published |
| MxNet-CU80MKL | MxNet with MKL for CUDA 8.0 and cuDNN 7 | Not yet published |

Gluon MNIST Example

Demo as per: https://mxnet.apache.org/api/python/docs/tutorials/packages/gluon/image/mnist.html

// Requires System, System.Linq and the MxNet.Sharp namespaces for the types used below.
var mnist = TestUtils.GetMNIST(); // Get the MNIST dataset; it is downloaded if not found
var batch_size = 200; // Set training batch size
var train_data = new NDArrayIter(mnist["train_data"], mnist["train_label"], batch_size, true);
var val_data = new NDArrayIter(mnist["test_data"], mnist["test_label"], batch_size);

// Define a simple network with dense layers
var net = new Sequential();
net.Add(new Dense(128, ActivationType.Relu));
net.Add(new Dense(64, ActivationType.Relu));
net.Add(new Dense(10));

// Set context; multi-GPU is supported
var gpus = TestUtils.ListGpus();
var ctx = gpus.Count > 0 ? gpus.Select(x => Context.Gpu(x)).ToArray() : new[] {Context.Cpu(0)};

// Initialize the weights
net.Initialize(new Xavier(magnitude: 2.24f), ctx);

// Create the trainer with all the network parameters and set the optimizer
var trainer = new Trainer(net.CollectParams(), new Adam());

var epoch = 10;
var metric = new Accuracy(); // Use Accuracy as the evaluation metric.
var softmax_cross_entropy_loss = new SoftmaxCELoss();
float lossVal = 0; // For loss calculation
for (var iter = 0; iter < epoch; iter++)
{
    var tic = DateTime.Now;
    // Reset the train data iterator.
    train_data.Reset();
    lossVal = 0;

    // Loop over the train data iterator.
    while (!train_data.End())
    {
        var batch = train_data.Next();

        // Splits train data into multiple slices along batch_axis
        // and copies each slice into a context.
        var data = Utils.SplitAndLoad(batch.Data[0], ctx, batch_axis: 0);

        // Splits train labels into multiple slices along batch_axis
        // and copies each slice into a context.
        var label = Utils.SplitAndLoad(batch.Label[0], ctx, batch_axis: 0);

        var outputs = new NDArrayList();

        // Inside training scope
        using (var ag = Autograd.Record())
        {
            outputs = Enumerable.Zip(data, label, (x, y) =>
            {
                var z = net.Call(x);

                // Computes softmax cross entropy loss.
                NDArray loss = softmax_cross_entropy_loss.Call(z, y);

                // Backpropagate the error for one iteration.
                loss.Backward();
                lossVal += loss.Mean();
                return z;
            }).ToList();
        }

        // Updates internal evaluation
        metric.Update(label, outputs.ToArray());

        // Make one step of parameter update. Trainer needs to know the
        // batch size of data to normalize the gradient by 1/batch_size.
        trainer.Step(batch.Data[0].Shape[0]);
    }

    var toc = DateTime.Now;

    // Gets the evaluation result.
    var (name, acc) = metric.Get();

    // Reset evaluation result to initial state.
    metric.Reset();
    Console.Write($"Loss: {lossVal} ");
    Console.WriteLine($"Training acc at epoch {iter}: {name}={(acc * 100).ToString("0.##")}%, Duration: {(toc - tic).TotalSeconds.ToString("0.#")}s");
}

The model reaches 98% accuracy by the 6th epoch.
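
For readers unfamiliar with the loss used in the training loop, the following is a conceptual sketch of softmax cross-entropy over raw network outputs and integer class labels, averaged over a batch. It is written in plain Python for illustration only and is not the MxNet.Sharp implementation; the function names are hypothetical, and it assumes labels are class indices rather than one-hot vectors.

```python
import math


def softmax_cross_entropy(logits, label):
    """Return -log(softmax(logits)[label]) for one sample."""
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]


def batch_mean_loss(batch_logits, batch_labels):
    """Average the per-sample losses over a batch."""
    losses = [softmax_cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two samples with three classes each.
    logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
    labels = [0, 2]
    print("mean loss:", batch_mean_loss(logits, labels))
```

The loss is small when the output for the correct class is much larger than the others, and it equals the logarithm of the number of classes when all outputs are equal.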


Documentation (In Progress)
