TorchNTK
An Arbitrary PyTorch Architecture Neural Tangent Kernel Library
This code was developed to bridge a gap in NTK computation before the release of PyTorch 1.11. Now that PyTorch 1.11 has been released, we advise you to take a look at functorch's NTK page, which will generally see better development and improvements than this repo. In other words, we do not expect to support this repo moving forward.
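For reference, below is a minimal sketch of a functorch-style empirical NTK computation; see functorch's NTK page for the authoritative version. Assumptions: functorch is installed, net is your torch.nn.Module ending in a single output neuron, and x1/x2 are batches of inputs.

import torch
from functorch import make_functional, vmap, jacrev

fnet, params = make_functional(net)  # net is assumed to be your model

def fnet_single(params, x):
    # evaluate the network on one example by adding and removing a batch dimension
    return fnet(params, x.unsqueeze(0)).squeeze(0)

def empirical_ntk(params, x1, x2):
    # per-example Jacobians with respect to every parameter tensor
    jac1 = vmap(jacrev(fnet_single), (None, 0))(params, x1)
    jac2 = vmap(jacrev(fnet_single), (None, 0))(params, x2)
    jac1 = [j.flatten(1) for j in jac1]  # each entry: (N, numel of that parameter)
    jac2 = [j.flatten(1) for j in jac2]  # each entry: (M, numel of that parameter)
    # contract the Jacobians parameter by parameter and sum the contributions
    return sum(j1 @ j2.T for j1, j2 in zip(jac1, jac2))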
Installation
- git clone this repository:
  git clone https://github.com/pnnl/torchntk
- Add the repository to your PYTHONPATH (a quick sanity check follows this list):
  export PYTHONPATH="${PYTHONPATH}:/my/path/TorchNTK/"
- Make sure you have the correct dependencies installed. Broadly, this code was tested with PyTorch 1.9, numba 0.53.1, and Tensorboard 2.6.0, on Python 3.8.8.
- The torch.vmap function is only available in nightly releases of PyTorch. torch.vmap is only used for one implementation of an autograd calculation; it is not required.
- For the notebooks comparing to neural-tangents, you will also need jax, jaxlib, and neural-tangents installed. This can be tricky for Windows users, and we suggest going to the original neural-tangents page for detailed installation instructions here.
- For the tensorboard.ipynb notebook, download the dataset from here and place it into ./DATA/, though you could just as well use any other dataset or simulated data.
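A quick way to confirm the path setup worked (a minimal check, nothing more):

import torchntk
print(torchntk.__file__)  # should point inside your TorchNTK checkout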
Basic Usage
import torchntk
import torch
DEVICE = 'cpu' # or 'cuda', say
model = Pytorch_Model() # any architecture, BUT it must terminate in a single output neuron
model.to(DEVICE)
Y = model(X) # X is your input data, already on DEVICE
NTK_components = torchntk.autograd.autograd_components_ntk(model, Y)
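To assemble the full NTK matrix from the returned components, a minimal sketch (hedged: it handles the components arriving either as a dict of per-parameter tensors or as a list):

comps = NTK_components.values() if isinstance(NTK_components, dict) else NTK_components
NTK = sum(comps)  # full empirical NTK over the rows of X; one entry per pair of examples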
Alternatively, a generally faster implementation exists if torch.vmap is available (currently in PyTorch nightly builds only):
import torchntk
import torch
from torch.utils.data import DataLoader, TensorDataset
DEVICE = 'cuda'
model = Pytorch_Model() # any architecture, BUT it must terminate in a single output neuron
model.to(DEVICE)
xloader = DataLoader(TensorDataset(My_data, My_targets), batch_size=64, shuffle=False)
NTK_components = torchntk.autograd.vmap_ntk_loader(model, xloader)
Finally, if you are using a fully connected network (a network composed only of torch.nn.Linear layers), you can use this last method, which is typically much faster:
import torchntk
import torch
DEVICE = 'cuda'
def activation(X):
    return torch.tanh(X)

def d_activation(X):
    return torch.cosh(X)**-2

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.d1 = torch.nn.Linear(784, 100, bias=True)
        self.d2 = torch.nn.Linear(100, 100, bias=True)
        self.d3 = torch.nn.Linear(100, 1, bias=True)

    def forward(self, x_0):
        # divide each layer's output by the square root of its width
        x_1 = activation(self.d1(x_0)) / 100**0.5
        x_2 = activation(self.d2(x_1)) / 100**0.5
        x_3 = activation(self.d3(x_2)) / 1**0.5
        return x_3, x_2, x_1, x_0
model = MLP()
model.to(DEVICE)
x_3, x_2, x_1, x_0 = model(X) #for some data, X
Xs = [x_0.T.detach(),
      x_1.T.detach(),
      x_2.T.detach()]

layers = [model.d1,
          model.d2,
          model.d3]
# ds_int must match each layer's output width
ds_int = [100, 100, 1]

# ds_float must match the square of whatever you divided each layer's output by;
# i.e., if you didn't divide each layer by anything, this should be all ones.
ds_float = [100.0, 100.0, 1.0]
config = {'Xs': Xs,
          'layers': layers,
          'ds_int': ds_int,
          'ds_float': ds_float,
          'dactivation_t': d_activation}
components = torchntk.explicit.explicit_ntk(**config)
# components is a list of torch.Tensor objects representing each component of
# the NTK from each parameterized operation, in reverse order. Meaning,
# components[0] is the outermost layer's weight matrix NTK component,
# components[1] is the outermost layer's bias vector NTK component,
# ...
# components[-1] is the first layer's bias vector NTK component.
# To get the full NTK, simply sum all the components in the list.
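For example, a minimal usage sketch following the comment above:

NTK = sum(components)  # full empirical NTK; sums every per-parameter component
# the result should be symmetric and positive semi-definite (up to numerical error)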
Logging with Tensorboard
Check the tensorboard.ipynb notebook.
Once installed, Tensorboard can be started on the command line with:
tensorboard --logdir=LOGDIR
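As a rough illustration, NTK summary statistics can be written with PyTorch's own SummaryWriter so they appear under LOGDIR; this is a hedged sketch, and the tag name and value here are illustrative, not what the notebook actually logs.

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='LOGDIR')
writer.add_scalar('ntk/min_over_max_eigenvalue', 0.05, global_step=0)  # illustrative value
writer.close()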
Possible Metrics of Interest
The condition number is the ratio of the minimum eigenvalue of the NTK to the maximum eigenvalue of the NTK. It is negatively correlated with model performance.
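A minimal sketch of computing this metric, as defined above, from an NTK matrix; ntk_condition_metric is an illustrative helper, not part of the library, and it assumes the NTK is a symmetric torch.Tensor.

import torch

def ntk_condition_metric(ntk):
    # eigvalsh returns the real eigenvalues of a symmetric matrix in ascending order
    eigvals = torch.linalg.eigvalsh(ntk)
    return (eigvals[0] / eigvals[-1]).item()  # minimum eigenvalue / maximum eigenvalue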
Credit
"torchntk.autograd.old_autograd_ntk" was directly adatapted from the TENAS group's code, available here , and you can view their paper on neural architecture seach here; authored by Chen, Wuyang and Gong, Xinyu and Wang, Zhangyang and titled: "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective"
Some backward propogation functions were originally copied then heavily modified from this article by Pierre Jaumier, available here
I've also included some utility functions that I directly copied from the PyTorch source; therefore, their license clause is included in ours.
Experimental autograd operations were adapted from web pages during the pre-release of PyTorch 1.11; now that PyTorch 1.11 has been released, we advise you to take a look at functorch's NTK page.
Software TODO (or how you can contribute)
- Add explicit calculations for more varied architectures
- Parallelize the computation across multiple GPUs
- Turn the notebook that demonstrates the different algorithms into a test that pytest can run, asserting that all outputs are approximately the same (a rough sketch follows below)
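As a starting point for the last item, here is a hedged pytest sketch; the tiny model, the brute_force_ntk helper, and the tolerance are illustrative assumptions, not existing repo code. It checks the summed autograd components against a brute-force Jacobian contraction.

import torch
import torchntk

def brute_force_ntk(model, X):
    # reference NTK: stack per-example parameter gradients and contract them
    rows = []
    for x in X:
        y = model(x.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(y, list(model.parameters()))
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    J = torch.stack(rows)  # (n_examples, n_params)
    return J @ J.T

def test_autograd_ntk_matches_brute_force():
    torch.manual_seed(0)
    X = torch.randn(8, 4)
    model = torch.nn.Sequential(torch.nn.Linear(4, 16),
                                torch.nn.Tanh(),
                                torch.nn.Linear(16, 1))  # terminates in a single neuron
    Y = model(X)
    components = torchntk.autograd.autograd_components_ntk(model, Y)
    # hedged: handle the components arriving as a dict or as a list
    comps = components.values() if isinstance(components, dict) else components
    ntk = sum(comps)
    assert torch.allclose(ntk, brute_force_ntk(model, X), atol=1e-4)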