
from IPython.display import Image
Image(filename = 'HNN.jpeg')


Hacking Neural Networks

Author: Kevin Thomas

Welcome to the Hacking Neural Networks notebook! This notebook is designed to provide a detailed, step-by-step walkthrough of the inner workings of a simple neural network. The goal is to demystify the calculations behind neural networks by breaking them down into understandable components, including forward propagation, backpropagation, gradient calculations, and parameter updates.

Overview

The focus of this repository is to make neural networks accessible and hackable. We emphasize hands-on exploration of every step in the network's operations: forward propagation, the loss calculation, backpropagation (computing the gradient of the loss with respect to each parameter), and parameter updates via plain gradient descent.

Network Architecture

The neural network we analyze is a single neuron with the following components: two inputs (x1, x2), two weights (w1, w2) and a bias (b), a linear combination n of the weighted inputs, a tanh activation that produces the output o, and a squared-error loss measured against a target value.
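
In the notation used throughout the notebook (and mirrored in the code below), the forward pass and loss are:

n    = x1*w1 + x2*w2 + b         # linear combination (pre-activation)
o    = tanh(n)                   # activation / network output
loss = 0.5 * (o - target)^2      # squared-error loss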

Initial Setup

Image(filename = 'ae0.jpg')


Install Libraries

!pip install torch
Requirement already satisfied: torch in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (2.5.1)
Requirement already satisfied: filelock in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (4.12.2)
Requirement already satisfied: networkx in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (3.3)
Requirement already satisfied: jinja2 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (3.1.4)
Requirement already satisfied: fsspec in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (2024.6.1)
Requirement already satisfied: setuptools in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (75.1.0)
Requirement already satisfied: sympy==1.13.1 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from sympy==1.13.1->torch) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from jinja2->torch) (2.1.3)

Imports

import torch

Set Random Seed / Reproducibility

torch.manual_seed(42)
<torch._C.Generator at 0x10ff09170>

Define Input x1 w/ Gradients Enabled

x1 = torch.tensor([2.0], dtype=torch.double, requires_grad=True) 
x1
tensor([2.], dtype=torch.float64, requires_grad=True)
x1.ndim
1
x1.shape
torch.Size([1])

Define Input x2 w/ Gradients Enabled

x2 = torch.tensor([0.0], dtype=torch.double, requires_grad=True)
x2
tensor([0.], dtype=torch.float64, requires_grad=True)
x2.ndim
1
x2.shape
torch.Size([1])

Define Initial Weight w1 w/ Gradients Enabled

w1 = torch.tensor([-3.0], dtype=torch.double, requires_grad=True)
w1
tensor([-3.], dtype=torch.float64, requires_grad=True)
w1.ndim
1
w1.shape
torch.Size([1])

Define Initial Weight w2 w/ Gradients Enabled

w2 = torch.tensor([1.0], dtype=torch.double, requires_grad=True) 
w2
tensor([1.], dtype=torch.float64, requires_grad=True)
w2.ndim
1
w2.shape
torch.Size([1])

Define Initial Bias b w/ Gradients Enabled

b = torch.tensor([6.8814], dtype=torch.double, requires_grad=True)
b
tensor([6.8814], dtype=torch.float64, requires_grad=True)
b.ndim
1
b.shape
torch.Size([1])

Define Learning Rate

learning_rate = 0.01  # step size for gradient descent
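
With this step size, each parameter moves by only a small fraction of its gradient per epoch; for example, in epoch 1 below, w1 steps from -3.0 to -3.0 - 0.01 * 0.707094 ≈ -3.007071.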

Define Number of Epochs

epochs = 10  # number of iterations

Define Target Value

target = torch.tensor([0.0], dtype=torch.double)  # desired output value
target
tensor([0.], dtype=torch.float64)
target.ndim
1
target.shape
torch.Size([1])
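
Before running the loop, it is worth deriving the gradient formulas that the loop prints at each step. Applying the chain rule to loss = 0.5 * (o - target)^2 with o = tanh(n) and n = x1*w1 + x2*w2 + b:

dL/do  = o - target
do/dn  = 1 - tanh(n)^2 = 1 - o^2
dL/db  = (o - target) * (1 - o^2)         # dn/db  = 1
dL/dw1 = x1 * (o - target) * (1 - o^2)    # dn/dw1 = x1
dL/dw2 = x2 * (o - target) * (1 - o^2)    # dn/dw2 = x2

With the initial values (x1 = 2, x2 = 0, o ≈ 0.7071), this gives dL/db ≈ 0.7071 * (1 - 0.5) ≈ 0.3535, dL/dw1 ≈ 2 * 0.3535 ≈ 0.7071, and dL/dw2 = 0, which matches the gradients reported in epoch 1 below.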

Training Loop

# iterate through 10 epochs
for epoch in range(epochs):
    
    # display the corresponding network diagram image
    display(Image(filename=f"ae{epoch}.jpg"))

    # forward pass: calculate n (linear combination) and o (output using tanh)
    n = x1 * w1 + x2 * w2 + b  # n = x1*w1 + x2*w2 + b
    o = torch.tanh(n)  # o = tanh(n)
    
    # calculate the loss (half the squared error; a single-sample MSE-style loss)
    loss = 0.5 * (o - target).pow(2)  # loss = 0.5 * (o - target)^2

    # perform backward propagation to calculate gradients
    loss.backward()  # compute gradients for w1, w2, b (and for x1, x2, which also require grad)
    
    # log the current values and gradients
    print(f"EPOCH {epoch + 1}")
    print("-" * len(f"EPOCH {epoch + 1}"))  # underline the header with matching dashes
    print("STEP 0: INITIAL VALUES")
    print(f"  initial values:")
    print(f"    x1 = {x1.item()}, x2 = {x2.item()}")
    print(f"    w1 = {w1.item():.6f}, w2 = {w2.item():.6f}, b = {b.item():.6f}")
    print("STEP 1: FORWARD PASS")
    print(f"  forward pass:")
    print(f"    n = {n.item():.6f} (n = {x1.item()}*{w1.item()} + {x2.item()}*{w2.item()} + {b.item()})")
    print(f"    o = {o.item():.6f} (o = tanh({n.item()}))")
    print(f"    loss = {loss.item():.6f} (Loss = 0.5 * ({o.item()} - {target.item()})^2)")
    print("STEP 2: BACK PROPAGATION")
    print(f"  gradients (calculated during backward pass):")
    print(f"    w1.grad = {w1.grad.item():.6f} (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))")
    print(f"    w2.grad = {w2.grad.item():.6f} (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))")
    print(f"    b.grad = {b.grad.item():.6f} (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))")
    
    # update weights and bias using gradient descent
    with torch.no_grad():

        print("STEP 3: OPTIMIZATION / GRADIENT DESCENT")
        # explicitly show gradient descent calculation
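        # note: the updates below are in-place, so the parameter value shown at the
        # start of the parenthesized equation in each print is the value *after* the
        # step; the learning rate and gradient shown are the ones actually used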
        w1 -= learning_rate * w1.grad  # update w1
        print(f"    updated w1 = {w1.item():.6f} (w1 = w1 - lr * w1.grad = {w1.item()} - {learning_rate} * {w1.grad.item()})")
        w2 -= learning_rate * w2.grad  # update w2
        print(f"    updated w2 = {w2.item():.6f} (w2 = w2 - lr * w2.grad = {w2.item()} - {learning_rate} * {w2.grad.item()})")
        b -= learning_rate * b.grad    # update b
        print(f"    updated b = {b.item():.6f} (b = b - lr * b.grad = {b.item()} - {learning_rate} * {b.grad.item()})")

        # zero gradients for the next iteration
        w1.grad.zero_()
        w2.grad.zero_()
        b.grad.zero_()
        x1.grad.zero_()
        x2.grad.zero_()
    
    # log the updated weights and bias
    print(f"  updated parameters:")
    print(f"    w1 = {w1.item():.6f}, w2 = {w2.item():.6f}, b = {b.item():.6f}\n")

[network diagram: ae0.jpg]

EPOCH 1
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.000000, w2 = 1.000000, b = 6.881400
STEP 1: FORWARD PASS
  forward pass:
    n = 0.881400 (n = 2.0*-3.0 + 0.0*1.0 + 6.8814)
    o = 0.707120 (o = tanh(0.8814000000000002))
    loss = 0.250009 (Loss = 0.5 * (0.7071199874301227 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.707094 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.353547 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.007071 (w1 = w1 - lr * w1.grad = -3.00707093574203 - 0.01 * 0.7070935742030305)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.877865 (b = b - lr * b.grad = 6.877864532128985 - 0.01 * 0.35354678710151527)
  updated parameters:
    w1 = -3.007071, w2 = 1.000000, b = 6.877865




[network diagram: ae1.jpg]

EPOCH 2
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.007071, w2 = 1.000000, b = 6.877865
STEP 1: FORWARD PASS
  forward pass:
    n = 0.863723 (n = 2.0*-3.00707093574203 + 0.0*1.0 + 6.877864532128985)
    o = 0.698171 (o = tanh(0.8637226606449246))
    loss = 0.243721 (Loss = 0.5 * (0.6981707141514888 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.715705 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.357853 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.014228 (w1 = w1 - lr * w1.grad = -3.0142279906073903 - 0.01 * 0.7157054865360251)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.874286 (b = b - lr * b.grad = 6.874286004696305 - 0.01 * 0.35785274326801253)
  updated parameters:
    w1 = -3.014228, w2 = 1.000000, b = 6.874286




[network diagram: ae2.jpg]

EPOCH 3
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.014228, w2 = 1.000000, b = 6.874286
STEP 1: FORWARD PASS
  forward pass:
    n = 0.845830 (n = 2.0*-3.0142279906073903 + 0.0*1.0 + 6.874286004696305)
    o = 0.688885 (o = tanh(0.8458300234815246))
    loss = 0.237281 (Loss = 0.5 * (0.6888846949435228 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.723932 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.361966 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.021467 (w1 = w1 - lr * w1.grad = -3.0214673128405685 - 0.01 * 0.7239322233178187)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.870666 (b = b - lr * b.grad = 6.870666343579716 - 0.01 * 0.36196611165890935)
  updated parameters:
    w1 = -3.021467, w2 = 1.000000, b = 6.870666




[network diagram: ae3.jpg]

EPOCH 4
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.021467, w2 = 1.000000, b = 6.870666
STEP 1: FORWARD PASS
  forward pass:
    n = 0.827732 (n = 2.0*-3.0214673128405685 + 0.0*1.0 + 6.870666343579716)
    o = 0.679256 (o = tanh(0.8277317178985788))
    loss = 0.230694 (Loss = 0.5 * (0.6792561658373667 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.731710 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.365855 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.028784 (w1 = w1 - lr * w1.grad = -3.0287844105263533 - 0.01 * 0.7317097685784673)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.867008 (b = b - lr * b.grad = 6.867007794736823 - 0.01 * 0.36585488428923363)
  updated parameters:
    w1 = -3.028784, w2 = 1.000000, b = 6.867008




[network diagram: ae4.jpg]

EPOCH 5
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.028784, w2 = 1.000000, b = 6.867008
STEP 1: FORWARD PASS
  forward pass:
    n = 0.809439 (n = 2.0*-3.0287844105263533 + 0.0*1.0 + 6.867007794736823)
    o = 0.669281 (o = tanh(0.8094389736841165))
    loss = 0.223968 (Loss = 0.5 * (0.6692806538042507 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.738971 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.369485 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.036174 (w1 = w1 - lr * w1.grad = -3.0361741176784696 - 0.01 * 0.7389707152116205)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.863313 (b = b - lr * b.grad = 6.863312941160765 - 0.01 * 0.36948535760581025)
  updated parameters:
    w1 = -3.036174, w2 = 1.000000, b = 6.863313




[network diagram: ae5.jpg]

EPOCH 6
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.036174, w2 = 1.000000, b = 6.863313
STEP 1: FORWARD PASS
  forward pass:
    n = 0.790965 (n = 2.0*-3.0361741176784696 + 0.0*1.0 + 6.863312941160765)
    o = 0.658955 (o = tanh(0.7909647058038258))
    loss = 0.217111 (Loss = 0.5 * (0.6589551923491251 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.745645 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.372822 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.043631 (w1 = w1 - lr * w1.grad = -3.0436305654127542 - 0.01 * 0.7456447734284608)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.859585 (b = b - lr * b.grad = 6.859584717293623 - 0.01 * 0.3728223867142304)
  updated parameters:
    w1 = -3.043631, w2 = 1.000000, b = 6.859585




[network diagram: ae6.jpg]

EPOCH 7
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.043631, w2 = 1.000000, b = 6.859585
STEP 1: FORWARD PASS
  forward pass:
    n = 0.772324 (n = 2.0*-3.0436305654127542 + 0.0*1.0 + 6.859584717293623)
    o = 0.648279 (o = tanh(0.7723235864681142))
    loss = 0.210133 (Loss = 0.5 * (0.6482785444319854 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.751659 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.375830 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.051147 (w1 = w1 - lr * w1.grad = -3.051147159729109 - 0.01 * 0.7516594316354793)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.855826 (b = b - lr * b.grad = 6.855826420135445 - 0.01 * 0.37582971581773966)
  updated parameters:
    w1 = -3.051147, w2 = 1.000000, b = 6.855826




[network diagram: ae7.jpg]

EPOCH 8
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.051147, w2 = 1.000000, b = 6.855826
STEP 1: FORWARD PASS
  forward pass:
    n = 0.753532 (n = 2.0*-3.051147159729109 + 0.0*1.0 + 6.855826420135445)
    o = 0.637251 (o = tanh(0.7535321006772273))
    loss = 0.203045 (Loss = 0.5 * (0.6372514280663818 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.756941 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.378470 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.058717 (w1 = w1 - lr * w1.grad = -3.0587165675110963 - 0.01 * 0.7569407781987396)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.852042 (b = b - lr * b.grad = 6.852041716244451 - 0.01 * 0.3784703890993698)
  updated parameters:
    w1 = -3.058717, w2 = 1.000000, b = 6.852042




[network diagram: ae8.jpg]

EPOCH 9
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.058717, w2 = 1.000000, b = 6.852042
STEP 1: FORWARD PASS
  forward pass:
    n = 0.734609 (n = 2.0*-3.0587165675110963 + 0.0*1.0 + 6.852041716244451)
    o = 0.625877 (o = tanh(0.7346085812222585))
    loss = 0.195861 (Loss = 0.5 * (0.6258767388376627 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.761414 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.380707 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.066331 (w1 = w1 - lr * w1.grad = -3.0663307123827015 - 0.01 * 0.7614144871604955)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.848235 (b = b - lr * b.grad = 6.848234643808649 - 0.01 * 0.38070724358024777)
  updated parameters:
    w1 = -3.066331, w2 = 1.000000, b = 6.848235




[network diagram: ae9.jpg]

EPOCH 10
--------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.066331, w2 = 1.000000, b = 6.848235
STEP 1: FORWARD PASS
  forward pass:
    n = 0.715573 (n = 2.0*-3.0663307123827015 + 0.0*1.0 + 6.848234643808649)
    o = 0.614160 (o = tanh(0.7155732190432458))
    loss = 0.188596 (Loss = 0.5 * (0.6141597625050071 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.765007 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.382503 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.073981 (w1 = w1 - lr * w1.grad = -3.0739807820228937 - 0.01 * 0.765006964019203)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.844410 (b = b - lr * b.grad = 6.844409608988553 - 0.01 * 0.3825034820096015)
  updated parameters:
    w1 = -3.073981, w2 = 1.000000, b = 6.844410
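
As a sanity check on the hand-derived formulas, you can ask autograd for the gradients directly and compare them against the chain-rule expressions. A minimal sketch, assuming the tensors defined above are still in scope (it uses torch.autograd.grad, so the parameters' .grad fields are left untouched):

# recompute one forward pass with the current parameters
n_check = x1 * w1 + x2 * w2 + b
o_check = torch.tanh(n_check)
loss_check = 0.5 * (o_check - target).pow(2)

# let autograd compute dL/dw1 and dL/db
dw1, db = torch.autograd.grad(loss_check, (w1, b))

# hand-derived chain-rule formulas
manual_dw1 = (x1 * (o_check - target) * (1 - o_check ** 2)).detach()
manual_db = ((o_check - target) * (1 - o_check ** 2)).detach()

print(torch.allclose(dw1, manual_dw1), torch.allclose(db, manual_db))  # expect: True True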

Final Output

# display the corresponding network diagram image
display(Image(filename="ae10.jpg"))

# print the final output
print("\nFinal Output:")
print(f"  Final n = {n.item():.6f}")
print(f"  Final o = {o.item():.6f} (o = tanh({n.item()}))")
print(f"  Final Loss = {loss.item():.6f}")
print(f"Final Parameters:")
print(f"  w1 = {w1.item():.6f}, w2 = {w2.item():.6f}, b = {b.item():.6f}")

[network diagram: ae10.jpg]

Final Output:
  Final n = 0.715573
  Final o = 0.614160 (o = tanh(0.7155732190432458))
  Final Loss = 0.188596
Final Parameters:
  w1 = -3.073981, w2 = 1.000000, b = 6.844410
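
The manual update in the training loop (parameter -= learning_rate * gradient) is exactly what torch.optim.SGD performs. As a closing sketch (a rewrite of the same experiment, not part of the original notebook), the loop can be expressed with the optimizer and should follow the same parameter trajectory as the manual version:

import torch

torch.manual_seed(42)

x = torch.tensor([2.0, 0.0], dtype=torch.double)                       # inputs x1, x2
w = torch.tensor([-3.0, 1.0], dtype=torch.double, requires_grad=True)  # weights w1, w2
b = torch.tensor([6.8814], dtype=torch.double, requires_grad=True)     # bias
target = torch.tensor([0.0], dtype=torch.double)

optimizer = torch.optim.SGD([w, b], lr=0.01)   # plain gradient descent, lr = 0.01

for epoch in range(10):
    optimizer.zero_grad()                   # clear gradients from the previous step
    o = torch.tanh((x * w).sum() + b)       # forward pass: o = tanh(x1*w1 + x2*w2 + b)
    loss = 0.5 * (o - target).pow(2)        # same squared-error loss as above
    loss.backward()                         # backpropagation
    optimizer.step()                        # gradient descent: param -= lr * param.grad

print(w.tolist(), b.item(), loss.item())    # final weights, bias, and last pre-update loss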