- Next: 02_nn_data_structs_and_forward_pass
- Related: backprop-graph-terminology
# imports
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Micrograd
- Micrograd is a scalar-valued autograd engine that implements backpropagation. It operates on scalar values only, for educational purposes; bigger networks need tensors (n-dimensional arrays of scalars) for parallelisation.
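As a preview of where this chapter is heading, here is a minimal usage sketch based on the karpathy/micrograd README (assumes micrograd is installed, e.g. via pip install micrograd); it mirrors the a, b, c, d example worked through below:
from micrograd.engine import Value
a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)
d = a*b + c            # builds a small expression graph of scalar Value objects
d.backward()           # backpropagation: populates .grad on every node
print(d.data)          # 4.0
print(a.grad, b.grad, c.grad)   # d(d)/da = b = -3.0, d(d)/db = a = 2.0, d(d)/dc = 1.0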
Recap: Backpropagation
- Allows efficient evaluation of the gradient of a loss function, $L$, wrt. the weights, $w$, of a neural network.
- Iteratively tune the weights, $w$, to minimise the loss function, $L$, improving the network's accuracy (a minimal sketch of this loop follows).
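To make the recap concrete, a minimal sketch (illustrative only, plain NumPy with a hypothetical toy dataset) of the tuning loop: compute the loss, compute its gradient wrt. the weight, and nudge the weight against the gradient:
import numpy as np
xs = np.array([1.0, 2.0, 3.0])    # toy inputs (hypothetical)
ys = np.array([2.0, 4.0, 6.0])    # toy targets; true relationship is y = 2x
w = 0.0                           # single weight, arbitrary initialisation
lr = 0.05                         # learning rate (step size)
for step in range(50):
    preds = w * xs                           # forward pass of a 1-parameter "network"
    loss = np.mean((preds - ys)**2)          # mean squared error loss L(w)
    grad = np.mean(2 * (preds - ys) * xs)    # dL/dw (here derived analytically)
    w -= lr * grad                           # gradient descent step
print(w)   # approaches 2.0, the weight that minimises the loss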
What is a derivative?
What information does it contain?
1. Define (and call) a scalar-valued function f(x) of a single input x
def f(x):
    return 3*x**2 - 4*x + 5
# call function
f(3.0)
20.0
# plot function to understand its shape
xs = np.arange(-5, 5, 0.25) # array of x values
ys = f(xs) # corresp. array of y values
plt.plot(xs, ys)
[plot: parabola of f(x) over x in [-5, 5)]
2. Take the derivative (df(x)/dx at any input point x)
- Recall that various analytical rules from calculus (e.g. the product rule) can be applied to the function defined above to find an analytical expression for the derivative of $f(x)$.
- In NNs, no one does this. The derived symbolic expression would have tens of thousands of terms.
Instead, by definition a derivative is:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
which tells us the sensitivity (aka the slope of the response) of the function $f$ when you nudge the input $x$ by an infinitesimal amount $h$.
h = 0.00000001 # converges; careful of floating-point precision (can't make h too small)
x = 3.0 # slope AT this point (also try at x = 2/3)
(f(x + h) - f(x))/h # recall: rise/run; that's why we "normalise" by dividing by h
14.00000009255109
Note, we can confirm analytically for this function: $f'(x) = 6x - 4$, so $f'(3) = 14$.
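A quick sketch (reusing the f defined above) of why h cannot be made arbitrarily small: the estimate first converges towards the analytical derivative f'(3) = 6*3 - 4 = 14, then degrades because of floating-point cancellation in f(x + h) - f(x):
x = 3.0
for h in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12]:
    estimate = (f(x + h) - f(x))/h
    print(f'h = {h:.0e}   estimate = {estimate:.10f}   error = {abs(estimate - 14.0):.2e}')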
3. Derivative of d wrt multiple inputs a, b, and c
# let's get more complex
a = 2.0
b = -3.0
c = 10.0
d = a*b + c
print(d)
4.0
# numerically approximating the partial derivatives of d wrt each input (a, b, c)
h = 0.0001
# inputs (point coordinates (a,b,c) at which we're evaluating derivative of d)
a = 2.0
b = -3.0
c = 10.0
# the expression from before
d1 = a*b + c
# let's nudge each of our variables one at a time
a += h # first nudge a
d2_a = a*b + c
b += h # next nudge b
a -= h # (un-nudge a)
d2_b = a*b + c
c += h # finally nudge c
b -= h # (un-nudge b)
d2_c = a*b + c
print('d1', d1)
print('\nd2_a (d after nudging a):', d2_a)
print('slope (d wrt a):', (d2_a - d1)/h)
print('\nd2_b (d after nudging b):', d2_b)
print('slope (d wrt b):', (d2_b - d1)/h)
print('\nd2_c (d after nudging c):', d2_c)
print('slope (d wrt c):', (d2_c - d1)/h)
print('\nEnsure the above makes sense.')
print('Think about the analytical solutions of the partial derivatives: d(d)/da = b, d(d)/db = a, and d(d)/dc = 1')
d1 4.0
d2_a (d after nudging a): 3.999699999999999
slope (d wrt a): -3.000000000010772
d2_b (d after nudging b): 4.0002
slope (d wrt b): 2.0000000000042206
d2_c (d after nudging c): 4.0001
slope (d wrt c): 0.9999999999976694
Ensure the above makes sense.
Think about the analytical solutions of the partial derivatives: d(d)/da = b, d(d)/db = a, and d(d)/dc = 1
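The nudge-one-input-at-a-time procedure above generalises directly. A hedged sketch (an illustrative helper, not part of micrograd) that numerically approximates all partial derivatives of a function of several inputs:
def numerical_gradient(fn, inputs, h=1e-4):
    # approximate each partial derivative of fn at the point given by inputs
    base = fn(*inputs)
    grads = []
    for i in range(len(inputs)):
        nudged = list(inputs)
        nudged[i] += h                       # nudge only the i-th input
        grads.append((fn(*nudged) - base)/h)
    return grads
print(numerical_gradient(lambda a, b, c: a*b + c, [2.0, -3.0, 10.0]))
# approx [-3.0, 2.0, 1.0], i.e. d(d)/da = b, d(d)/db = a, d(d)/dc = 1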
Sources
- YouTube: The spelled-out intro to neural networks and backpropagation: building micrograd
- karpathy/micrograd on GitHub
- Jupyter notebooks from this chapter
- Google Colab exercises