- Next: 02_nn_data_structs_and_forward_pass
- Related: backprop-graph-terminology
# imports
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Micrograd
- Micrograd is a scalar-valued autograd engine that implements backpropagation. It operates on scalar values only, for educational purposes; bigger networks need tensors (n-dimensional arrays of scalars) for parallelisation.
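As a preview of where this chapter is heading, here is a minimal usage sketch based on the karpathy/micrograd README (assumes micrograd is installed, e.g. via pip install micrograd); it mirrors the a, b, c, d example worked through below:
from micrograd.engine import Value
a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)
d = a*b + c            # builds a small expression graph of scalar Value objects
d.backward()           # backpropagation: populates .grad on every node
print(d.data)          # 4.0
print(a.grad, b.grad, c.grad)   # d(d)/da = b = -3.0, d(d)/db = a = 2.0, d(d)/dc = 1.0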
Recap: Backpropagation
- Allows efficient evaluation of the gradient of a loss function, $L$, wrt. the weights, $w$, of a neural network.
- Iteratively tune the weights, $w$, to minimise the loss function, $L$, improving the network's accuracy (a minimal sketch of this loop follows).
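To make the recap concrete, a minimal sketch (illustrative only, plain NumPy with a hypothetical toy dataset) of the tuning loop: compute the loss, compute its gradient wrt. the weight, and nudge the weight against the gradient:
import numpy as np
xs = np.array([1.0, 2.0, 3.0])    # toy inputs (hypothetical)
ys = np.array([2.0, 4.0, 6.0])    # toy targets; true relationship is y = 2x
w = 0.0                           # single weight, arbitrary initialisation
lr = 0.05                         # learning rate (step size)
for step in range(50):
    preds = w * xs                           # forward pass of a 1-parameter "network"
    loss = np.mean((preds - ys)**2)          # mean squared error loss L(w)
    grad = np.mean(2 * (preds - ys) * xs)    # dL/dw (here derived analytically)
    w -= lr * grad                           # gradient descent step
print(w)   # approaches 2.0, the weight that minimises the loss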
What is a derivative?
What information does it contain?
1. Define (and call) a scalar-valued function f(x) of a single input x
def f(x):
    return 3*x**2 - 4*x + 5
# call function
f(3.0)
20.0
# plot function to understand its shape
xs = np.arange(-5, 5, 0.25) # array of x values
ys = f(xs) # corresp. array of y values
plt.plot(xs, ys)
[plot: parabola of f(x) over x in [-5, 5)]
2. Take the derivative (df(x)/dx at any input point x)
- Recall that various analytical rules from calculus (e.g. the product rule) can be applied to the function defined above to find an analytical expression for the derivative of $f(x)$.
- In NNs, no one does this. The derived symbolic expression would have tens of thousands of terms.
Instead, by definition a derivative is:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
which tells us the sensitivity (aka the slope of the response) of the function $f$ when you nudge the input $x$ by an infinitesimal amount $h$.
h = 0.00000001 # converges; careful of floating-point precision (can't make h too small)
x = 3.0 # slope AT this point (also try at x = 2/3)
(f(x + h) - f(x))/h # recall: rise/run; that's why we "normalise" by dividing by h
14.00000009255109
Note, we can confirm analytically for this function: $f'(x) = 6x - 4$, so $f'(3) = 14$.
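A quick sketch (reusing the f defined above) of why h cannot be made arbitrarily small: the estimate first converges towards the analytical derivative f'(3) = 6*3 - 4 = 14, then degrades because of floating-point cancellation in f(x + h) - f(x):
x = 3.0
for h in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12]:
    estimate = (f(x + h) - f(x))/h
    print(f'h = {h:.0e}   estimate = {estimate:.10f}   error = {abs(estimate - 14.0):.2e}')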
3. Derivative of d wrt multiple inputs a, b, and c
# let's get more complex
a = 2.0
b = -3.0
c = 10.0
d = a*b + c
print(d)
4.0
# numerically approximating the partial derivatives of d wrt each input (a, b, c)
h = 0.0001
# inputs (point coordinates (a,b,c) at which we're evaluating derivative of d)
a = 2.0
b = -3.0
c = 10.0
# the expression from before
d1 = a*b + c
# let's nudge each of our variables one at a time
a += h # first nudge a
d2_a = a*b + c
b += h # next nudge b
a -= h # (un-nudge a)
d2_b = a*b + c
c += h # finally nudge c
b -= h # (un-nudge b)
d2_c = a*b + c
print('d1', d1)
print('\nd2_a (d after nudging a):', d2_a)
print('slope (d wrt a):', (d2_a - d1)/h)
print('\nd2_b (d after nudging b):', d2_b)
print('slope (d wrt b):', (d2_b - d1)/h)
print('\nd2_c (d after nudging c):', d2_c)
print('slope (d wrt c):', (d2_c - d1)/h)
print('\nEnsure the above makes sense.')
print('Think about the analytical solutions of the partial derivatives: d(d)/da = b, d(d)/db = a, and d(d)/dc = 1')
d1 4.0
d2_a (d after nudging a): 3.999699999999999
slope (d wrt a): -3.000000000010772
d2_b (d after nudging b): 4.0002
slope (d wrt b): 2.0000000000042206
d2_c (d after nudging c): 4.0001
slope (d wrt c): 0.9999999999976694
Ensure the above makes sense.
Think about the analytical solutions of the partial derivatives: d(d)/da = b, d(d)/db = a, and d(d)/dc = 1
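The nudge-one-input-at-a-time procedure above generalises directly. A hedged sketch (an illustrative helper, not part of micrograd) that numerically approximates all partial derivatives of a function of several inputs:
def numerical_gradient(fn, inputs, h=1e-4):
    # approximate each partial derivative of fn at the point given by inputs
    base = fn(*inputs)
    grads = []
    for i in range(len(inputs)):
        nudged = list(inputs)
        nudged[i] += h                       # nudge only the i-th input
        grads.append((fn(*nudged) - base)/h)
    return grads
print(numerical_gradient(lambda a, b, c: a*b + c, [2.0, -3.0, 10.0]))
# approx [-3.0, 2.0, 1.0], i.e. d(d)/da = b, d(d)/db = a, d(d)/dc = 1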
Sources
- YouTube: The spelled-out intro to neural networks and backpropagation: building micrograd
- karpathy/micrograd on GitHub
- Jupyter notebooks from this chapter
- Google Colab exercises