# imports
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Micrograd

  • Micrograd is a scalar-valued autograd engine that implements backpropagation. It operates on scalar values only, for educational purposes; bigger networks need tensors (n-dimensional arrays of scalars) for parallelisation.
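To make "scalar-valued autograd engine" concrete, here is a minimal sketch of the idea: each scalar remembers how it was produced, so gradients can flow backward through + and *. This is an illustration, not micrograd's actual API (the class name `Scalar` and the manual backward calls are simplifications; the real engine supports more operations and walks the graph in topological order automatically).

```python
class Scalar:
    """A number that records its children so gradients can be backpropagated."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0                  # d(output)/d(this node), filled in later
        self._backward = lambda: None    # how to push out.grad into the children
        self._prev = set(_children)

    def __add__(self, other):
        out = Scalar(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1, so the gradient passes through
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Scalar(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

# the same expression used later in this section: d = a*b + c
a, b, c = Scalar(2.0), Scalar(-3.0), Scalar(10.0)
e = a * b
d = e + c
d.grad = 1.0     # seed: d(d)/d(d) = 1
d._backward()    # fills in e.grad and c.grad
e._backward()    # fills in a.grad and b.grad
print(a.grad, b.grad, c.grad)  # -3.0 2.0 1.0
```

Calling the `_backward` closures by hand works here only because the graph is tiny; the point is that each operation knows its own local derivative, and the chain rule is just composing these local pushes.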

What is a derivative?

What information does it contain?

1. Define (and call) a single-value scalar function f(x)

def f(x):
    return 3*x**2 - 4*x + 5
 
# call function
f(3.0)
20.0
# plot function to understand its shape
xs = np.arange(-5, 5, 0.25) # array of x values
ys = f(xs)                  # corresp. array of y values
plt.plot(xs, ys)
[<matplotlib.lines.Line2D at 0x111228690>]
[plot: an upward-opening parabola with its minimum near x = 2/3]

2. Take the derivative (df(x)/dx at any input point x)

  • Recall, various analytical rules from calculus (e.g. the product rule, power rule, etc.) can be applied to the function defined above to find the analytical expression for the derivative of f(x).
  • In neural networks, no one does this: the derived symbolic expression would have tens of thousands of terms.

Instead, by definition the derivative is:

f'(x) = lim (h → 0) of [ f(x + h) − f(x) ] / h

Which tells us the sensitivity (aka slope of the response) of the function: how f(x) responds when you nudge x by an infinitesimally small amount h.

h = 0.00000001      # converges. careful of floating point precision (can't go too small)
x = 3.0             # slope AT this point (also try at x = 2/3)
(f(x + h) - f(x))/h # recall: rise/run; that's why we "normalise" by dividing by h
14.00000009255109

Note, we can confirm analytically for this function: f'(x) = 6x − 4, so f'(3) = 14.
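A small aside on the numerical estimate itself (not covered in the text above, but a standard trick): the forward difference (f(x+h) − f(x))/h has error proportional to h, while the symmetric (central) difference has error proportional to h², so it converges faster for the same h.

```python
def f(x):
    return 3*x**2 - 4*x + 5

h = 1e-6
x = 3.0
forward = (f(x + h) - f(x)) / h           # O(h) error
central = (f(x + h) - f(x - h)) / (2*h)   # O(h^2) error; exact for quadratics up to fp noise
print(forward, central)  # both near the true slope f'(3) = 14, central closer
```

For this quadratic the central difference is analytically exact (the h² terms cancel), so any remaining deviation from 14 is pure floating-point noise.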

3. Derivative of d wrt multiple inputs a, b, and c

# let's get more complex
a = 2.0
b = -3.0
c = 10.0   
d = a*b + c
print(d)
4.0
# numerically approximating the partial derivatives of d wrt each input (a, b, c)
h = 0.0001
 
# inputs (point coordinates (a,b,c) at which we're evaluating derivative of d)
a = 2.0
b = -3.0
c = 10.0
 
# the expression from before
d1 = a*b + c
 
# let's nudge each of our variables one at a time
a += h # first nudge a
d2_a = a*b + c
 
b += h # next nudge b
a -= h # (un-nudge a)
d2_b = a*b + c
 
c += h # finally nudge c
b -= h # (un-nudge b)
d2_c = a*b + c
 
print('d1', d1)
print('\nd2_a (approx d(d)/da):', d2_a)
print('slope (d wrt a):', (d2_a - d1)/h)
print('\nd2_b (approx d(d)/db):', d2_b)
print('slope (d wrt b):', (d2_b - d1)/h)
print('\nd2_c (approx d(d)/dc):', d2_c)
print('slope (d wrt c):', (d2_c - d1)/h)
print('\nEnsure the above makes sense.')
print('Think about the analytical solutions of partial derivatives: d(d)/da = b, d(d)/db = a, and d(d)/dc = 1')
d1 4.0
 
d2_a (approx d(d)/da): 3.999699999999999
slope (d wrt a): -3.000000000010772
 
d2_b (approx d(d)/db): 4.0002
slope (d wrt b): 2.0000000000042206
 
d2_c (approx d(d)/dc): 4.0001
slope (d wrt c): 0.9999999999976694
 
Ensure the above makes sense.
Think about the analytical solutions of partial derivatives: d(d)/da = b, d(d)/db = a, and d(d)/dc = 1
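The nudge/un-nudge pattern above can be packaged into a small helper that estimates all partial derivatives at once. `numerical_grad` is a hypothetical name introduced here for illustration; it just repeats the forward-difference recipe once per input.

```python
def numerical_grad(fn, point, h=1e-5):
    """Approximate the partial derivatives of fn at `point` (a list of inputs)."""
    base = fn(*point)
    grads = []
    for i in range(len(point)):
        nudged = list(point)   # copy so each input is nudged independently
        nudged[i] += h
        grads.append((fn(*nudged) - base) / h)
    return grads

# same expression and point as above: d = a*b + c at (a, b, c) = (2, -3, 10)
grads = numerical_grad(lambda a, b, c: a*b + c, [2.0, -3.0, 10.0])
print(grads)  # close to [b, a, 1] = [-3, 2, 1]
```

This is exactly what the cell above does by hand; autograd engines like micrograd replace this O(n) loop of function evaluations with a single backward pass that yields all n partial derivatives at once.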

Sources