```python
# imports, `Value` class, graphviz: trace() & draw_dot()
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Object definitions from end of previous chapter:

# Value class:
class Value:

    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._op = _op
        self.label = label

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')
        return out

# graphviz
from graphviz import Digraph

def trace(root):
    # recursively builds a set of all nodes and edges in a graph
    nodes, edges = set(), set()
    def build(v):
        if v not in nodes:
            nodes.add(v)
            for child in v._prev:
                edges.add((child, v))
                build(child)
    build(root)
    return nodes, edges

def draw_dot(root):
    dot = Digraph(format='svg', graph_attr={'rankdir': 'LR'}) # LR = left to right
    nodes, edges = trace(root)
    for n in nodes:
        uid = str(id(n))
        # for any value in the graph, create a rectangular ('record') node for it
        dot.node(name = uid, label = "{ %s | data %.4f | grad %.4f }" % (n.label, n.data, n.grad), shape='record')
        if n._op:
            # if this value is a result of some operation, create an op node for it
            dot.node(name = uid + n._op, label = n._op)
            # and connect this node to it
            dot.edge(uid + n._op, uid)
    for n1, n2 in edges:
        # connect n1 to the op node of n2
        dot.edge(str(id(n1)), str(id(n2)) + n2._op)
    return dot

# draw_dot(L)
```
8. Manual backpropagation (train a neuron)
Inspiring example: MLP and a single neuron
An example neural network (MLP):
A mathematical model of a single neuron in an MLP. Note the multiplicative relationship between input and weight (synapse):
In this example, the activation function is the hyperbolic tangent, tanh:
$$\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{e^{2x} - 1}{e^{2x} + 1}$$
```python
# example activation function: tanh smoothly caps large (+ or -) inputs to +1 or -1 respectively
plt.figure(figsize=(4, 3), dpi=80)
plt.plot(np.arange(-5, 5, 0.2), np.tanh(np.arange(-5, 5, 0.2))); plt.grid()
```
8.1. Define the forward pass (i.e. initialise NN)
Initialise:
neuron inputs (data): x1 and x2
weights: w1 and w2
bias: b
Then compute the neuron’s pre-activation value:
$$n = x_1 \cdot w_1 + x_2 \cdot w_2 + b$$
Visualise the computation graph as a DAG with graphviz:
```python
# init nn: params x1,x2; w1,w2; b -> intermediate nodes x1w1, x2w2, x1w1x2w2 -> output (n)

# neuron inputs x1,x2 (2 dimensional neuron)
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')

# weights of neuron w1,w2 (synaptic strengths for each input)
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')

# bias of the neuron
b = Value(6.7, label='b')

# following the graph above to create: x1*w1 + x2*w2 + b
x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'

# cell body raw activation (without the activation function)
n = x1w1x2w2 + b; n.label = 'n'

draw_dot(n)
```
8.2. Define the activation function tanh in Value
The cell below (calculating output via activation function tanh) throws an error.
tanh is not defined in Value
Hyperbolic functions cannot be computed via the Value object's methods we defined earlier
__add__ (+) and __mul__ (*) are insufficient
Division and/or exponentiation are also needed.
```python
# i - output axon (via activation function tanh) -- THROWS ERROR!
o = n.tanh() # throws error since Python doesn't know how to do tanh for a Value object
```
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 2
      1 # i - output axon (via activation function tanh) -- THROWS ERROR!
----> 2 o = n.tanh() # throws error since Python doesn't know how to do tanh for a Value object

AttributeError: 'Value' object has no attribute 'tanh'
```
We could implement division (`__truediv__`) and `exp()` as new methods on our Value object, and then reproduce the tanh operator from those primitives (sketched below, after this list)
However, we can also define tanh directly as a single `tanh` method, as long as we know how to take its local derivative
Any arbitrarily complicated function can be directly defined in Value, if we know how to take its local derivative (how its inputs impact its output)
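As a taste of the first (compositional) approach, here is a minimal sketch of my own, not the class we actually use below: if division and `exp` were primitives, tanh could be composed from them. The class name `ValueSketch` is hypothetical, and gradients are ignored since we haven't defined backprop for these ops yet:

```python
# sketch only: compose tanh from exp and division primitives
# (the next cell instead defines tanh directly as one operation)
import math

class ValueSketch:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        return ValueSketch(self.data + other.data, (self, other), '+')

    def exp(self):
        # e^x as a primitive; its local derivative would be e^x itself
        return ValueSketch(math.exp(self.data), (self,), 'exp')

    def __truediv__(self, other):
        # a / b as a primitive; local derivatives 1/b and -a/b**2
        return ValueSketch(self.data / other.data, (self, other), '/')

def tanh_composed(v):
    # tanh(x) = (e^{2x} - 1) / (e^{2x} + 1), built from the primitives above
    e2x = (v + v).exp()
    return (e2x + ValueSketch(-1.0)) / (e2x + ValueSketch(1.0))

print(tanh_composed(ValueSketch(0.8813735870195432)).data)  # ≈ 0.7071, matches math.tanh
```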
Now we can compute the neuron’s post-activation value (and visualise with graphviz):
$$o = \tanh(n) = \tanh(x_1 \cdot w_1 + x_2 \cdot w_2 + b)$$
```python
# extend `Value` class with `tanh(self)` method; reset network (slightly modify bias `b`); visualise
class Value:

    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._op = _op
        self.label = label

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')
        return out

    # defining the tanh method (our activation function) in one go!
    def tanh(self):
        x = self.data
        t = (math.exp(2*x) - 1)/(math.exp(2*x) + 1)
        # the tanh node only has 1 child, so it's a tuple of 1 node "(self, )",
        # and op name is 'tanh'
        out = Value(t, (self, ), 'tanh')
        return out

# same values as earlier: define inputs (x1,x2), weights (w1,w2)
x1 = Value(2.0, label='x1'); x2 = Value(0.0, label='x2')
w1 = Value(-3.0, label='w1'); w2 = Value(1.0, label='w2')
x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'

# manually change bias to make number nice for education:
# b=8 to see tanh squishing the post-activation value, o, to just below +1,
# b=6.8813735870195432 makes derivative = 1
b = Value(6.8813735870195432, label='b')
n = x1w1x2w2 + b; n.label = 'n'

# re-run the activation function on n (the raw cell body) and draw the output node o
o = n.tanh(); o.label = 'o'
draw_dot(o)
```
8.3. Now the backward pass (backpropagation)
We're particularly interested in $\frac{do}{dw_1}$ and $\frac{do}{dw_2}$
We can only change the weights, w1 and w2, during training of the neural net.
The data, x1 and x2, is fixed.
Also note that this is only one neuron; a real NN has many connected neurons
The loss function evaluates to a single number at the very end of the NN
It measures the NN’s accuracy (a goalpost for the NN’s backpropagation)
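For concreteness, here is a tiny hypothetical loss example (the numbers are made up, and plain Python floats are used since our Value class has no subtraction or powers yet): a mean-squared-error loss collapsing several predictions into one scalar:

```python
# hypothetical example: a loss reduces many predictions/targets to one scalar
preds   = [0.9, -0.2, 0.7]   # network outputs
targets = [1.0,  0.0, 1.0]   # desired outputs
loss = sum((p - t)**2 for p, t in zip(preds, targets)) / len(preds)
print(loss)  # single number: lower = more accurate network
```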
8.3.1 Manually backpropagate (hand-assign gradients of prior nodes)
See the image below. Immensely helpful.
Base case is known:
$$\frac{do}{do} = 1$$
Per Wikipedia (one of many equivalent forms):
$$o = \tanh(n) \;\rightarrow\; \frac{do}{dn} = 1 - \tanh^2(n)$$
We know $\tanh(n) = o$, so by substitution: $\frac{do}{dn} = 1 - o^2$
$\frac{do}{dn}$ is "distributed" (plus node, +) to n's upstream nodes $[x_1 w_1 + x_2 w_2]$ and $b$:
$$\frac{do}{d(x_1 w_1 + x_2 w_2)} = \frac{do}{db} = \frac{do}{dn} = 1 - o^2$$
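As a sanity check (my own, not a cell from the lecture), these gradients can be hand-assigned and the tanh local derivative verified against a finite-difference nudge, using the forward-pass values from the cells above:

```python
# minimal sketch: hand-assign the gradients derived so far and sanity-check do/dn
import math

# forward-pass values from the cells above
nv = 0.8813735870195432      # n.data = x1*w1 + x2*w2 + b = -6 + 0 + 6.88137...
ov = math.tanh(nv)           # o.data ≈ 0.7071

do_do = 1.0                  # base case: do/do = 1
do_dn = 1 - ov**2            # do/dn = 1 - o^2 ≈ 0.5

# a plus node routes its gradient unchanged to both of its inputs
do_dsum = do_dn              # do/d(x1*w1 + x2*w2)
do_db   = do_dn              # do/db

# finite-difference check: nudge n by h and watch how much o moves
h = 1e-6
numeric = (math.tanh(nv + h) - math.tanh(nv)) / h
print(do_dn, numeric)        # both ≈ 0.5
```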
$\frac{do}{dn}$ is again distributed (plus node, +) to $[x_1 w_1 + x_2 w_2]$'s upstream nodes $[x_1 w_1]$ and $[x_2 w_2]$: