Summary: Formalises the backpropagation intuition from Ch4 using the chain rule, first on a toy one-neuron-per-layer network, then generalising.

Setup (one neuron per layer)

Define intermediate variables for the last layer, where $y$ is the target output:

$$z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}, \qquad a^{(L)} = \sigma(z^{(L)}), \qquad C_0 = \left(a^{(L)} - y\right)^2$$
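
A minimal numeric sketch of these definitions in Python (the values of `a_prev`, `w`, `b`, and `y` are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron per layer: scalar weight, bias, and activations.
a_prev = 0.6          # a^(L-1), activation of the previous neuron
w, b = 1.5, -0.3      # w^(L) and b^(L)
y = 1.0               # desired output

z = w * a_prev + b    # z^(L) = w a^(L-1) + b
a = sigmoid(z)        # a^(L) = sigma(z^(L))
C0 = (a - y) ** 2     # cost on this single training example
```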

Chain rule decomposition

$$\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C_0}{\partial a^{(L)}}$$

Each factor has a clear meaning:

  • $\partial z^{(L)}/\partial w^{(L)} = a^{(L-1)}$: the weight matters more when the input neuron is active (Hebbian echo).
  • $\partial a^{(L)}/\partial z^{(L)} = \sigma'(z^{(L)})$: the activation function’s slope gates how much the signal passes through (sigmoid saturation kills gradients here).
  • $\partial C_0/\partial a^{(L)} = 2(a^{(L)} - y)$: the further off the prediction, the stronger the gradient signal.
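
Continuing the sketch above, the three factors multiply into the weight gradient; a forward-difference check (step `eps` chosen arbitrarily) confirms the product:

```python
def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

dz_dw = a_prev                  # dz/dw = a^(L-1)
da_dz = sigmoid_prime(z)        # da/dz = sigma'(z^(L))
dC_da = 2.0 * (a - y)           # dC0/da = 2(a^(L) - y)

dC_dw = dz_dw * da_dz * dC_da   # full chain-rule product

# Numerical sanity check against a bumped forward pass.
eps = 1e-6
C_bumped = (sigmoid((w + eps) * a_prev + b) - y) ** 2
assert abs((C_bumped - C0) / eps - dC_dw) < 1e-5
```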

Bias derivative

Same chain but $\partial z^{(L)}/\partial b^{(L)} = 1$, so it’s simply $\partial C_0/\partial b^{(L)} = \sigma'(z^{(L)}) \cdot 2(a^{(L)} - y)$.
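
In the running sketch this is the same product with the first factor dropped:

```python
dC_db = sigmoid_prime(z) * 2.0 * (a - y)   # dz/db = 1, so only two factors remain
```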

Propagating to earlier layers

The derivative w.r.t. $a^{(L-1)}$ replaces the first factor with $\partial z^{(L)}/\partial a^{(L-1)} = w^{(L)}$:

$$\frac{\partial C_0}{\partial a^{(L-1)}} = w^{(L)} \, \sigma'(z^{(L)}) \, 2\left(a^{(L)} - y\right)$$

This gives the sensitivity of the cost to the previous activation, which you then use to repeat the process one layer back: the recursive heart of backprop.
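
One more line in the running sketch:

```python
dC_da_prev = w * sigmoid_prime(z) * 2.0 * (a - y)   # dC0/da^(L-1)
# At layer L-1, dC_da_prev plays the role 2(a - y) played here,
# and the same three-factor product repeats.
```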

Multi-neuron generalisation

The video argues that “not much changes” — and structurally it doesn’t. The derivation builds up in three steps:

Step 1 — New notation

With multiple neurons per layer, every variable picks up a subscript. Use $j$ for neurons in layer $L$, $k$ for neurons in layer $L-1$. The weighted sum, activation, and cost become:

$$z^{(L)}_j = \sum_k w^{(L)}_{jk} a^{(L-1)}_k + b^{(L)}_j, \qquad a^{(L)}_j = \sigma\left(z^{(L)}_j\right), \qquad C_0 = \sum_j \left(a^{(L)}_j - y_j\right)^2$$

The weight subscript $w^{(L)}_{jk}$ means “going to neuron $j$, coming from neuron $k$”, i.e. row $j$, column $k$ of the weight matrix.
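
A vectorised sketch of these definitions, reusing `sigmoid` from above (layer sizes and values are arbitrary):

```python
rng = np.random.default_rng(0)

n_prev, n_curr = 4, 3                       # sizes of layers L-1 and L
a_prev = rng.random(n_prev)                 # a^(L-1)
W = rng.standard_normal((n_curr, n_prev))   # W^(L); entry (j, k) is w_jk
b = rng.standard_normal(n_curr)             # b^(L)
y = rng.random(n_curr)                      # target vector

z = W @ a_prev + b                          # z_j = sum_k w_jk a_k + b_j
a = sigmoid(z)                              # elementwise sigma
C0 = np.sum((a - y) ** 2)                   # C0 = sum_j (a_j - y_j)^2
```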

Step 2 — Weight derivative (same structure)

The chain rule for a specific weight $w^{(L)}_{jk}$ is the same three-factor product as before, just with indices:

$$\frac{\partial C_0}{\partial w^{(L)}_{jk}} = \frac{\partial z^{(L)}_j}{\partial w^{(L)}_{jk}} \cdot \frac{\partial a^{(L)}_j}{\partial z^{(L)}_j} \cdot \frac{\partial C_0}{\partial a^{(L)}_j} = a^{(L-1)}_k \, \sigma'\left(z^{(L)}_j\right) \, 2\left(a^{(L)}_j - y_j\right)$$

This works unchanged because $w^{(L)}_{jk}$ only affects neuron $j$; it touches exactly one path through the network.
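
Stacked over all $j$ and $k$, the per-weight formula becomes one outer product in the vectorised sketch (reusing `sigmoid_prime` from above):

```python
delta = sigmoid_prime(z) * 2.0 * (a - y)   # sigma'(z_j) * 2(a_j - y_j), shape (n_curr,)
dC_dW = np.outer(delta, a_prev)            # entry (j, k) = a_prev[k] * delta[j]
dC_db = delta                              # bias gradients, one per neuron j
```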

Step 3 — Activation derivative (the new idea)

In the one-neuron-per-layer case, $a^{(L-1)}$ influenced the cost through a single path. Now, $a^{(L-1)}_k$ feeds into every neuron $j$ in the next layer, so it influences $C_0$ along multiple paths. The total sensitivity is the sum of each path’s chain-rule contribution:

$$\frac{\partial C_0}{\partial a^{(L-1)}_k} = \sum_j \frac{\partial z^{(L)}_j}{\partial a^{(L-1)}_k} \cdot \frac{\partial a^{(L)}_j}{\partial z^{(L)}_j} \cdot \frac{\partial C_0}{\partial a^{(L)}_j} = \sum_j w^{(L)}_{jk} \, \sigma'\left(z^{(L)}_j\right) \, 2\left(a^{(L)}_j - y_j\right)$$

This is the only genuinely new equation in the multi-neuron case. Once computed, it feeds backward into the next layer’s weight/bias derivatives — the same recursive step as before.
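
In the vectorised sketch the sum over $j$ is a transposed matrix product:

```python
dC_da_prev = W.T @ delta   # entry k = sum_j w_jk * sigma'(z_j) * 2(a_j - y_j)
# This vector seeds the same three steps one layer back.
```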

Full cost

Average over all training examples: $C = \frac{1}{n} \sum_{i=0}^{n-1} C_i$.
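
A last sketch of the averaging, assuming a hypothetical `per_example_cost(x, y)` that runs the forward pass above for one example:

```python
def average_cost(examples, per_example_cost):
    # C = (1/n) * sum_i C_i; gradients average the same way.
    return sum(per_example_cost(x, y) for x, y in examples) / len(examples)
```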