Summary: The computation graph is a tree rooted at the output; “children” means a node’s inputs, and “upstream/downstream” are named relative to gradient flow, which runs opposite to data flow: gradients arrive from upstream (the root side) and are deposited downstream (toward the leaves).

downstream gradient = local gradient × upstream gradient

Graph direction terminology is genuinely confusing!

The computation graph is a tree rooted at the output.
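In micrograd-style code this tree shows up directly: each node records the inputs that produced it in _prev. A minimal sketch (the class name Value and the two ops here are assumptions, following the _prev/_backward naming used later in these notes):

```python
class Value:
    """Minimal micrograd-style node; a sketch, not the full library."""
    def __init__(self, data, _prev=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_prev)        # children: the inputs used to compute this node
        self._backward = lambda: None  # set by the op that created this node

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # downstream = local × upstream
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # local gradient of + is 1
            other.grad += out.grad
        out._backward = _backward
        return out

# Build o = x*w + b: leaves x, w, b on the "left", root o on the "right".
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
xw = x * w
o = xw + b
```

Following the edges in _prev from o leads back through xw to the leaves, which is exactly the tree rooted at the output.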

The structure is easier to see drawn as a picture. In the usual left-to-right drawing of the forward pass:

  • Leaf nodes on the left
  • Root / output node on the right


With that structure in mind, it’s useful to think of backpropagation graph terminology from Right-to-Left:

Takeaways

| Term | Node(s) “location” | Means |
| --- | --- | --- |
| Root | Rightmost | The output node (o or L): where backprop starts |
| Leaf node | Leftmost | A node with no inputs (_prev is empty): the raw data and weights |
| Children | Immediately left | A node’s inputs: the nodes in _prev that were used to compute it |
| Upstream | Right of it | Toward the output / root: where gradients come from |
| Downstream | Left of it | Toward the inputs / leaves: where gradients go to |

downstream gradient = local gradient × upstream gradient
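Concretely, for a single multiply node with made-up numbers (the values here are illustrative, not taken from the text):

```python
# One multiply node n = x * w, with hypothetical values.
x, w = 2.0, -3.0

upstream = 0.5            # dL/dn, handed to this node from the root side
local_dn_dx = w           # for multiplication, the local gradient w.r.t. one
local_dn_dw = x           # input is simply the other input

downstream_dx = local_dn_dx * upstream   # dL/dx = -3.0 * 0.5 = -1.5
downstream_dw = local_dn_dw * upstream   # dL/dw =  2.0 * 0.5 =  1.0
```

The node never needs to know where upstream came from; it only multiplies it by its local gradients and passes the results on toward the leaves.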

So, confusingly, the “stream” refers to gradient flow, not data flow, which is the opposite of everyday intuition:

  • Forward pass: Data flows “upstream” toward the output
    • leaf → root (i.e. left → right)
  • Backward pass: Gradients flow “downstream” toward the inputs
    • root → leaf (i.e. right → left)
  • The phrase “backprop deposits the gradient into its children” means:
    • the _backward of n = x1w1 + b takes n.grad (the upstream gradient) and accumulates it into x1w1.grad and b.grad, the children that were used to produce n.
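The “deposit” step above can be sketched in a few lines (a toy Node class standing in for micrograd’s Value; the numbers are illustrative):

```python
class Node:
    """Toy stand-in for a micrograd-style Value."""
    def __init__(self, data, _prev=()):
        self.data, self.grad, self._prev = data, 0.0, _prev
        self._backward = lambda: None

def add(a, b):
    n = Node(a.data + b.data, (a, b))
    def _backward():
        # n._backward takes n.grad (the upstream gradient) and deposits
        # it into the children a and b; the local gradient of + is 1.
        a.grad += n.grad
        b.grad += n.grad
    n._backward = _backward
    return n

x1w1 = Node(-6.0)
b = Node(1.0)
n = add(x1w1, b)     # n = x1w1 + b
n.grad = 0.5         # pretend this arrived from upstream (the root side)
n._backward()        # deposits 0.5 into both children
```

Note the `+=`: gradients are accumulated, not assigned, so a child that feeds into several nodes collects a contribution from each.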

See also