Summary: The computation graph is a tree (in general, a DAG) rooted at the output; “children” means a node’s inputs, and “upstream/downstream” describe the direction of gradient flow, which is the opposite of the direction of data flow.
downstream gradient = local gradient × upstream gradient
Graph direction terminology is genuinely confusing!
The computation graph is a tree (more generally, a directed acyclic graph) rooted at the output.
The structure may be more familiar drawn as a picture. Here is the forward pass (Left-to-Right):
- Leaf nodes on the left
- Root / output node on the right

With that structure in mind, it’s useful to think of backpropagation graph terminology from Right-to-Left:

Takeaways
| Term | Location in the drawing | Meaning |
|---|---|---|
| Root | Rightmost | The output node (`o` or `L`); where backprop starts |
| Leaf node | Leftmost | A node with no inputs (`_prev` is empty): the raw data and weights |
| Children | Immediately to its left | A node’s inputs: the nodes in `_prev` that were used to compute it |
| Upstream | To its right | Toward the output / root; where a node’s gradient comes from |
| Downstream | To its left | Toward the inputs / leaves; where a node’s gradient is passed next |
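
To make these terms concrete, here is a minimal micrograd-style sketch (an illustrative toy, not the real library): `_prev` holds a node’s children, leaves have an empty `_prev`, and the root is the final output. The node names follow the `n = x1w1 + b` example discussed below; the data values are arbitrary.

```python
class Value:
    """A minimal micrograd-style node; just enough to show the terms above."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)      # children: the inputs used to compute this node
        self._backward = lambda: None    # deposits this node's grad into its children

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += 1.0 * out.grad   # local grad of + is 1 for both inputs
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # local grad of * is the other input
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

x1, w1, b = Value(2.0), Value(-3.0), Value(1.0)   # leaves: _prev is empty
x1w1 = x1 * w1                                    # internal node
n = x1w1 + b                                      # root: the rightmost node

print(x1._prev)              # set()  -> x1 is a leaf
print(n._prev == {x1w1, b})  # True   -> n's children are its inputs
```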

downstream gradient = local gradient × upstream gradient

So, confusingly, upstream and downstream are defined relative to gradient flow, not data flow, which is the opposite of what the forward pass suggests:
- Forward pass: data flows “upstream” toward the output
  - leaf → root (i.e. left → right)
- Backward pass: gradients flow “downstream” toward the inputs
  - root → leaf (i.e. right → left)
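
Running backprop by hand on the sketch above shows the formula and the direction at once (the numbers follow from the illustrative values x1 = 2, w1 = -3, b = 1):

```python
# Backprop by hand, root -> leaf (right -> left in the drawing),
# reusing x1, w1, b, x1w1, n from the sketch above.
n.grad = 1.0        # seed the root: dn/dn = 1
n._backward()       # + node: deposits 1.0 into x1w1.grad and b.grad
x1w1._backward()    # * node: downstream = local * upstream
                    #   x1.grad = w1.data * x1w1.grad = -3.0 * 1.0 = -3.0
                    #   w1.grad = x1.data * x1w1.grad =  2.0 * 1.0 =  2.0
print(x1.grad, w1.grad, b.grad)   # -3.0 2.0 1.0
```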
- The phrase “backprop deposits the gradient into its children” means: the `_backward` of `n = x1w1 + b` takes `n.grad` and writes into `x1w1.grad` and `b.grad` (the grads of the nodes that were used to produce `n`).
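
A full `backward()` just automates that right-to-left walk. A minimal sketch, assuming the `Value` class from the first snippet: topologically sort the graph, then fire each node’s `_backward` starting from the root, so every node has received its complete upstream gradient before it deposits into its children.

```python
def backward(root):
    # Topological order guarantees a node fires only after every node
    # that uses it has already deposited gradient into it.
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(root)

    for v in topo:            # grads accumulate with +=, so zero them first
        v.grad = 0.0
    root.grad = 1.0           # seed at the root
    for v in reversed(topo):  # root first, leaves last: right -> left
        v._backward()

backward(n)
print(x1.grad, w1.grad, b.grad)   # -3.0 2.0 1.0 again
```

(This is essentially how micrograd’s own `Value.backward()` works.)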