15.78 min read

Putting It All Together

The full backpropagation algorithm is built from just two rules applied in sequence. Here is the complete procedure:

1 — Forward pass: Set inputs to $\mathbf{t}_0$ . Propagate left to right, storing $\mathbf{s}_0$ at each function node. Record $f_0 = f(\mathbf{t}_0)$ .

2 — Seed: Place the number 1 on the output wire. This represents $\frac{\partial f}{{\partial f}} = 1$ .

3 — Backward pass: Process nodes right to left. Function node: multiply the incoming gradient by each local derivative and place the result on each input wire. Splitter node: add all incoming gradients and place the sum on the input wire.

4 — Read results: The numbers that arrive at each $t_i$ are exactly $\frac{\partial f}{{\partial t_i}}(\mathbf{t}_0)$ . Done.

The total cost is proportional to circuit size — roughly one extra forward pass worth of work. Computing all $n$ partial derivatives costs no more than computing $f$ itself twice. That is what makes training networks with millions of parameters tractable.

Formal View

Theorem 15.1 — Backpropagation Correctness

The backpropagation algorithm correctly computes

\frac{\partial f}{\partial t_i}(\mathbf{t}_0)

for all

i

simultaneously. The time complexity is

O(|\text{circuit}|)

— proportional to circuit size, independent of the number of inputs

n

. This is optimal up to constants.

Why This Matters

Backpropagation is arguably the most impactful algorithm in modern machine learning. Without it, training deep networks would require $O(n)$ forward passes — impractical for millions of parameters.

Training all modern neural networks: LLMs, image classifiers, speech models, protein folding
Differentiable programming: compiling arbitrary code to be differentiable
Second-order optimization by running backprop through backprop
Scientific computing where gradients of physical simulations are needed

Learning Resources

What is backpropagation really doing?

3Blue1Brown

The complete algorithm assembled, with neural network context.

14 min

Backpropagation, step by step

StatQuest with Josh Starmer

Complete numerical walkthrough of both forward and backward passes.

18 min

The spelled-out intro to neural networks and backpropagation: building micrograd

Andrej Karpathy

Implements the complete algorithm in Python, then scales it to train a small language model.

144 min

Quiz

Question 1

Computational cost of backpropagation relative to one forward pass?

Question 2

Backward pass processes nodes in which order?

Question 3

After backpropagation, the numbers on input wires $t_i$ represent:

Question 4

Changing $\mathbf{t}_0$ requires rerunning both the forward and backward passes.

Question 5

Which correctly summarizes the two backward-pass rules?

Common Mistakes

Swapping the rules: adding at function nodes and multiplying at splitters.
Forgetting to seed with 1 on the output wire.
Running the backward pass before completing the forward pass — you need stored values.
Interpreting gradient numbers as an approximation rather than an exact value at $\mathbf{t}_0$ .