The Forward Pass
The forward pass is simple: plug in the input values, follow the data left to right through the graph, and compute the output.
But there is a crucial additional step: at each function node, store the input values that arrived. Whatever values a node received, cache them alongside that node.
Why? The backward pass needs to compute local derivatives, that is, the derivative of each node's output with respect to each of its inputs. These depend on the actual input values at that node. Without caching them, you would have to re-run part of the forward pass mid-backward-pass.
Think of it like a long calculation on paper: circle intermediate results as you go, because you might need them later.
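The caching step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Node` class, its `cached_inputs` attribute, and the example graph `y = sin(a * b)` are all assumptions chosen for the demonstration.

```python
import math

class Node:
    """A function node that remembers the numerical inputs it received."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.cached_inputs = None  # filled in during the forward pass
        self.output = None

    def forward(self, *inputs):
        self.cached_inputs = inputs      # store numbers, not symbolic expressions
        self.output = self.fn(*inputs)
        return self.output

# Hypothetical two-node graph: y = sin(a * b)
mul = Node("mul", lambda a, b: a * b)
sin = Node("sin", math.sin)

def forward(a, b):
    # Data flows left to right: first the product, then the sine.
    return sin.forward(mul.forward(a, b))

y = forward(2.0, 3.0)

# The backward pass can now read local derivatives off the cached inputs.
a, b = mul.cached_inputs
dmul_da, dmul_db = b, a          # d(a*b)/da = b, d(a*b)/db = a
(u,) = sin.cached_inputs
dsin_du = math.cos(u)            # d(sin u)/du = cos(u)
```

Note that the product node's local derivatives are the input values themselves, so the output alone would not be enough to recover them.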
Formal View
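One way to state the idea formally is sketched below. The notation is an assumption made for illustration: a node applies a function $g$ to inputs $u_1, \dots, u_k$ and produces $v$, and $y$ denotes the final output of the graph.

```latex
% Assumed notation: node output v, node function g, inputs u_1..u_k, final output y.
\begin{align*}
\text{Forward:}\quad
  & v = g(u_1, \dots, u_k),
  \qquad \operatorname{cache}(v) = (u_1, \dots, u_k) \\[4pt]
\text{Backward (contribution of this node to } \partial y / \partial u_i \text{):}\quad
  & \frac{\partial y}{\partial v}\,
    \left.\frac{\partial g}{\partial u_i}\right|_{(u_1, \dots, u_k)}
\end{align*}
% Example: for g(a, b) = a b, the local derivatives are
% \partial g / \partial a = b and \partial g / \partial b = a,
% both of which require the cached input values.
```

If an input feeds several nodes, the contributions from those nodes are summed.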
Why This Matters
The forward pass costs no more than a single evaluation of the function. Storing intermediate values is the price, paid in memory, for efficient backpropagation.
- Neural network inference (forward pass only, so no storage is needed)
- Training requires storing activations for the backward pass
- Gradient checkpointing: store fewer activations and recompute selectively to save memory
- Stored activations are also useful for debugging training pathologies
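The trade-off between storing every activation and gradient checkpointing can be illustrated with a toy chain of scalar functions. This is a sketch under stated assumptions: the `LAYERS` list, both function names, and the `every` parameter are hypothetical, and the checkpointed version recomputes from the nearest checkpoint for every layer to keep the code short, whereas practical implementations recompute each segment once.

```python
import math

# Hypothetical chain of scalar layers: each entry is (function, derivative).
LAYERS = [
    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    (lambda x: x * x, lambda x: 2.0 * x),
    (math.sin, math.cos),
    (lambda x: 3.0 * x, lambda x: 3.0),
]

def grad_full_storage(x):
    """Cache every layer input on the forward pass, then backpropagate."""
    inputs = []
    for f, _ in LAYERS:
        inputs.append(x)          # the input that arrived at this layer
        x = f(x)
    grad = 1.0
    for (_, df), cached in zip(reversed(LAYERS), reversed(inputs)):
        grad *= df(cached)        # local derivative at the cached input
    return grad, len(inputs)

def grad_checkpointed(x, every=2):
    """Cache only every `every`-th layer input and recompute the rest."""
    checkpoints = {}
    for i, (f, _) in enumerate(LAYERS):
        if i % every == 0:
            checkpoints[i] = x
        x = f(x)
    grad = 1.0
    for i in reversed(range(len(LAYERS))):
        start = (i // every) * every   # nearest checkpoint at or before layer i
        v = checkpoints[start]
        for j in range(start, i):      # re-run the forward pass up to layer i
            v = LAYERS[j][0](v)
        grad *= LAYERS[i][1](v)
    return grad, len(checkpoints)

g_full, stored_full = grad_full_storage(0.5)
g_ckpt, stored_ckpt = grad_checkpointed(0.5)
```

Both functions return the same gradient, but the checkpointed version keeps fewer cached values and pays for that with extra forward computation.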
Learning Resources
Forward and backward passes explained
deeplizard
Clear explanation of forward and backward passes with concrete neural network examples.
The spelled-out intro to neural networks and backpropagation: building micrograd
Andrej Karpathy
The micrograd implementation stores forward-pass data as attributes on its Value objects, exactly as the theory prescribes.
Quiz
What does the forward pass store at each function node?
For a node computing an output from two inputs, why store both inputs (and not just the output)?
True or false: storing intermediate values during the forward pass is optional, because they can always be recomputed on demand.
The forward pass processes nodes in which order?
Common Mistakes
- Storing only the output and not the inputs — you need the inputs for local gradient computation.
- Running the forward pass right-to-left (that is the backward direction).
- Thinking you store symbolic expressions rather than numerical values.