
Matrix Form Justification

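The matrix form of the chain rule for the composition $\mathbf{g} \circ \mathbf{f}$, $J(\mathbf{g} \circ \mathbf{f})(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$, can be understood through the local linear approximation (LLA) perspective: composing two affine maps gives another affine map, and its linear part is the product of the two linear parts.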

The LLA of $\mathbf{f}$ near $\mathbf{a}$ is $\mathbf{f}(\mathbf{a}+\mathbf{h}) \approx \mathbf{f}(\mathbf{a}) + J\mathbf{f}(\mathbf{a})\mathbf{h}$, an affine map in the displacement $\mathbf{h}$. The LLA of $\mathbf{g}$ near $\mathbf{f}(\mathbf{a})$ is $\mathbf{g}(\mathbf{f}(\mathbf{a})+\boldsymbol{\delta}) \approx \mathbf{g}(\mathbf{f}(\mathbf{a})) + J\mathbf{g}(\mathbf{f}(\mathbf{a}))\boldsymbol{\delta}$, an affine map in $\boldsymbol{\delta}$.

The displacement fed into $\mathbf{g}$ is $\boldsymbol{\delta} = \mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) \approx J\mathbf{f}(\mathbf{a})\mathbf{h}$. Substituting this gives $\mathbf{g}(\mathbf{f}(\mathbf{a}+\mathbf{h})) \approx \mathbf{g}(\mathbf{f}(\mathbf{a})) + J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a}) \mathbf{h}$. The composed LLA has linear part $J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$, which is the Jacobian of the composition at $\mathbf{a}$.
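The claim that the product of the two Jacobians equals the Jacobian of the composition can be checked numerically. The sketch below is only an illustration: the maps `f` and `g`, the point `a`, and the helper names are hypothetical choices, not taken from the text. It compares the analytic product against a central-difference estimate of the Jacobian of $\mathbf{g} \circ \mathbf{f}$.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^2
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

def g(y):
    # Hypothetical example map g: R^2 -> R^2
    y1, y2 = y
    return [math.sin(y1), y1 * y2]

def jac_f(x):
    # Analytic Jacobian of f (rows = outputs, columns = inputs)
    x1, x2 = x
    return [[x2, x1],
            [1.0, 2.0 * x2]]

def jac_g(y):
    # Analytic Jacobian of g
    y1, y2 = y
    return [[math.cos(y1), 0.0],
            [y2, y1]]

def matmul(A, B):
    # Plain matrix product of two nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, x, eps=1e-6):
    # Central differences: column j approximates d func / d x_j
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

a = [0.5, -1.2]

# Chain rule: J g evaluated at f(a), times J f evaluated at a
product = matmul(jac_g(f(a)), jac_f(a))

# Direct numerical Jacobian of the composition g(f(x)) at a
numeric = numerical_jacobian(lambda x: g(f(x)), a)

max_err = max(abs(product[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_err)
```

Note that `jac_g` is evaluated at `f(a)`, not at `a`; evaluating both Jacobians at the same point is an easy slip when the input and output spaces have the same dimension.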

Formal View

Remark 14.3 — Why It's Matrix Multiplication

Composing affine maps is matrix multiplication: if $L_1(\mathbf{h}) = A_1\mathbf{h} + \mathbf{b}_1$ and $L_2(\mathbf{h}) = A_2\mathbf{h} + \mathbf{b}_2$, then $L_2(L_1(\mathbf{h})) = A_2 A_1 \mathbf{h} + (A_2 \mathbf{b}_1 + \mathbf{b}_2)$. The linear part is $A_2 A_1$, a matrix product. The chain rule is simply this fact applied to the LLAs.
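The identity in this remark can be verified on a small instance. The following is a minimal sketch with made-up integer matrices (all values and helper names are assumptions chosen for illustration): it builds two affine maps, composes them directly, and compares the result with the single affine map whose linear part is $A_2 A_1$ and whose offset is $A_2 \mathbf{b}_1 + \mathbf{b}_2$.

```python
def matvec(A, v):
    # Matrix times column vector
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    # Plain matrix product of two nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

def affine(A, b):
    # Returns the map h -> A h + b
    return lambda h: vec_add(matvec(A, h), b)

# Illustrative data: L1 maps R^2 -> R^3, L2 maps R^3 -> R^2
A1 = [[1, 2], [0, 1], [3, -1]]
b1 = [1, 0, -2]
A2 = [[2, 0, 1], [-1, 1, 0]]
b2 = [5, 7]

L1 = affine(A1, b1)
L2 = affine(A2, b2)

# Predicted composite: linear part A2 A1, offset A2 b1 + b2
A = matmul(A2, A1)
b = vec_add(matvec(A2, b1), b2)
L = affine(A, b)

h = [4, -3]
composed = L2(L1(h))   # apply L1, then L2
direct = L(h)          # apply the single predicted affine map

print(A, b)              # [[5, 3], [-1, -1]] [5, 6]
print(composed, direct)  # [16, 5] [16, 5]
```

The shapes also show why the order is forced: `A2` is 2x3 and `A1` is 3x2, so the product `A2 A1` is a 2x2 map on the original input space.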

Interactive Visualization

[Interactive figure: Matrix Product, Column Perspective]

Why This Matters

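Seeing the chain rule as matrix multiplication ties calculus to linear algebra: differentiating a composition reduces to multiplying Jacobian matrices.

Deep learning: each layer contributes its own Jacobian, and backpropagation multiplies these Jacobians in order from the output layer back to the input.
Automatic differentiation: forward mode and reverse mode evaluate the same Jacobian product in opposite orders, input to output for forward mode and output to input for reverse mode.
Numerical linear algebra: Krylov methods need only matrix-vector products, so a Jacobian given by the chain rule can be applied one factor at a time without forming the full product.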
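The two orderings mentioned for automatic differentiation can be sketched on a chain of three Jacobians. The matrices below are hypothetical stand-ins for the Jacobians of a map $\mathbb{R}^2 \to \mathbb{R}^3 \to \mathbb{R}^3 \to \mathbb{R}$; none of the values come from the text. Forward mode pushes an input direction through the factors from right to left, while reverse mode pulls an output weight through them from left to right.

```python
def matvec(A, v):
    # Matrix times column vector (one forward-mode step)
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vecmat(u, A):
    # Row vector times matrix (one reverse-mode step)
    return [sum(u[i] * A[i][j] for i in range(len(A)))
            for j in range(len(A[0]))]

# Hypothetical layer Jacobians; the full Jacobian is J3 J2 J1 (1x2)
J1 = [[1, 2], [0, 1], [3, -1]]           # 3x2
J2 = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]   # 3x3
J3 = [[2, -1, 1]]                        # 1x3

# Forward mode: one pass per input direction, computing J3 (J2 (J1 e_j))
basis = [[1, 0], [0, 1]]
grad_forward = [matvec(J3, matvec(J2, matvec(J1, e)))[0] for e in basis]

# Reverse mode: one pass for the single output, computing ((u J3) J2) J1
u = [1]
grad_reverse = vecmat(vecmat(vecmat(u, J3), J2), J1)

print(grad_forward)  # [12, 3]
print(grad_reverse)  # [12, 3]
```

Both orderings give the same row of the Jacobian because matrix multiplication is associative. With two inputs and one output, the forward ordering needed two passes and the reverse ordering needed one, which is the usual reason reverse mode is preferred when outputs are few and inputs are many.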

Learning Resources

Chain Rule as Matrix Multiplication, Steve Brunton (20 min). Matrix formulation of the chain rule and its connection to automatic differentiation.

Backpropagation as Chain Rule, 3Blue1Brown (14 min). Neural network backpropagation as iterated Jacobian multiplication.

Quiz

Question 1

Composing two linear maps $\mathbf{x} \mapsto A\mathbf{x}$ and $\mathbf{y} \mapsto B\mathbf{y}$ gives the linear map $\mathbf{x} \mapsto BA\mathbf{x}$. This is analogous to the chain rule with $J\mathbf{g} \cdot J\mathbf{f}$ because:

Common Mistakes

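Thinking the chain rule involves adding Jacobians. The Jacobians are multiplied, never added; the sums that appear in the component form of the chain rule are just the entries of the matrix product.
Confusing the order of multiplication when composing three or more functions. The Jacobian of the outermost function goes on the left and that of the innermost on the right, and because matrix multiplication is not commutative, the order cannot be swapped.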
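The ordering mistake is easiest to see when the intermediate dimensions differ. In the sketch below, the three Jacobians are hypothetical examples for $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^3$, $\mathbf{g}: \mathbb{R}^3 \to \mathbb{R}^4$, and $\mathbf{k}: \mathbb{R}^4 \to \mathbb{R}$ (the name $\mathbf{k}$ and all values are assumptions for illustration). The correct order is outermost on the left; the reversed order is not even defined here.

```python
def matmul(A, B):
    # Matrix product that refuses incompatible shapes
    if len(A[0]) != len(B):
        raise ValueError(
            f"shape mismatch: {len(A)}x{len(A[0])} times {len(B)}x{len(B[0])}"
        )
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical Jacobians at the relevant points
Jf = [[1, 0], [2, 1], [0, 3]]                       # 3x2
Jg = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 2]]   # 4x3
Jk = [[1, 2, 0, -1]]                                # 1x4

# Correct order for k(g(f(x))): Jk * Jg * Jf, a 1x2 matrix
correct = matmul(Jk, matmul(Jg, Jf))
print(correct)  # [[5, -4]]

# Reversed order: Jf * Jg * Jk fails on the first product (4x3 times 1x4)
try:
    matmul(Jf, matmul(Jg, Jk))
    wrong_order_defined = True
except ValueError as err:
    wrong_order_defined = False
    print("reversed order rejected:", err)

# With square Jacobians both orders are defined but generally differ
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(matmul(A, B), matmul(B, A))  # [[2, 1], [1, 1]] [[1, 1], [1, 2]]
```

The square case is the more dangerous one in practice, since a swapped order then produces a wrong answer silently instead of an error.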