Linear Algebra
Matrix Form Justification

The matrix multiplication form of the chain rule, $J\mathbf{h} = J\mathbf{g} \cdot J\mathbf{f}$, can be understood through the LLA perspective: composing two affine maps gives another affine map.

The LLA of $\mathbf{f}$ near $\mathbf{a}$ is $\mathbf{f}(\mathbf{a}+\mathbf{h}) \approx \mathbf{f}(\mathbf{a}) + J\mathbf{f}(\mathbf{a})\mathbf{h}$, an affine map in $\mathbf{h}$. The LLA of $\mathbf{g}$ near $\mathbf{f}(\mathbf{a})$ is $\mathbf{g}(\mathbf{f}(\mathbf{a})+\boldsymbol{\delta}) \approx \mathbf{g}(\mathbf{f}(\mathbf{a})) + J\mathbf{g}(\mathbf{f}(\mathbf{a}))\boldsymbol{\delta}$.

Substituting $\boldsymbol{\delta} = \mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) \approx J\mathbf{f}(\mathbf{a})\mathbf{h}$ gives $\mathbf{g}(\mathbf{f}(\mathbf{a}+\mathbf{h})) \approx \mathbf{g}(\mathbf{f}(\mathbf{a})) + J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a}) \mathbf{h}$. The composed LLA has the linear part $J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$, which is the Jacobian of the composition at $\mathbf{a}$.
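The substitution can be checked numerically. The sketch below is only an illustration: the functions `f` and `g`, the point `a`, and the helper names are assumptions chosen for this example, not part of the lesson. It multiplies the two hand-derived Jacobians and compares the product with a central-difference Jacobian of the composition.

```python
import math

def f(x):
    # Illustrative f: R^2 -> R^2 (an assumption for this sketch)
    return [x[0] * x[1], x[0] + x[1] ** 2]

def g(u):
    # Illustrative g: R^2 -> R^2 (an assumption for this sketch)
    return [math.sin(u[0]) + u[1], u[0] * u[1]]

def jac_f(x):
    # Hand-derived Jacobian of f
    return [[x[1], x[0]],
            [1.0, 2.0 * x[1]]]

def jac_g(u):
    # Hand-derived Jacobian of g
    return [[math.cos(u[0]), 1.0],
            [u[1], u[0]]]

def matmul(A, B):
    # Plain matrix product of two lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, x, eps=1e-6):
    # Central differences: column j approximates d func / d x_j
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

a = [0.7, -1.3]
chain_rule = matmul(jac_g(f(a)), jac_f(a))           # Jg(f(a)) * Jf(a)
direct = numerical_jacobian(lambda x: g(f(x)), a)    # Jacobian of g(f(x)) at a
max_diff = max(abs(chain_rule[i][j] - direct[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # expected to be tiny, limited by finite-difference error
```

Note that `jac_g` is evaluated at `f(a)`, not at `a`, matching the point where the LLA of $\mathbf{g}$ is taken.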

Formal View

Remark 14.3 — Why It's Matrix Multiplication
Composing affine maps is matrix multiplication: if $L_1(\mathbf{h}) = A_1\mathbf{h} + \mathbf{b}_1$ and $L_2(\mathbf{h}) = A_2\mathbf{h} + \mathbf{b}_2$, then $L_2(L_1(\mathbf{h})) = A_2 A_1 \mathbf{h} + (A_2 \mathbf{b}_1 + \mathbf{b}_2)$. The linear part is $A_2 A_1$, a matrix product. The chain rule is simply this fact applied to the LLAs.
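The identity in the remark can be verified directly. In this minimal sketch the matrices, offsets, and test vector are arbitrary values made up for illustration. It applies $L_1$ and then $L_2$ step by step and compares the result with the closed form $A_2 A_1 \mathbf{h} + (A_2 \mathbf{b}_1 + \mathbf{b}_2)$.

```python
def matvec(A, v):
    # Matrix times column vector
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    # Plain matrix product of two lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def vec_add(u, v):
    return [p + q for p, q in zip(u, v)]

# Arbitrary illustrative data: L1 maps R^2 -> R^3, L2 maps R^3 -> R^2
A1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
b1 = [1.0, 0.0, -2.0]
A2 = [[2.0, 0.0, 1.0], [-1.0, 1.0, 0.0]]
b2 = [0.5, 4.0]

def L1(h):
    return vec_add(matvec(A1, h), b1)

def L2(h):
    return vec_add(matvec(A2, h), b2)

# Closed form of the composition: linear part A2*A1, offset A2*b1 + b2
A = matmul(A2, A1)
b = vec_add(matvec(A2, b1), b2)

h = [0.3, -1.2]
composed = L2(L1(h))
closed_form = vec_add(matvec(A, h), b)
print(composed, closed_form)
```

The shapes also show why the order is forced: `A2` (2x3) can multiply `A1` (3x2), and the inner dimension 3 is the dimension of the intermediate space.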

[Interactive visualization: Matrix Product, Column Perspective]

Why This Matters

Seeing the chain rule as matrix multiplication unifies calculus and linear algebra.

  • Deep learning: each layer has its own Jacobian, and backpropagation applies the chain rule by multiplying these Jacobians starting at the output layer and working back to the input
  • Automatic differentiation: forward and reverse mode are two orders of evaluating the same product of Jacobians; forward mode starts from the input side, reverse mode from the output side
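  • Numerical linear algebra: Jacobian-free Newton-Krylov methods need only Jacobian-vector products, which the chain rule supplies as a sequence of matrix-vector products without forming the full Jacobian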
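The two evaluation orders mentioned for automatic differentiation can be illustrated with three fixed Jacobians. This is a sketch under stated assumptions: the integer matrices `J1`, `J2`, `J3` are made up, and real automatic differentiation tools compute such products without storing the matrices explicitly. Forward mode pushes one input direction at a time through the chain (Jacobian-vector products); reverse mode pulls one output weight vector back through it (vector-Jacobian products).

```python
def matvec(A, v):
    # Matrix times column vector (one Jacobian-vector product)
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vecmat(w, A):
    # Row vector times matrix (one vector-Jacobian product)
    return [sum(w[i] * A[i][j] for i in range(len(A)))
            for j in range(len(A[0]))]

# Made-up Jacobians for a chain R^2 -> R^3 -> R^4 -> R^1
J1 = [[1, 2], [0, 1], [3, -1]]                      # 3x2
J2 = [[2, 0, 1], [-1, 1, 0], [0, 2, 1], [1, 1, 1]]  # 4x3
J3 = [[1, -2, 0, 3]]                                # 1x4

# Forward mode: one pass per input direction, evaluating J3 (J2 (J1 e))
forward_row = []
for e in ([1, 0], [0, 1]):
    forward_row.append(matvec(J3, matvec(J2, matvec(J1, e)))[0])

# Reverse mode: one pass per output, evaluating ((w J3) J2) J1 with w = [1]
reverse_row = vecmat(vecmat(vecmat([1], J3), J2), J1)

print(forward_row, reverse_row)  # both are the 1x2 Jacobian of the whole chain
```

With two inputs and one output, forward mode needs two passes here while reverse mode needs one, which is the usual reason reverse mode is preferred when a scalar loss depends on many parameters.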

Quiz

Question 1

Composing two linear maps $\mathbf{x} \mapsto A\mathbf{x}$ and $\mathbf{y} \mapsto B\mathbf{y}$ gives the linear map $\mathbf{x} \mapsto BA\mathbf{x}$. This is analogous to the chain rule with $J\mathbf{g} \cdot J\mathbf{f}$ because:

Common Mistakes

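  • Thinking the chain rule involves adding Jacobians. For a composition, the Jacobians are multiplied, never added.
  • Confusing the order of multiplication when composing three or more functions. The Jacobian of the outermost function goes on the left: for $\mathbf{h} = \mathbf{g}_3 \circ \mathbf{g}_2 \circ \mathbf{g}_1$, the product is $J\mathbf{g}_3 \cdot J\mathbf{g}_2 \cdot J\mathbf{g}_1$, with each factor evaluated at the output of the stage before it.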
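The ordering mistake is easy to demonstrate with three linear maps, whose Jacobians are just their matrices. The matrices below are made up for illustration. The sketch compares the product in the correct order (outermost on the left) with the reversed order, and checks both against the Jacobian read off from the composed map itself.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Made-up Jacobians of three linear maps g1, g2, g3 on R^2
J1 = [[1, 2], [0, 1]]
J2 = [[0, 1], [1, 0]]
J3 = [[2, 0], [0, 3]]

def h(x):
    # h = g3 after g2 after g1: g1 is applied first, g3 last
    return matvec(J3, matvec(J2, matvec(J1, x)))

correct = matmul(J3, matmul(J2, J1))         # outermost on the left
reversed_order = matmul(J1, matmul(J2, J3))  # the common mistake

# For a linear map, column j of the Jacobian is the image of basis vector e_j
cols = [h([1, 0]), h([0, 1])]
true_jac = [[cols[j][i] for j in range(2)] for i in range(2)]

print(correct == true_jac, reversed_order == true_jac)
```

With square matrices the wrong order still produces a matrix of the right shape, so the error is silent. With non-square Jacobians the reversed product is usually not even defined, which is a useful shape check.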