Linear Algebra
14.5

Justification via LLA

The chain rule follows naturally from the local linear approximation (LLA). Near $\mathbf{a}$, $\mathbf{f}(\mathbf{a}+\mathbf{h}) \approx \mathbf{f}(\mathbf{a}) + J\mathbf{f}(\mathbf{a})\mathbf{h}$. Now apply $\mathbf{g}$ to both sides: near $\mathbf{u} = \mathbf{f}(\mathbf{a})$, $\mathbf{g}(\mathbf{u}+\boldsymbol{\delta}) \approx \mathbf{g}(\mathbf{u}) + J\mathbf{g}(\mathbf{u})\boldsymbol{\delta}$.

With $\boldsymbol{\delta} = \mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) \approx J\mathbf{f}(\mathbf{a})\mathbf{h}$, the composite satisfies $\mathbf{g}(\mathbf{f}(\mathbf{a}+\mathbf{h})) \approx \mathbf{g}(\mathbf{f}(\mathbf{a})) + J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})\mathbf{h}$

This shows that the LLA of the composite $\mathbf{g}\circ\mathbf{f}$ at $\mathbf{a}$ has Jacobian $J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$. The rigorous proof requires bounding the error from composing the two approximations and showing that it is $o(\|\mathbf{h}\|)$ as $\mathbf{h} \to \mathbf{0}$.
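As a sanity check on the formula above, the product of the two Jacobians can be compared with a finite-difference Jacobian of the composite. The sketch below is only illustrative: the maps `f` and `g`, the point `a`, and the helper names (`jac_f`, `jac_g`, `matmul`, `numeric_jacobian`) are all assumptions chosen for this example, not part of the lesson.

```python
import math

def f(x):
    """Inner map f: R^2 -> R^2 (hypothetical example)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

def g(u):
    """Outer map g: R^2 -> R^2 (hypothetical example)."""
    u1, u2 = u
    return [math.sin(u1), u1 * u2]

def jac_f(x):
    """Hand-derived Jacobian of f at x."""
    x1, x2 = x
    return [[x2, x1],
            [1.0, 2.0 * x2]]

def jac_g(u):
    """Hand-derived Jacobian of g at u."""
    u1, u2 = u
    return [[math.cos(u1), 0.0],
            [u2, u1]]

def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numeric_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian; column j approximates d func / d x_j."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def compose(x):
    """The composite g(f(x))."""
    return g(f(x))

a = [0.5, -1.2]
# Chain rule: the Jacobian of g is evaluated at f(a), not at a.
J_chain = matmul(jac_g(f(a)), jac_f(a))
J_numeric = numeric_jacobian(compose, a)
max_gap = max(abs(J_chain[i][j] - J_numeric[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_gap)
```

The two matrices should agree up to finite-difference error, which is small but not zero.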

Formal View

Theorem 14.3 (Chain Rule): Chain Rule via LLA
If $\mathbf{f}$ is differentiable at $\mathbf{a}$ and $\mathbf{g}$ is differentiable at $\mathbf{f}(\mathbf{a})$, then $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ is differentiable at $\mathbf{a}$ and
$$J\mathbf{h}(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$$

Proof sketch: write $\mathbf{f}(\mathbf{a}+\mathbf{h}) = \mathbf{f}(\mathbf{a}) + J\mathbf{f}(\mathbf{a})\mathbf{h} + \mathbf{e}_1$ with $\|\mathbf{e}_1\|/\|\mathbf{h}\| \to 0$ as $\mathbf{h} \to \mathbf{0}$ (here $\mathbf{h}$ is the increment, not the composite function), then apply the LLA for $\mathbf{g}$ to bound the composite error.
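One way to spell out the error bookkeeping in the sketch is shown below. This is a standard argument, not necessarily the course's own proof. The symbols $\mathbf{u} = \mathbf{f}(\mathbf{a})$, $\boldsymbol{\delta} = \mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a})$, the second error term $\mathbf{e}_2$, and the use of the operator norm for the Jacobians are assumptions introduced here, with $\mathbf{e}_2 = \mathbf{0}$ whenever $\boldsymbol{\delta} = \mathbf{0}$.

```latex
\begin{aligned}
\boldsymbol{\delta} &= J\mathbf{f}(\mathbf{a})\mathbf{h} + \mathbf{e}_1, \\
\mathbf{g}(\mathbf{u}+\boldsymbol{\delta}) &= \mathbf{g}(\mathbf{u}) + J\mathbf{g}(\mathbf{u})\boldsymbol{\delta} + \mathbf{e}_2,
  \qquad \|\mathbf{e}_2\|/\|\boldsymbol{\delta}\| \to 0 \text{ as } \boldsymbol{\delta} \to \mathbf{0}, \\
\mathbf{g}(\mathbf{f}(\mathbf{a}+\mathbf{h})) &= \mathbf{g}(\mathbf{f}(\mathbf{a}))
  + J\mathbf{g}(\mathbf{u})\,J\mathbf{f}(\mathbf{a})\mathbf{h}
  + \underbrace{J\mathbf{g}(\mathbf{u})\mathbf{e}_1 + \mathbf{e}_2}_{\text{composite error}}, \\
\|J\mathbf{g}(\mathbf{u})\mathbf{e}_1\| &\le \|J\mathbf{g}(\mathbf{u})\|\,\|\mathbf{e}_1\| = o(\|\mathbf{h}\|), \\
\|\boldsymbol{\delta}\| &\le \|J\mathbf{f}(\mathbf{a})\|\,\|\mathbf{h}\| + \|\mathbf{e}_1\| = O(\|\mathbf{h}\|)
  \;\Longrightarrow\; \|\mathbf{e}_2\| = o(\|\mathbf{h}\|).
\end{aligned}
```

Both pieces of the composite error are therefore $o(\|\mathbf{h}\|)$, which is exactly what differentiability of $\mathbf{g}\circ\mathbf{f}$ at $\mathbf{a}$ with Jacobian $J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$ requires.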

Why This Matters

The LLA derivation makes the chain rule intuitive: compose two linear approximations to get a linear approximation of the composition.

  • Forward-mode autodiff: propagate the LLA of the inner function through the LLA of the outer function
  • Perturbation analysis: how does a small change in $\mathbf{x}$ propagate through $\mathbf{f}$ and then $\mathbf{g}$?
  • Physics: how does a small change in coordinates propagate through a physical model?
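The forward-mode idea in the first bullet can be sketched by carrying a (value, tangent) pair through each function in turn, so the tangent leaving the inner function becomes the perturbation fed to the outer one. This is a hand-written illustration, not how any particular autodiff library is implemented. The maps and the names `f_jvp`, `g_jvp`, and `compose_jvp` are hypothetical.

```python
import math

def f_jvp(x, dx):
    """Return f(x) and J f(x) @ dx for f(x1, x2) = (x1*x2, x1 + x2**2)."""
    x1, x2 = x
    d1, d2 = dx
    value = [x1 * x2, x1 + x2 ** 2]
    tangent = [x2 * d1 + x1 * d2, d1 + 2.0 * x2 * d2]
    return value, tangent

def g_jvp(u, du):
    """Return g(u) and J g(u) @ du for g(u1, u2) = (sin(u1), u1*u2)."""
    u1, u2 = u
    e1, e2 = du
    value = [math.sin(u1), u1 * u2]
    tangent = [math.cos(u1) * e1, u2 * e1 + u1 * e2]
    return value, tangent

def compose_jvp(x, dx):
    """Push (x, dx) through f, then push the result through g."""
    u, du = f_jvp(x, dx)   # du = J f(x) @ dx
    return g_jvp(u, du)    # tangent = J g(f(x)) @ J f(x) @ dx

a = [0.5, -1.2]
direction = [1.0, 0.0]     # perturb only the first coordinate
value, tangent = compose_jvp(a, direction)
print("g(f(a)) =", value)
print("directional derivative =", tangent)
```

Note that the full Jacobian is never formed: each call yields one Jacobian-vector product, so recovering every column of the composite Jacobian would take one pass per input direction.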

Quiz

Question 1

In the LLA derivation of the chain rule, the Jacobian of $\mathbf{g}$ is evaluated at which point?

Question 2

The chain rule says: composing two differentiable functions gives a differentiable function.

Common Mistakes

  • Forgetting that the error from composing two LLAs must be separately bounded to complete the proof.
  • Thinking the LLA derivation is just a heuristic. With careful error bounds, it becomes a rigorous proof.
  • Not checking the domain condition: $\mathbf{g}$ must be differentiable at $\mathbf{f}(\mathbf{a})$, not at $\mathbf{a}$.
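The domain condition in the last item can be illustrated with a small scalar example, given here as a sketch under assumed functions. Take the hypothetical choices `f(x) = x + 1` and `g(u) = |u - 1|` with `a = 0`: `g` is differentiable at `a = 0` but not at `f(a) = 1`, and the composite (mathematically `|x|`) has mismatched one-sided slopes at `a`.

```python
def f(x):
    """Inner function, differentiable everywhere; f(0) = 1."""
    return x + 1.0

def g(u):
    """Outer function, differentiable at u = 0 but not at u = 1."""
    return abs(u - 1.0)

def h(x):
    """Composite g(f(x)); mathematically this is |x|."""
    return g(f(x))

a = 0.0
t = 1e-6
# One-sided difference quotients of the composite at a.
right_slope = (h(a + t) - h(a)) / t
left_slope = (h(a - t) - h(a)) / (-t)
print("right slope:", right_slope)
print("left slope:", left_slope)
```

The two quotients approach different limits, so the composite has no derivative at `a` even though `g` is smooth at `a` itself. The point that matters is `f(a)`.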