Linear Algebra

Error Analysis

To make the LLA derivation of the chain rule rigorous, we must show the composite error is $o(\|\mathbf{h}\|)$. Write: $\mathbf{f}(\mathbf{a}+\mathbf{h}) = \mathbf{f}(\mathbf{a}) + J\mathbf{f}(\mathbf{a})\mathbf{h} + \mathbf{e}_1(\mathbf{h})$ where $\|\mathbf{e}_1(\mathbf{h})\|/\|\mathbf{h}\| \to 0$. Let $\boldsymbol{\delta} = J\mathbf{f}(\mathbf{a})\mathbf{h} + \mathbf{e}_1(\mathbf{h})$, so $\mathbf{f}(\mathbf{a}+\mathbf{h}) = \mathbf{f}(\mathbf{a}) + \boldsymbol{\delta}$.

Then $\mathbf{g}(\mathbf{f}(\mathbf{a}+\mathbf{h})) = \mathbf{g}(\mathbf{f}(\mathbf{a})+\boldsymbol{\delta}) = \mathbf{g}(\mathbf{f}(\mathbf{a})) + J\mathbf{g}(\mathbf{f}(\mathbf{a}))\boldsymbol{\delta} + \mathbf{e}_2(\boldsymbol{\delta})$ where $\|\mathbf{e}_2(\boldsymbol{\delta})\|/\|\boldsymbol{\delta}\| \to 0$.

Substituting and rearranging, the composite error is $J\mathbf{g}(\mathbf{f}(\mathbf{a}))\mathbf{e}_1(\mathbf{h}) + \mathbf{e}_2(\boldsymbol{\delta})$. Since $\boldsymbol{\delta} = O(\|\mathbf{h}\|)$, both terms are $o(\|\mathbf{h}\|)$, completing the proof.
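The claim that the composite error is $o(\|\mathbf{h}\|)$ can be checked numerically. The sketch below is only an illustration, not part of the proof: the maps `f` and `g`, the base point `a`, and the direction are hypothetical choices made for this example, and the Jacobians are written out by hand. It computes the ratio of the composite error to $\|\mathbf{h}\|$ along a shrinking sequence of increments.

```python
import math

# Hypothetical example maps (not from the text): f, g : R^2 -> R^2.
def f(x):
    return [x[0] * x[1], x[0] + x[1] ** 2]

def Jf(x):
    # Hand-computed Jacobian of f.
    return [[x[1], x[0]], [1.0, 2.0 * x[1]]]

def g(u):
    return [math.sin(u[0]) + u[1], u[0] * u[1]]

def Jg(u):
    # Hand-computed Jacobian of g.
    return [[math.cos(u[0]), 1.0], [u[1], u[0]]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def composite_error_ratio(a, h):
    """Return ||g(f(a+h)) - g(f(a)) - Jg(f(a)) Jf(a) h|| / ||h||."""
    a_plus_h = [ai + hi for ai, hi in zip(a, h)]
    linear = matvec(Jg(f(a)), matvec(Jf(a), h))
    shifted = g(f(a_plus_h))
    base = g(f(a))
    err = [s - b - l for s, b, l in zip(shifted, base, linear)]
    return norm(err) / norm(h)

a = [1.0, 2.0]             # assumed base point
direction = [0.6, -0.8]    # assumed unit direction for h

ratios = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    h = [t * d for d in direction]
    ratios.append(composite_error_ratio(a, h))
    print(f"||h|| = {t:.0e}   error / ||h|| = {ratios[-1]:.3e}")
```

For smooth maps like these the ratio should shrink roughly in proportion to $\|\mathbf{h}\|$, which is stronger than the $o(\|\mathbf{h}\|)$ statement the proof needs.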

Formal View

Theorem 14.4 — Chain Rule Error Bound
If $\mathbf{f}$ is differentiable at $\mathbf{a}$ and $\mathbf{g}$ is differentiable at $\mathbf{f}(\mathbf{a})$, the composite error satisfies $\frac{\|(\mathbf{g}\circ\mathbf{f})(\mathbf{a}+\mathbf{h}) - (\mathbf{g}\circ\mathbf{f})(\mathbf{a}) - J\mathbf{g}(\mathbf{f}(\mathbf{a}))J\mathbf{f}(\mathbf{a})\mathbf{h}\|}{\|\mathbf{h}\|} \to 0$ as $\mathbf{h}\to\mathbf{0}$, establishing $J(\mathbf{g}\circ\mathbf{f})(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a}))\cdot J\mathbf{f}(\mathbf{a})$.
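One way to sanity-check the conclusion of the theorem is to compare the product $J\mathbf{g}(\mathbf{f}(\mathbf{a}))\cdot J\mathbf{f}(\mathbf{a})$ with a finite-difference Jacobian of the composite. The following is a minimal sketch under stated assumptions: the maps `f` and `g`, the point `a`, and the step size are hypothetical choices for illustration, and `numerical_jacobian` is a helper written for this example using central differences.

```python
import math

# Hypothetical example maps (not from the text): f, g : R^2 -> R^2.
def f(x):
    return [x[0] * x[1], x[0] + x[1] ** 2]

def Jf(x):
    return [[x[1], x[0]], [1.0, 2.0 * x[1]]]

def g(u):
    return [math.sin(u[0]) + u[1], u[0] * u[1]]

def Jg(u):
    return [[math.cos(u[0]), 1.0], [u[1], u[0]]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, x, step=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d func_i / d x_j."""
    columns = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        fp, fm = func(xp), func(xm)
        columns.append([(p - m) / (2.0 * step) for p, m in zip(fp, fm)])
    # columns[j][i] holds d func_i / d x_j, so transpose into row-major form.
    return [[columns[j][i] for j in range(len(x))]
            for i in range(len(columns[0]))]

a = [1.0, 2.0]  # assumed base point

chain = matmul(Jg(f(a)), Jf(a))                       # Jg(f(a)) . Jf(a)
numeric = numerical_jacobian(lambda x: g(f(x)), a)    # J(g o f)(a), approximated

max_gap = max(abs(c - n)
              for row_c, row_n in zip(chain, numeric)
              for c, n in zip(row_c, row_n))
print("largest entrywise gap:", max_gap)
```

Agreement up to finite-difference error is consistent with the theorem but does not prove it; the limit statement above is what carries the argument.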

Why This Matters

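The error analysis shows exactly where each differentiability hypothesis enters: differentiability of $\mathbf{f}$ at $\mathbf{a}$ controls $\mathbf{e}_1$ and keeps $\boldsymbol{\delta}$ of order $\|\mathbf{h}\|$, while differentiability of $\mathbf{g}$ at $\mathbf{f}(\mathbf{a})$ controls $\mathbf{e}_2$. The same bookkeeping of composed errors appears in: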

  • Numerical differentiation: error accumulation through composed function evaluations
  • Floating-point arithmetic in automatic differentiation: rounding errors in composed computations
  • Convergence analysis: rates at which iterative methods converge depend on composition of approximation errors
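The first two items can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not a production autodiff system: the `Dual` class, the helpers `dsin` and `dexp`, the test function $\exp(\sin(x^2))$, and the point `x0` are all hypothetical choices made for this example. It compares a forward-difference derivative at several step sizes with a forward-mode derivative that applies the chain rule at each elementary operation.

```python
import math

class Dual:
    """Toy dual number: a value together with its derivative."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __mul__(self, other):
        # Product rule for Dual * Dual.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def dsin(x):
    # Chain rule through sin.
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

def dexp(x):
    # Chain rule through exp.
    return Dual(math.exp(x.val), math.exp(x.val) * x.der)

def composed_dual(x):
    return dexp(dsin(x * x))

def composed_float(x):
    return math.exp(math.sin(x * x))

x0 = 1.3  # assumed evaluation point

# Closed-form derivative of exp(sin(x^2)) for reference.
exact = math.exp(math.sin(x0 * x0)) * math.cos(x0 * x0) * 2.0 * x0

# Forward-mode derivative: seed the input derivative with 1.
ad_error = abs(composed_dual(Dual(x0, 1.0)).der - exact)

# Forward differences: truncation error dominates for large steps,
# rounding error for very small ones.
fd_errors = {}
for k in (2, 5, 8, 11, 14):
    step = 10.0 ** (-k)
    fd = (composed_float(x0 + step) - composed_float(x0)) / step
    fd_errors[k] = abs(fd - exact)
    print(f"step = 1e-{k:02d}   forward-difference error = {fd_errors[k]:.3e}")

print(f"forward-mode error = {ad_error:.3e}")
```

The forward-mode result is limited only by rounding in each elementary operation, whereas the finite-difference result trades truncation error against cancellation as the step shrinks.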

Quiz

Question 1

In the error analysis, why is $\|\mathbf{e}_2(\boldsymbol{\delta})\| = o(\|\mathbf{h}\|)$ even though $\|\mathbf{e}_2\| = o(\|\boldsymbol{\delta}\|)$?

Question 2

True or false: The chain rule error analysis shows that differentiability of $\mathbf{f}$ at $\mathbf{a}$ and differentiability of $\mathbf{g}$ at $\mathbf{f}(\mathbf{a})$ are both required.

Common Mistakes

  • Skipping the error analysis and treating the chain rule as "obvious from notation".
  • Not distinguishing between $\|\mathbf{e}_2\| = o(\|\boldsymbol{\delta}\|)$ and $\|\mathbf{e}_2\| = o(\|\mathbf{h}\|)$.
  • Forgetting that both differentiability conditions are needed, not just one.
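The distinction in the second item can be made explicit with one factorization. The display below is a sketch of the standard argument; the operator norm $\|J\mathbf{f}(\mathbf{a})\|$ is notation introduced here and is not used elsewhere in the text. For $\boldsymbol{\delta} \neq \mathbf{0}$:

```latex
\[
\frac{\|\mathbf{e}_2(\boldsymbol{\delta})\|}{\|\mathbf{h}\|}
  = \frac{\|\mathbf{e}_2(\boldsymbol{\delta})\|}{\|\boldsymbol{\delta}\|}
    \cdot \frac{\|\boldsymbol{\delta}\|}{\|\mathbf{h}\|},
\qquad
\frac{\|\boldsymbol{\delta}\|}{\|\mathbf{h}\|}
  \le \|J\mathbf{f}(\mathbf{a})\| + \frac{\|\mathbf{e}_1(\mathbf{h})\|}{\|\mathbf{h}\|}.
\]
```

The second factor stays bounded as $\mathbf{h}\to\mathbf{0}$, and the bound also forces $\boldsymbol{\delta}\to\mathbf{0}$, so the first factor tends to $0$. When $\boldsymbol{\delta} = \mathbf{0}$ the error $\mathbf{e}_2(\boldsymbol{\delta})$ is itself $\mathbf{0}$, so that case causes no difficulty.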