Linear Algebra
14.12

Detailed Justification

Let's trace through the full justification of the chain rule in the one-underlying-variable case more carefully. We want to show $h'(t) = \nabla g(\mathbf{f}(t)) \cdot \mathbf{f}'(t)$.

By definition: $h'(t) = \lim_{s\to 0}\frac{h(t+s)-h(t)}{s} = \lim_{s\to 0}\frac{g(\mathbf{f}(t+s))-g(\mathbf{f}(t))}{s}$.

Write $\mathbf{f}(t+s) = \mathbf{f}(t) + s\mathbf{f}'(t) + \mathbf{e}(s)$ where $\mathbf{e}(s) = o(s)$. Then $g(\mathbf{f}(t+s)) = g(\mathbf{f}(t) + s\mathbf{f}'(t) + \mathbf{e}(s)) \approx g(\mathbf{f}(t)) + \nabla g(\mathbf{f}(t)) \cdot (s\mathbf{f}'(t) + \mathbf{e}(s))$. Subtracting $g(\mathbf{f}(t))$, dividing by $s$, and taking the limit $s \to 0$, the term $\nabla g(\mathbf{f}(t)) \cdot \mathbf{e}(s)/s$ vanishes because $\mathbf{e}(s) = o(s)$, which leaves $h'(t) = \nabla g(\mathbf{f}(t)) \cdot \mathbf{f}'(t)$.
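As a numerical sanity check of this argument, the sketch below compares the difference quotient from the definition of $h'(t)$ with $\nabla g(\mathbf{f}(t)) \cdot \mathbf{f}'(t)$. The particular curve `f`, function `g`, evaluation point `t0`, and step size are hypothetical choices made only for illustration; they do not come from the text.

```python
import math

# Hypothetical example (not from the text): a curve f: R -> R^2 and a scalar g: R^2 -> R.
def f(t):
    return (math.cos(t), t * t)

def f_prime(t):
    # Componentwise derivative of f.
    return (-math.sin(t), 2.0 * t)

def g(x, y):
    return x * x * y + math.sin(y)

def grad_g(x, y):
    # Partial derivatives of g with respect to x and y.
    return (2.0 * x * y, x * x + math.cos(y))

def h(t):
    # The composition h(t) = g(f(t)).
    return g(*f(t))

def chain_rule_derivative(t):
    # grad g(f(t)) . f'(t)
    gx, gy = grad_g(*f(t))
    fx, fy = f_prime(t)
    return gx * fx + gy * fy

def difference_quotient(func, t, s):
    # (func(t + s) - func(t)) / s, the quotient whose limit defines the derivative.
    return (func(t + s) - func(t)) / s

t0 = 0.7
from_chain_rule = chain_rule_derivative(t0)
from_quotient = difference_quotient(h, t0, 1e-6)
print(from_chain_rule, from_quotient)
```

The two printed values should agree to several decimal places. The small discrepancy that remains comes from using a finite step `s` rather than the limit.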

Formal View

Theorem 14.7 — Chain Rule — Detailed Proof (Scalar Case)
The approximation $h(t+s) - h(t) = \bigl(\nabla g(\mathbf{f}(t)) \cdot \mathbf{f}'(t)\bigr)\, s + o(s)$ follows from three facts, where $\boldsymbol{\delta} = \mathbf{f}(t+s) - \mathbf{f}(t)$:

  1. $\mathbf{f}(t+s) = \mathbf{f}(t) + s\mathbf{f}'(t) + o(s)$ (differentiability of $\mathbf{f}$ at $t$)
  2. $g(\mathbf{f}(t)+\boldsymbol{\delta}) = g(\mathbf{f}(t)) + \nabla g(\mathbf{f}(t))\cdot\boldsymbol{\delta} + o(\|\boldsymbol{\delta}\|)$ (differentiability of $g$ at $\mathbf{f}(t)$)
  3. $\|\boldsymbol{\delta}\| = O(|s|)$ by step 1, so $o(\|\boldsymbol{\delta}\|) = o(|s|)$.
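The $o(s)$ claim can also be observed numerically: if the remainder $R(s) = h(t+s) - h(t) - s\,\nabla g(\mathbf{f}(t)) \cdot \mathbf{f}'(t)$ really is $o(s)$, then $|R(s)|/|s|$ should shrink as $s \to 0$. The following is a minimal sketch under assumed choices of `f`, `g`, and `t0` (all hypothetical, picked so that the derivatives are easy to write by hand).

```python
import math

# Hypothetical example (not from the text): f(t) = (t, t^2), g(x, y) = exp(x) * y.
def f(t):
    return (t, t * t)

def f_prime(t):
    return (1.0, 2.0 * t)

def g(x, y):
    return math.exp(x) * y

def grad_g(x, y):
    return (math.exp(x) * y, math.exp(x))

def h(t):
    return g(*f(t))

def linear_coefficient(t):
    # grad g(f(t)) . f'(t), the coefficient of s in the linear approximation.
    gx, gy = grad_g(*f(t))
    fx, fy = f_prime(t)
    return gx * fx + gy * fy

def remainder_ratio(t, s):
    # |R(s)| / |s| with R(s) = h(t + s) - h(t) - s * (grad g(f(t)) . f'(t)).
    remainder = h(t + s) - h(t) - s * linear_coefficient(t)
    return abs(remainder) / abs(s)

t0 = 0.5
steps = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
ratios = [remainder_ratio(t0, s) for s in steps]
for s, ratio in zip(steps, ratios):
    print(f"s = {s:.0e}   |R(s)|/|s| = {ratio:.3e}")
```

For this smooth example the ratio falls roughly in proportion to $s$, which is stronger than the $o(s)$ behavior the proof needs. Differentiability alone guarantees only that the ratio tends to zero, not any particular rate.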

Why This Matters

Working through the detailed proof builds confidence and reveals what conditions are truly necessary.

  • Understanding exactly where differentiability is used clarifies what conditions the chain rule requires
  • Building a foundation for understanding when generalizations (e.g., to non-smooth functions) are possible

Quiz

Question 1

In the detailed justification, which step uses differentiability of $\mathbf{f}$ at $t$?

Common Mistakes

  • Skipping the step where $\|\boldsymbol{\delta}\| = O(|s|)$ is used to upgrade $o(\|\boldsymbol{\delta}\|)$ to $o(|s|)$.
  • Treating the proof sketch as a complete proof without bounding the error terms.
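
One way to carry out the bounding that both mistakes skip is sketched below. The names $\boldsymbol{\delta}(s)$ for the increment of $\mathbf{f}$ and $r$ for the remainder of $g$ are notation assumed here for the sketch, and the constant $\|\mathbf{f}'(t)\| + 1$ is just one convenient choice.

```latex
% Assumed notation: \boldsymbol{\delta}(s) = \mathbf{f}(t+s) - \mathbf{f}(t),
% and r is the remainder in the differentiability of g at \mathbf{f}(t).
\begin{align*}
\boldsymbol{\delta}(s) &= s\,\mathbf{f}'(t) + \mathbf{e}(s),
  \qquad \frac{\|\mathbf{e}(s)\|}{|s|} \to 0 \text{ as } s \to 0, \\
\|\boldsymbol{\delta}(s)\| &\le |s|\,\|\mathbf{f}'(t)\| + \|\mathbf{e}(s)\|
  \le \bigl(\|\mathbf{f}'(t)\| + 1\bigr)\,|s|
  \quad \text{once } |s| \text{ is small enough that } \|\mathbf{e}(s)\| \le |s|, \\
r(\boldsymbol{\delta}) &= g(\mathbf{f}(t)+\boldsymbol{\delta}) - g(\mathbf{f}(t))
  - \nabla g(\mathbf{f}(t))\cdot\boldsymbol{\delta},
  \qquad \frac{|r(\boldsymbol{\delta})|}{\|\boldsymbol{\delta}\|} \to 0
  \text{ as } \boldsymbol{\delta} \to \mathbf{0}, \\
\frac{|r(\boldsymbol{\delta}(s))|}{|s|}
  &= \frac{|r(\boldsymbol{\delta}(s))|}{\|\boldsymbol{\delta}(s)\|}
     \cdot \frac{\|\boldsymbol{\delta}(s)\|}{|s|}
  \le \frac{|r(\boldsymbol{\delta}(s))|}{\|\boldsymbol{\delta}(s)\|}\,
     \bigl(\|\mathbf{f}'(t)\| + 1\bigr) \to 0
  \quad \text{whenever } \boldsymbol{\delta}(s) \ne \mathbf{0}, \\
\frac{|\nabla g(\mathbf{f}(t)) \cdot \mathbf{e}(s)|}{|s|}
  &\le \|\nabla g(\mathbf{f}(t))\|\,\frac{\|\mathbf{e}(s)\|}{|s|} \to 0 .
\end{align*}
```

If $\boldsymbol{\delta}(s) = \mathbf{0}$ for some $s$, then $r(\boldsymbol{\delta}(s)) = 0$ and the bound holds trivially. Together the last two lines show that both error terms in $h(t+s) - h(t) = s\,\nabla g(\mathbf{f}(t)) \cdot \mathbf{f}'(t) + \nabla g(\mathbf{f}(t)) \cdot \mathbf{e}(s) + r(\boldsymbol{\delta}(s))$ are $o(s)$.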