Linear Algebra
11.7 · 10 min read

Approximation Error and Differentiability

We can be precise about how good the local linear approximation (LLA) is. Define the approximation error $E(\mathbf{h}) = f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})\mathbf{h}$. The LLA is "good" if this error is small compared to the size of the displacement $\mathbf{h}$.

Formally, $f$ is differentiable at $\mathbf{a}$ if the error is $o(\|\mathbf{h}\|)$, meaning $E(\mathbf{h})/\|\mathbf{h}\| \to 0$ as $\mathbf{h} \to \mathbf{0}$. This says the error goes to zero faster than the displacement itself.
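The limit condition can be checked numerically. The following is a minimal sketch, assuming a hypothetical smooth test function `f(x, y) = sin(x) * exp(y)` and an arbitrarily chosen base point and direction (none of these come from the text). It prints the ratio $|E(\mathbf{h})|/\|\mathbf{h}\|$ as the displacement shrinks.

```python
import math

def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.sin(x) * math.exp(y)

def grad_f(x, y):
    # Analytic partial derivatives of f, i.e. the row vector Df(x, y).
    return (math.cos(x) * math.exp(y), math.sin(x) * math.exp(y))

def error_ratio(a, h):
    """Return |E(h)| / ||h|| with E(h) = f(a+h) - f(a) - Df(a) h."""
    ax, ay = a
    hx, hy = h
    gx, gy = grad_f(ax, ay)
    E = f(ax + hx, ay + hy) - f(ax, ay) - (gx * hx + gy * hy)
    return abs(E) / math.hypot(hx, hy)

a = (0.5, -0.3)          # assumed base point
direction = (0.6, 0.8)   # assumed unit direction

ratios = []
for k in range(1, 6):
    t = 10.0 ** (-k)     # ||h|| = t because direction has unit length
    h = (t * direction[0], t * direction[1])
    ratios.append(error_ratio(a, h))
    print(f"||h|| = {t:.0e}   |E(h)|/||h|| = {ratios[-1]:.3e}")
```

For this function the printed ratio should shrink roughly in proportion to $\|\mathbf{h}\|$, which is consistent with the error being $o(\|\mathbf{h}\|)$. A single direction is only a sanity check; the definition requires the limit over all $\mathbf{h} \to \mathbf{0}$.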

In the single-variable case, this reduces to the usual definition of differentiability. In several variables, the condition is stronger than the mere existence of the partial derivatives, because it requires the linear approximation to be good in every direction simultaneously, not just along the coordinate axes.
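A standard counterexample makes the "every direction" requirement concrete. The sketch below uses $g(x, y) = \sqrt{|xy|}$ at the origin, an example chosen here for illustration and not taken from the text. Since $g$ vanishes on both axes, both partial derivatives at the origin are 0, so the only candidate for $Dg(\mathbf{0})$ is $[0, 0]$.

```python
import math

def g(x, y):
    # Illustrative function: zero on both coordinate axes.
    return math.sqrt(abs(x * y))

def ratio(h):
    """|E(h)| / ||h|| at the origin, using the candidate Dg(0) = [0, 0]."""
    hx, hy = h
    E = g(hx, hy) - g(0.0, 0.0) - (0.0 * hx + 0.0 * hy)
    return abs(E) / math.hypot(hx, hy)

for t in (1e-1, 1e-3, 1e-6):
    along_axis = ratio((t, 0.0))   # displacement along the x-axis
    along_diag = ratio((t, t))     # displacement along the diagonal
    print(f"t = {t:.0e}   axis: {along_axis:.4f}   diagonal: {along_diag:.4f}")
```

Along the axis the ratio is 0, but along the diagonal it stays near $1/\sqrt{2} \approx 0.7071$ no matter how small $t$ is. The error $E(\mathbf{h})$ itself does go to 0, yet the ratio does not, so $g$ has partial derivatives at the origin without being differentiable there.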

Formal View

Definition 11.8 — Differentiability (Formal)
A function $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\mathbf{a} \in D$ if there exists a row vector $\mathbf{c}^T \in \mathbb{R}^{1\times n}$ such that
$$
\lim_{\mathbf{h}\to\mathbf{0}} \frac{f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - \mathbf{c}^T \mathbf{h}}{\|\mathbf{h}\|} = 0
$$
When it exists, $\mathbf{c}^T = Df(\mathbf{a})$ is the Jacobian row vector (the row of partial derivatives).

This definition makes the Jacobian unique when it exists: no other linear map can satisfy this approximation property.

Theorem 11.2 — Uniqueness of the Derivative
If $f$ is differentiable at $\mathbf{a}$, then the row vector $\mathbf{c}^T$ satisfying the limit condition is unique and equals $Df(\mathbf{a}) = [\partial_1 f(\mathbf{a}), \ldots, \partial_n f(\mathbf{a})]$.
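Uniqueness can be illustrated numerically, though not proved this way. The sketch below assumes a hypothetical function `f(x, y) = x**2 + 3*y` at the point $(1, 2)$, where the row of partial derivatives is $[2, 3]$, and compares it with a deliberately wrong candidate $[2, 3.5]$. Both the function and the wrong candidate are illustrative choices.

```python
import math

def f(x, y):
    # Hypothetical test function; its partials at (1, 2) are 2x = 2 and 3.
    return x * x + 3.0 * y

def ratio(c, a, h):
    """|f(a+h) - f(a) - c.h| / ||h|| for a candidate row vector c."""
    ax, ay = a
    hx, hy = h
    E = f(ax + hx, ay + hy) - f(ax, ay) - (c[0] * hx + c[1] * hy)
    return abs(E) / math.hypot(hx, hy)

a = (1.0, 2.0)
jacobian = (2.0, 3.0)   # row of partial derivatives at a
other = (2.0, 3.5)      # a different candidate, wrong in the second entry

for t in (1e-1, 1e-3, 1e-5):
    h = (t, t)
    print(f"t = {t:.0e}   Jacobian: {ratio(jacobian, a, h):.3e}"
          f"   other: {ratio(other, a, h):.3e}")
```

With the Jacobian the ratio shrinks toward 0, while with the other candidate it settles near $0.5/\sqrt{2} \approx 0.354$. In general, along a unit direction $\mathbf{u}$ the ratio for a wrong candidate $\mathbf{c}'$ tends to $|(Df(\mathbf{a}) - \mathbf{c}')\mathbf{u}|$, which is nonzero for some $\mathbf{u}$ whenever $\mathbf{c}' \neq Df(\mathbf{a})$.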

Why This Matters

The error analysis perspective reveals why some numerical methods work and why gradient-based optimization is justified.

  • Taylor expansion error bounds: the LLA error tells you when higher-order terms are needed
  • Finite-difference approximation accuracy: the $o(\|\mathbf{h}\|)$ condition is exactly the statement that the truncation error of a first-order difference quotient vanishes as the step shrinks
  • Validating that machine-learning training signals (gradients) are meaningful approximations of true descent directions
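The finite-difference point can be sketched in one variable, where the forward-difference error $(f(a+h) - f(a))/h - f'(a)$ is exactly $E(h)/h$. The example below assumes the test function $\sin$ at $a = 1$, chosen only for illustration.

```python
import math

def forward_diff_error(f, df, a, h):
    """Forward-difference quotient minus the true derivative: E(h) / h."""
    return (f(a + h) - f(a)) / h - df(a)

a = 1.0   # assumed evaluation point
errors = []
for k in range(1, 6):
    h = 10.0 ** (-k)
    err = abs(forward_diff_error(math.sin, math.cos, a, h))
    errors.append(err)
    print(f"h = {h:.0e}   |error| = {err:.3e}")
```

Each tenfold reduction in $h$ should cut the error by roughly a factor of ten here. That linear rate comes from the extra smoothness of $\sin$; differentiability alone only guarantees that the error tends to 0. For much smaller steps than those shown, floating-point rounding eventually dominates and the error grows again.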

Quiz

Question 1

The approximation error $E(\mathbf{h}) = f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})\mathbf{h}$ satisfies what condition when $f$ is differentiable at $\mathbf{a}$?

Question 2

If $E(\mathbf{h})/\|\mathbf{h}\| \leq C\|\mathbf{h}\|$ for some constant $C$, then $f$ is differentiable at $\mathbf{a}$.

Common Mistakes

  • Confusing $E(\mathbf{h}) \to 0$ (continuity-like) with $E(\mathbf{h})/\|\mathbf{h}\| \to 0$ (differentiability); the latter is much stronger.
  • Assuming differentiability just because the LLA "looks right" at a point without verifying the error condition.
  • Thinking the Jacobian uniqueness is obvious — it actually requires proof that no two distinct linear maps can both satisfy the approximation condition.