11.7 Approximation Error and Differentiability

We can make precise how good the local linear approximation (LLA) is. Define the approximation error $E(\mathbf{h}) = f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})\mathbf{h}$. The LLA is "good" if this error is small compared to the size of the displacement $\mathbf{h}$.

Formally, $f$ is differentiable at $\mathbf{a}$ if the error is $o(\|\mathbf{h}\|)$, meaning $E(\mathbf{h})/\|\mathbf{h}\| \to 0$ as $\mathbf{h} \to \mathbf{0}$. In words, the error goes to zero faster than the displacement itself.
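Here is a minimal numerical sketch of the $o(\|\mathbf{h}\|)$ condition. It assumes NumPy and uses an illustrative function that is not taken from the text: $f(x,y) = x^2y$ at $\mathbf{a} = (1, 2)$, where $Df(\mathbf{a}) = [2xy,\ x^2] = [4,\ 1]$. Expanding gives $E(\mathbf{h}) = 2h_1^2 + 2h_1h_2 + h_1^2h_2$. The ratio $|E(\mathbf{h})|/\|\mathbf{h}\|$ should therefore shrink roughly in proportion to $\|\mathbf{h}\|$.

```python
import numpy as np

def f(x):
    # Illustrative function f(x, y) = x^2 * y (an assumption, not from the text)
    return x[0] ** 2 * x[1]

def Df(x):
    # Jacobian row vector [df/dx, df/dy] = [2xy, x^2]
    return np.array([2 * x[0] * x[1], x[0] ** 2])

def error_ratio(a, h):
    """Return |E(h)| / ||h||, where E(h) = f(a + h) - f(a) - Df(a) h."""
    E = f(a + h) - f(a) - Df(a) @ h
    return float(abs(E) / np.linalg.norm(h))

a = np.array([1.0, 2.0])
u = np.array([0.6, -0.8])  # unit direction of approach
ts = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [error_ratio(a, t * u) for t in ts]
for t, r in zip(ts, ratios):
    print(f"t = {t:.0e}   |E(h)|/||h|| = {r:.3e}")
```

Along $\mathbf{u} = (0.6, -0.8)$ the ratio works out to $0.24t + 0.288t^2$. Each tenfold reduction of $t$ therefore cuts it by roughly a factor of ten.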

In the single-variable case, this reduces to the usual definition of the derivative. In several variables the condition is stronger than the existence of partial derivatives, and stronger even than the existence of every directional derivative. It requires one linear map to approximate $f$ well in every direction simultaneously, not just along the coordinate axes.
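To see why partial derivatives alone are not enough, here is a minimal numerical sketch. It assumes NumPy and uses the standard textbook counterexample $f(x,y) = x^2y/(x^2+y^2)$ with $f(0,0) = 0$, chosen for illustration rather than taken from the text. At the origin every directional derivative exists, since $f(t\mathbf{u})/t = u_1^2u_2$ for a unit vector $\mathbf{u}$, and both partials are $0$. Even so, the error ratio for the only candidate $Df(\mathbf{0}) = [0, 0]$ does not go to zero along the diagonal.

```python
import numpy as np

def f(x):
    # Standard counterexample: f(x, y) = x^2 y / (x^2 + y^2), f(0, 0) = 0
    r2 = x[0] ** 2 + x[1] ** 2
    return 0.0 if r2 == 0 else x[0] ** 2 * x[1] / r2

a = np.zeros(2)
# f vanishes on both axes, so both partials at the origin are 0
# and the only candidate linear map is c^T = [0, 0].
c = np.zeros(2)

def error_ratio(h):
    """|E(h)| / ||h|| with E(h) = f(a + h) - f(a) - c^T h."""
    E = f(a + h) - f(a) - c @ h
    return float(abs(E) / np.linalg.norm(h))

axis = np.array([1.0, 0.0])
diag = np.array([1.0, 1.0]) / np.sqrt(2)
axis_ratios = [error_ratio(t * axis) for t in (1e-2, 1e-4, 1e-6)]
diag_ratios = [error_ratio(t * diag) for t in (1e-2, 1e-4, 1e-6)]
print(axis_ratios)                          # [0.0, 0.0, 0.0]
print([round(r, 3) for r in diag_ratios])   # [0.354, 0.354, 0.354]
```

Along the axes the linear approximation looks perfect. Along the diagonal, however, the ratio stays at $1/(2\sqrt{2}) \approx 0.354$ however small $t$ becomes, so $f$ is not differentiable at the origin.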

Formal View

Definition 11.8 — Differentiability (Formal)

A function $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ is differentiable at an interior point $\mathbf{a} \in D$ if there exists a row vector $\mathbf{c}^T \in \mathbb{R}^{1\times n}$ such that

$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - \mathbf{c}^T \mathbf{h}}{\|\mathbf{h}\|} = 0.$$

When such a vector exists, $\mathbf{c}^T = Df(\mathbf{a})$ is the Jacobian row vector (the row of partial derivatives).
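Here is a small worked check of the definition. The function $f(x,y) = xy$ is chosen for illustration and is not taken from the text. Take any point $\mathbf{a} = (a_1, a_2)$ and the candidate $\mathbf{c}^T = [a_2,\ a_1]$:

$$
\begin{aligned}
f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - \mathbf{c}^T\mathbf{h}
&= (a_1+h_1)(a_2+h_2) - a_1a_2 - (a_2h_1 + a_1h_2) = h_1h_2,\\
\frac{|h_1h_2|}{\|\mathbf{h}\|} &\le \frac{\tfrac12\,(h_1^2+h_2^2)}{\|\mathbf{h}\|} = \tfrac12\,\|\mathbf{h}\| \;\longrightarrow\; 0 .
\end{aligned}
$$

So $f(x,y) = xy$ is differentiable everywhere, with $Df(\mathbf{a}) = [a_2,\ a_1]$, which matches its partial derivatives.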

Under this definition the derivative, when it exists, is unique: no other linear map satisfies the same approximation property.

Theorem 11.2 — Uniqueness of the Derivative

If $f$ is differentiable at $\mathbf{a}$, then the linear map $\mathbf{c}^T$ satisfying the limit condition is unique and equals

$$Df(\mathbf{a}) = [\partial_1 f(\mathbf{a}), \ldots, \partial_n f(\mathbf{a})].$$
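Here is a short proof sketch of Theorem 11.2, using only the limit condition of Definition 11.8. Suppose $\mathbf{c}_1^T$ and $\mathbf{c}_2^T$ both satisfy it. Subtracting the two difference quotients gives

$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{(\mathbf{c}_1 - \mathbf{c}_2)^T \mathbf{h}}{\|\mathbf{h}\|} = 0.$$

Restrict to $\mathbf{h} = t\mathbf{u}$ with $\|\mathbf{u}\| = 1$ and $t \to 0^+$. The quotient then equals the constant $(\mathbf{c}_1 - \mathbf{c}_2)^T\mathbf{u}$ for every $t$, so that constant must be $0$. Since $\mathbf{u}$ is an arbitrary unit vector, $\mathbf{c}_1 = \mathbf{c}_2$. Choosing $\mathbf{h} = t\mathbf{e}_i$ in the original condition, and multiplying by the bounded factor $|t|/t$, identifies each entry:

$$c_i = \lim_{t\to 0} \frac{f(\mathbf{a}+t\mathbf{e}_i) - f(\mathbf{a})}{t} = \partial_i f(\mathbf{a}).$$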

Why This Matters

The error analysis perspective reveals why some numerical methods work and why gradient-based optimization is justified.

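Taylor expansion error bounds: the LLA error is the first-order Taylor remainder, so its size tells you when higher-order terms are needed.
Finite-difference approximation accuracy: the $o(\|\mathbf{h}\|)$ condition guarantees that difference quotients converge to the partial derivatives. If $f$ is twice continuously differentiable, the error is $O(\|\mathbf{h}\|^2)$ and the forward-difference truncation error is $O(h)$.
Gradient-based optimization, including training machine-learning models: because the error is $o(\|\mathbf{h}\|)$, a sufficiently small step in the direction $-Df(\mathbf{a})^T$ (the negative gradient) decreases $f$ whenever $Df(\mathbf{a}) \neq \mathbf{0}$.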
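Here is a minimal sketch of the finite-difference and descent points. It assumes NumPy and an illustrative smooth function $f(x,y) = \sin x + xy^2$ that is not taken from the text. The forward-difference gradient error shrinks roughly in proportion to $h$, and a small step against the exact gradient lowers $f$.

```python
import numpy as np

def f(x):
    # Illustrative smooth function (an assumption, not from the text)
    return np.sin(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    # Exact gradient: [cos(x) + y^2, 2xy]
    return np.array([np.cos(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])

def forward_diff_grad(f, x, h):
    # Forward difference (f(x + h e_i) - f(x)) / h for each coordinate i
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x)) / h
    return g

a = np.array([0.5, -1.0])
exact = grad_f(a)
errors = []
for h in (1e-2, 1e-3, 1e-4):
    err = float(np.linalg.norm(forward_diff_grad(f, a, h) - exact))
    errors.append(err)
    print(f"h = {h:.0e}   forward-difference error = {err:.2e}")

# A sufficiently small step along -grad f lowers f when grad f(a) != 0
step = 1e-3
decreased = bool(f(a - step * exact) < f(a))
print(decreased)
```

For this $f$ the leading error terms are $\tfrac12 f_{xx}h$ and $\tfrac12 f_{yy}h$. That is why each tenfold reduction of $h$ cuts the error by about ten. Much smaller $h$ would eventually be dominated by floating-point round-off instead.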


Quiz

Question 1

The approximation error $E(\mathbf{h}) = f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})\mathbf{h}$ satisfies what condition when $f$ is differentiable at $\mathbf{a}$?

Question 2

True or false: if $|E(\mathbf{h})|/\|\mathbf{h}\| \leq C\|\mathbf{h}\|$ for some constant $C$ and all sufficiently small $\mathbf{h} \neq \mathbf{0}$, then $f$ is differentiable at $\mathbf{a}$.

Common Mistakes

Confusing $E(\mathbf{h}) \to 0$ (a continuity-like condition) with $E(\mathbf{h})/\|\mathbf{h}\| \to 0$ (differentiability). The latter is much stronger.
Assuming differentiability because the LLA "looks right" at a point (for example, because all partial derivatives exist there) without verifying the error condition.
Treating uniqueness of the Jacobian as obvious: it requires a short proof that no two distinct linear maps can both satisfy the approximation condition.
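The gap between $E(\mathbf{h}) \to 0$ and $E(\mathbf{h})/\|\mathbf{h}\| \to 0$ shows up in a minimal sketch. It assumes NumPy, and the function $f(x,y) = \sqrt{|xy|}$ is an illustrative choice, not from the text. At the origin $f$ is continuous and both partials are $0$, so $E(\mathbf{h}) = f(\mathbf{h}) \to 0$. Along the diagonal, however, $|E(\mathbf{h})|/\|\mathbf{h}\| = 1/\sqrt{2}$ for every $t$, so $f$ is not differentiable there.

```python
import numpy as np

def f(x):
    # Illustrative function f(x, y) = sqrt(|x y|) (not from the text)
    return np.sqrt(abs(x[0] * x[1]))

a = np.zeros(2)
c = np.zeros(2)  # f vanishes on both axes, so both partials at 0 are 0
u = np.array([1.0, 1.0]) / np.sqrt(2)  # approach along the diagonal

errors, ratios = [], []
for t in (1e-2, 1e-4, 1e-6):
    h = t * u
    E = f(a + h) - f(a) - c @ h
    errors.append(float(abs(E)))
    ratios.append(float(abs(E) / np.linalg.norm(h)))
    print(f"t = {t:.0e}   |E(h)| = {errors[-1]:.2e}   |E(h)|/||h|| = {ratios[-1]:.4f}")
```

The first column goes to zero (the continuity-like condition holds), while the second stays at about $0.7071$ (the differentiability condition fails).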