Linear Algebra

U-Derivative and the LLA

The local linear approximation gives us a powerful formula for directional derivatives. If $f$ is differentiable at $\mathbf{a}$, then near $\mathbf{a}$ we have $f(\mathbf{a}+t\mathbf{u}) \approx f(\mathbf{a}) + Df(\mathbf{a})(t\mathbf{u}) = f(\mathbf{a}) + t \cdot Df(\mathbf{a})\mathbf{u}$.

Substituting into the definition: $D_\mathbf{u} f(\mathbf{a}) = \lim_{t\to 0} \frac{f(\mathbf{a}+t\mathbf{u})-f(\mathbf{a})}{t} = \lim_{t\to 0} \frac{t \cdot Df(\mathbf{a})\mathbf{u} + o(t)}{t} = Df(\mathbf{a})\mathbf{u}$. The last step uses the fact that the error term satisfies $o(t)/t \to 0$ as $t \to 0$, which is exactly what differentiability at $\mathbf{a}$ guarantees.

In terms of the gradient: $D_\mathbf{u} f(\mathbf{a}) = Df(\mathbf{a})\mathbf{u} = \nabla f(\mathbf{a}) \cdot \mathbf{u}$. This is the fundamental formula connecting the Jacobian, the gradient, and directional derivatives.
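The formula can be checked numerically. The sketch below is only an illustration: the function `f`, the point `a`, and the direction `u` are hypothetical choices, not taken from the text. It compares the difference quotient from the definition, evaluated at a few small values of $t$, with the dot product $\nabla f(\mathbf{a}) \cdot \mathbf{u}$.

```python
import math

def f(x, y):
    # Hypothetical smooth example function (an assumption for illustration).
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Gradient of the example f, computed by hand: (2xy, x^2 + cos y).
    return (2.0 * x * y, x**2 + math.cos(y))

def difference_quotient(func, a, u, t):
    # The quotient (f(a + t u) - f(a)) / t from the definition, at a fixed t.
    return (func(a[0] + t * u[0], a[1] + t * u[1]) - func(a[0], a[1])) / t

a = (1.0, 2.0)
u = (0.6, 0.8)  # unit vector, since 0.36 + 0.64 = 1

g = grad_f(*a)
via_gradient = g[0] * u[0] + g[1] * u[1]  # gradient dotted with u

# As t shrinks, the quotient should approach the dot product.
for t in (1e-2, 1e-4, 1e-6):
    print(t, difference_quotient(f, a, u, t), via_gradient)
```

The quotient at a fixed $t$ is only an approximation to the limit, so the two columns agree up to an error that shrinks roughly in proportion to $t$.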

Formal View

Theorem 12.2 — Directional Derivative from Jacobian
If $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\mathbf{a}$, then for any unit vector $\mathbf{u} \in \mathbb{R}^n$:
$D_\mathbf{u} f(\mathbf{a}) = Df(\mathbf{a})\mathbf{u} = \nabla f(\mathbf{a}) \cdot \mathbf{u}$

This shows that once you know the gradient, you know all directional derivatives. The gradient encodes all first-order information about $f$.
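As a worked illustration of the theorem, take the hypothetical example $f(x, y) = x^2 y$ at $\mathbf{a} = (1, 2)$ with the unit vector $\mathbf{u} = (3/5, 4/5)$. These choices are assumptions made for the example. Since $f$ is a polynomial, it is differentiable everywhere and the theorem applies:

```latex
\begin{aligned}
\nabla f(x, y) &= (2xy,\; x^2), \\
\nabla f(1, 2) &= (4,\; 1), \\
D_\mathbf{u} f(\mathbf{a}) &= \nabla f(\mathbf{a}) \cdot \mathbf{u}
  = 4 \cdot \tfrac{3}{5} + 1 \cdot \tfrac{4}{5}
  = \tfrac{16}{5}.
\end{aligned}
```

Changing the direction only changes the final dot product. The gradient $(4, 1)$ is computed once.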

Why This Matters

This formula reduces computing directional derivatives to computing the gradient once and taking dot products.

  • Efficient computation: compute $\nabla f$ once, then get any directional derivative with a single dot product
  • Understanding why gradient descent works: among unit vectors, the gradient direction maximizes $D_\mathbf{u} f$, so $-\nabla f$ is the direction of steepest descent
  • Basis for Lagrange multiplier methods and constrained optimization
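The claim about the gradient direction can be explored with a small experiment. The sketch below assumes a hypothetical gradient value `grad = (3.0, 4.0)` at some point, samples unit vectors around the circle, and picks the ones with the largest and smallest value of $\nabla f \cdot \mathbf{u}$. It is a brute-force illustration, not a proof.

```python
import math

grad = (3.0, 4.0)  # hypothetical gradient at some point a; its norm is 5

def directional_derivative(grad, u):
    # D_u f = grad . u, valid when f is differentiable and u is a unit vector.
    return grad[0] * u[0] + grad[1] * u[1]

# Sample unit vectors at 0.1 degree steps around the circle.
n = 3600
directions = [
    (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]

best_u = max(directions, key=lambda u: directional_derivative(grad, u))
worst_u = min(directions, key=lambda u: directional_derivative(grad, u))

norm = math.hypot(*grad)
unit_grad = (grad[0] / norm, grad[1] / norm)

print("steepest ascent (sampled):", best_u)
print("normalized gradient:      ", unit_grad)
print("steepest descent (sampled):", worst_u)
print("largest sampled value:", directional_derivative(grad, best_u), "vs norm:", norm)
```

Up to the sampling resolution, the best direction matches $\nabla f / \|\nabla f\|$, the largest value matches $\|\nabla f\|$, and the worst direction points along $-\nabla f$, which is the direction gradient descent follows.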

Quiz

Question 1

If $\nabla f(\mathbf{a}) = (4, -1, 2)$ and $\mathbf{u} = (0, 1, 0)$, what is $D_\mathbf{u} f(\mathbf{a})$?

Question 2

True or false: if $f$ has all partial derivatives at $\mathbf{a}$, then the directional derivative formula $D_\mathbf{u} f(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{u}$ holds.

Common Mistakes

  • Applying the formula $D_\mathbf{u} f = \nabla f \cdot \mathbf{u}$ without verifying differentiability.
  • Confusing $D_\mathbf{u} f$ (scalar) with $Df$ (matrix/row vector).
  • Forgetting to normalize $\mathbf{u}$ before applying the formula.
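The first and third mistakes can be made concrete with a short sketch. Both examples are assumptions chosen for illustration: `g` is a standard counterexample whose partial derivatives exist at the origin although it is not differentiable there, and `grad_h = (4.0, 1.0)` is a hypothetical gradient value used to show the effect of skipping normalization.

```python
import math

def g(x, y):
    # Counterexample: both partials exist at the origin,
    # but g is not differentiable there.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y / (x**2 + y**2)

def difference_quotient(func, a, u, t):
    # The quotient (f(a + t u) - f(a)) / t from the definition, at a fixed t.
    return (func(a[0] + t * u[0], a[1] + t * u[1]) - func(a[0], a[1])) / t

origin = (0.0, 0.0)
t = 1e-6

# Mistake 1: using the formula where f is not differentiable.
partial_x = difference_quotient(g, origin, (1.0, 0.0), t)  # g vanishes on the x-axis
partial_y = difference_quotient(g, origin, (0.0, 1.0), t)  # g vanishes on the y-axis
s = 1.0 / math.sqrt(2.0)
u = (s, s)  # unit vector along the diagonal
formula_value = partial_x * u[0] + partial_y * u[1]   # what the formula predicts
actual_value = difference_quotient(g, origin, u, t)   # what the definition gives
print("formula:", formula_value, "definition:", actual_value)

# Mistake 3: plugging in a direction that is not a unit vector.
grad_h = (4.0, 1.0)  # hypothetical gradient of some differentiable function
v = (3.0, 4.0)       # has length 5, so it is not a unit vector
length = math.hypot(*v)
unit_v = (v[0] / length, v[1] / length)
unnormalized = grad_h[0] * v[0] + grad_h[1] * v[1]
normalized = grad_h[0] * unit_v[0] + grad_h[1] * unit_v[1]
print("without normalizing:", unnormalized, "with normalizing:", normalized)
```

In the first part, the formula predicts 0 while the definition gives a nonzero value along the diagonal, so the formula fails without differentiability. In the second part, the unnormalized result is too large by exactly the factor $\|\mathbf{v}\| = 5$.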