Linear Algebra
12.6 · 10 min read

The Gradient Vector

The gradient of a scalar function $f: \mathbb{R}^n \to \mathbb{R}$ at $\mathbf{a}$ is the column vector of all partial derivatives:
$$\nabla f(\mathbf{a}) = \begin{bmatrix} \partial f/\partial x_1(\mathbf{a}) \\ \vdots \\ \partial f/\partial x_n(\mathbf{a}) \end{bmatrix}$$
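As a minimal sketch of this definition, the Python snippet below approximates each partial derivative with a central difference and collects the results into a gradient. The example function `f`, the helper names `numerical_gradient` and `exact_gradient`, and the step size `h` are illustrative assumptions, not part of the lesson.

```python
def numerical_gradient(f, a, h=1e-6):
    """Approximate the gradient of f at the point a using central differences."""
    grad = []
    for i in range(len(a)):
        forward = list(a)
        backward = list(a)
        forward[i] += h   # nudge only the i-th coordinate up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example: f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]


def exact_gradient(x):
    # Partial derivatives of the example, computed by hand:
    # df/dx1 = 2*x1 + 3*x2, df/dx2 = 3*x1
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


a = [1.0, 2.0]
approx = numerical_gradient(f, a)
exact = exact_gradient(a)
print(approx, exact)
```

The two lists should agree to several decimal places. The finite-difference version is only an approximation, useful mainly for checking hand-derived gradients.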

The gradient has a fundamental geometric property: when it is nonzero, it points in the direction of steepest increase of $f$ at $\mathbf{a}$. More precisely, for differentiable $f$, among all unit vectors $\mathbf{u}$, the one maximizing the directional derivative $D_\mathbf{u} f(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{u}$ is $\mathbf{u} = \nabla f(\mathbf{a})/\|\nabla f(\mathbf{a})\|$, by the Cauchy-Schwarz inequality.
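The steepest-increase claim can be checked numerically. The sketch below assumes a hypothetical gradient value of $(3, 4)$ at some point, samples unit directions around the circle, and looks for the one with the largest directional derivative. The names `directional_derivative`, `best_u`, and `unit_grad` are chosen for illustration only.

```python
import math


def directional_derivative(grad, u):
    """D_u f = grad . u for a differentiable f."""
    return sum(g * c for g, c in zip(grad, u))


# Hypothetical gradient value at some point a (its norm is 5).
grad = [3.0, 4.0]
grad_norm = math.sqrt(sum(g * g for g in grad))

# Sample unit directions u = (cos t, sin t) and keep the best one.
best_value = -math.inf
best_u = None
steps = 3600
for k in range(steps):
    t = 2 * math.pi * k / steps
    u = (math.cos(t), math.sin(t))
    value = directional_derivative(grad, u)
    if value > best_value:
        best_value, best_u = value, u

unit_grad = [g / grad_norm for g in grad]
print(best_value, best_u, unit_grad)
```

Up to the resolution of the sampling grid, the best sampled direction should match the normalized gradient, and the best value should match the gradient's norm.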

The gradient is also perpendicular to level sets of $f$: if $f(\mathbf{x}) = c$ defines a level surface, then $\nabla f$ is normal to that surface. This is because $f$ is constant along the surface, so $D_\mathbf{u} f = \nabla f \cdot \mathbf{u} = 0$ for every direction $\mathbf{u}$ tangent to it, which means $\mathbf{u} \perp \nabla f$.
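A short worked example may make the perpendicularity concrete. The function $f(x, y) = x^2 + y^2$ and the parametrization $\mathbf{r}(t)$ are chosen here purely for illustration. For $c > 0$, the level curve $f = c$ is a circle, which can be traced by $\mathbf{r}(t) = (\sqrt{c}\cos t, \sqrt{c}\sin t)$. Its tangent vector and the gradient along the curve are

$$\mathbf{r}'(t) = \begin{bmatrix} -\sqrt{c}\sin t \\ \sqrt{c}\cos t \end{bmatrix}, \qquad \nabla f(\mathbf{r}(t)) = \begin{bmatrix} 2\sqrt{c}\cos t \\ 2\sqrt{c}\sin t \end{bmatrix},$$

so that

$$\nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = -2c\sin t\cos t + 2c\sin t\cos t = 0.$$

The gradient points radially outward, at right angles to the circle at every point.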

Formal View

Definition 12.2 — Gradient Vector
The gradient of $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ at $\mathbf{a}$ is the column vector
$$\nabla f(\mathbf{a}) = \begin{bmatrix}\partial f/\partial x_1(\mathbf{a}) \\ \vdots \\ \partial f/\partial x_n(\mathbf{a})\end{bmatrix} = (Df(\mathbf{a}))^T$$
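To illustrate the transpose relationship, take the example $f(x, y) = x^2 y$ at $\mathbf{a} = (1, 2)$; both the function and the point are chosen for illustration. The derivative is a $1 \times 2$ row vector and the gradient is the corresponding $2 \times 1$ column vector:

$$Df(x, y) = \begin{bmatrix} 2xy & x^2 \end{bmatrix}, \qquad Df(1, 2) = \begin{bmatrix} 4 & 1 \end{bmatrix}, \qquad \nabla f(1, 2) = \begin{bmatrix} 4 \\ 1 \end{bmatrix} = (Df(1, 2))^T.$$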
Theorem 12.5 — Gradient Points Toward Steepest Ascent
For differentiable $f$ with $\nabla f(\mathbf{a}) \neq \mathbf{0}$:
$$\max_{\|\mathbf{u}\|=1} D_\mathbf{u} f(\mathbf{a}) = \|\nabla f(\mathbf{a})\|$$
achieved at $\mathbf{u}^* = \nabla f(\mathbf{a})/\|\nabla f(\mathbf{a})\|$.

The steepest descent direction is $-\nabla f(\mathbf{a})/\|\nabla f(\mathbf{a})\|$.
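Repeatedly stepping in the descent direction gives the basic idea behind gradient descent. The sketch below is one simple variant, not a definitive implementation. It steps along the unnormalized $-\nabla f$ with a fixed step size, and the objective, the step size, and the iteration count are all illustrative assumptions.

```python
def grad_f(x):
    # Gradient of the hypothetical objective
    # f(x1, x2) = (x1 - 1)^2 + 2*(x2 + 3)^2, whose minimum is at (1, -3).
    return [2 * (x[0] - 1), 4 * (x[1] + 3)]


def gradient_descent(grad, start, step_size=0.1, iterations=200):
    """Take fixed-size steps along the negative gradient."""
    x = list(start)
    for _ in range(iterations):
        g = grad(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


minimizer = gradient_descent(grad_f, [0.0, 0.0])
print(minimizer)
```

For this well-conditioned quadratic, the iterates approach $(1, -3)$. On harder problems the step size usually needs tuning, and a fixed value may diverge or converge slowly.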

Why This Matters

The gradient is one of the central objects of applied mathematics, and it underpins every gradient-based optimization method.

  • Machine learning: backpropagation computes gradients of loss functions
  • Physics: the electrostatic field is the negative gradient of the electric potential; a conservative force is the negative gradient of its potential energy
  • Computer graphics: gradient of a signed distance function gives surface normals
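The computer graphics application in the list above can be sketched as follows. The example assumes a unit sphere centered at the origin as the signed distance function and estimates the gradient with central differences. The names `sphere_sdf` and `estimate_normal` and the step size `h` are hypothetical, chosen only for this illustration.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centered at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def estimate_normal(sdf, p, h=1e-5):
    """Normalize a central-difference gradient of the SDF at p."""
    grad = []
    for i in range(3):
        forward = list(p)
        backward = list(p)
        forward[i] += h
        backward[i] -= h
        grad.append((sdf(forward) - sdf(backward)) / (2 * h))
    length = math.sqrt(sum(g * g for g in grad))
    return [g / length for g in grad]


point = [0.0, 0.6, 0.8]  # lies on the unit sphere
normal = estimate_normal(sphere_sdf, point)
print(normal)
```

For a sphere centered at the origin, the outward normal at a surface point is the point itself divided by the radius, so `normal` should closely match `point` here.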

Quiz

Question 1

The gradient $\nabla f(\mathbf{a})$ is perpendicular to:

Question 2

The maximum rate of increase of $f$ at $\mathbf{a}$ is:

Common Mistakes

  • Confusing the gradient (column vector) with the Jacobian row vector — they are transposes of each other.
  • Thinking the gradient points toward the nearest maximum — it gives the locally steepest direction, not global direction.
  • Forgetting that the gradient is zero at critical points, where the steepest-ascent interpretation breaks down.