Linear Algebra
12.128 min read

Saddle Points

A saddle point is a critical point that is neither a local minimum nor a local maximum. At a saddle point, the function increases in some directions and decreases in others, making it a minimax point.

The canonical example is f(x,y)=x2y2f(x,y) = x^2 - y^2 at the origin. The gradient is f=(2x,2y)\nabla f = (2x, -2y), which is zero at (0,0)(0,0). But ff increases along the xx-axis (moving right) and decreases along the yy-axis (moving up). The surface looks like a horse saddle — hence the name.

In high dimensions, saddle points become increasingly prevalent. For neural networks with many parameters, most critical points are saddle points rather than local minima. Gradient descent can slow near saddle points (the gradient is small but not zero), which is a practical challenge in deep learning.

Formal View

Definition 12.6 — Saddle Point
A critical point a\mathbf{a} of ff (where f(a)=0\nabla f(\mathbf{a}) = \mathbf{0}) is a saddle point if it is neither a local minimum nor a local maximum. In particular, every neighborhood of a\mathbf{a} contains points where f>f(a)f > f(\mathbf{a}) and points where f<f(a)f < f(\mathbf{a}).
Example 12.1 — Saddle Point of $x^2 - y^2$
For f(x,y)=x2y2f(x,y) = x^2 - y^2: f=(2x,2y)T=0\nabla f = (2x, -2y)^T = \mathbf{0} only at (0,0)(0,0). Along the xx-axis: f(h,0)=h2>0=f(0,0)f(h,0) = h^2 > 0 = f(0,0). Along the yy-axis: f(0,k)=k2<0=f(0,0)f(0,k) = -k^2 < 0 = f(0,0). So (0,0)(0,0) is a saddle point.

Why This Matters

Saddle points are ubiquitous in machine learning optimization and can dramatically slow training.

  • Deep learning optimization landscapes are dominated by saddle points in high dimensions
  • Game theory: Nash equilibria in zero-sum games are saddle points of the payoff function
  • Minimax optimization (GANs) deliberately seeks saddle points

Quiz

Question 1

Which of the following best describes a saddle point?

Question 2

At a saddle point, the gradient is zero.

Common Mistakes

  • Confusing saddle points with inflection points (a one-variable concept).
  • Thinking all non-minimum critical points are local maxima — saddle points are the third option.
  • Not recognizing saddle points in gradient descent — the gradient may become very small near a saddle point, causing apparent convergence.