
Saddle Points

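A saddle point is a critical point that is neither a local minimum nor a local maximum. At a saddle point, the function typically increases in some directions and decreases in others. In the simplest cases, like the example below, the point is a minimum along some directions and a maximum along others, which is why saddle points are sometimes called minimax points.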

The canonical example is $f(x,y) = x^2 - y^2$ at the origin. The gradient is $\nabla f = (2x, -2y)$, which is zero at $(0,0)$. But $f$ increases as you move away from the origin along the $x$-axis and decreases as you move away along the $y$-axis. The surface looks like a horse saddle, hence the name.
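Restricting $f(x,y) = x^2 - y^2$ to an arbitrary line through the origin makes "increases in some directions, decreases in others" precise. Parametrize a point at distance $t$ from the origin in direction $\theta$:

$$f(t\cos\theta,\, t\sin\theta) = t^2\cos^2\theta - t^2\sin^2\theta = t^2\cos 2\theta.$$

For $t \neq 0$, $f$ is positive (above $f(0,0) = 0$) when $\cos 2\theta > 0$, that is, within $45^\circ$ of the $x$-axis, and negative within $45^\circ$ of the $y$-axis. Along the diagonals $\theta = \pm\pi/4$, $f$ is identically zero.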

In high dimensions, saddle points become increasingly prevalent: for neural networks with many parameters, most critical points are saddle points rather than local minima. Gradient descent can slow down near a saddle point because the gradient there is small (though not zero until the saddle itself is reached), which is a practical challenge in deep learning.
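A minimal worked derivation shows where the slowdown comes from. It assumes plain gradient descent with a fixed step size $\eta$, $0 < \eta < \tfrac{1}{2}$, on the canonical saddle $f(x,y) = x^2 - y^2$. The update $(x, y) \leftarrow (x, y) - \eta\,\nabla f$ with $\nabla f = (2x, -2y)$ decouples into

$$x_{k+1} = (1 - 2\eta)\,x_k, \qquad y_{k+1} = (1 + 2\eta)\,y_k,$$

$$x_k = (1 - 2\eta)^k\,x_0, \qquad y_k = (1 + 2\eta)^k\,y_0.$$

The $x$-coordinate shrinks toward the saddle, while the $y$-coordinate grows, but only from its starting value. If $y_0 = \varepsilon$ is tiny, it takes roughly $k \approx \ln(1/\varepsilon)/\ln(1 + 2\eta)$ steps for $|y_k|$ to reach order 1. With $\eta = 0.1$ and $\varepsilon = 10^{-6}$, that is about 76 steps. Once $x_k$ has decayed, $\|\nabla f\| = 2\sqrt{x_k^2 + y_k^2}$ stays small for many of those steps. If $y_0 = 0$ exactly, the iterates converge to the saddle point itself.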

Formal View

Definition 12.6 — Saddle Point

A critical point $\mathbf{a}$ of $f$ (where $\nabla f(\mathbf{a}) = \mathbf{0}$) is a saddle point if it is neither a local minimum nor a local maximum. In particular, every neighborhood of $\mathbf{a}$ contains points where $f > f(\mathbf{a})$ and points where $f < f(\mathbf{a})$.
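The definition quantifies over every neighborhood, so no finite computation can prove it. A numerical probe can still make it concrete. The sketch below assumes a function of two variables and uses a hypothetical helper, `probe_neighborhood`, which is not from any library. It samples $f$ on a small circle around $\mathbf{a}$ and reports whether it finds values above and below $f(\mathbf{a})$.

```python
import math

def probe_neighborhood(f, a, radius=1e-3, samples=360):
    """Evaluate f at `samples` evenly spaced points on a circle of the given
    radius around the 2D point a. Returns (found_higher, found_lower):
    whether some sampled value exceeds f(a) and whether some falls below it.

    Heuristic only: one radius and finitely many directions are checked.
    """
    fa = f(*a)
    found_higher = found_lower = False
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        value = f(a[0] + radius * math.cos(theta), a[1] + radius * math.sin(theta))
        found_higher = found_higher or value > fa
        found_lower = found_lower or value < fa
    return found_higher, found_lower

def saddle(x, y):
    # The canonical saddle x^2 - y^2.
    return x * x - y * y

def bowl(x, y):
    # A paraboloid with a strict local minimum at the origin, for contrast.
    return x * x + y * y

print(probe_neighborhood(saddle, (0.0, 0.0)))  # (True, True): consistent with a saddle
print(probe_neighborhood(bowl, (0.0, 0.0)))    # (True, False): looks like a local minimum
```

A `(True, True)` result is evidence for a saddle, not proof. The probe checks a single circle with finitely many points, so it can miss narrow regions and says nothing about smaller neighborhoods.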

Example 12.1 — Saddle Point of $x^2 - y^2$

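For $f(x,y) = x^2 - y^2$, the gradient $\nabla f = (2x, -2y)^T$ equals $\mathbf{0}$ only at $(0,0)$. Along the $x$-axis, $f(h,0) = h^2 > 0 = f(0,0)$ for every $h \neq 0$. Along the $y$-axis, $f(0,k) = -k^2 < 0 = f(0,0)$ for every $k \neq 0$. Since both kinds of points occur arbitrarily close to the origin, $(0,0)$ is a saddle point.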

Why This Matters

Saddle points are ubiquitous in machine learning optimization and can slow training substantially, because gradients become small in their vicinity.

Deep learning optimization landscapes are dominated by saddle points in high dimensions
Game theory: Nash equilibria in zero-sum games are saddle points of the payoff function
Minimax optimization (GANs) deliberately seeks saddle points

Learning Resources

Saddle Points in Multivariable Calculus (Khan Academy, 10 min): identifying and visualizing saddle points.

Saddle Points in Machine Learning (StatQuest, 15 min): why saddle points are important in neural network optimization.

Quiz

Question 1

Which of the following best describes a saddle point?

Question 2

At a saddle point, the gradient is zero.

Common Mistakes

Confusing saddle points with inflection points (a one-variable concept).
Thinking all non-minimum critical points are local maxima — saddle points are the third option.
Not recognizing saddle points in gradient descent — the gradient may become very small near a saddle point, causing apparent convergence.
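Here is a minimal sketch of the apparent-convergence failure mode. It assumes plain gradient descent on $f(x,y) = x^2 - y^2$ with a fixed learning rate and the common stopping rule $\|\nabla f\| < \text{tol}$. The function names and parameter values (`lr=0.1`, `tol=1e-3`, starting point $(1, 10^{-8})$) are illustrative choices and are not taken from any library.

```python
import math

def f(x, y):
    # The canonical saddle f(x, y) = x^2 - y^2.
    return x * x - y * y

def grad_f(x, y):
    # Analytic gradient (2x, -2y).
    return 2.0 * x, -2.0 * y

def gd_until_small_gradient(x, y, lr=0.1, tol=1e-3, max_iter=10_000):
    """Plain gradient descent that stops once ||grad f|| < tol.
    Returns the final point and the number of updates taken."""
    for k in range(max_iter):
        gx, gy = grad_f(x, y)
        if math.hypot(gx, gy) < tol:
            return x, y, k  # "converged" according to the stopping rule
        x, y = x - lr * gx, y - lr * gy
    return x, y, max_iter

def gd_fixed_steps(x, y, lr=0.1, steps=100):
    """Plain gradient descent for a fixed number of steps, with no stopping rule."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Start almost exactly on the x-axis: the x-component decays toward the
# saddle, while the tiny y-component grows only slowly.
x_stop, y_stop, iters = gd_until_small_gradient(1.0, 1e-8)
print(f"stopped after {iters} steps at ({x_stop:.2e}, {y_stop:.2e}), f = {f(x_stop, y_stop):.2e}")

# Continuing the same iteration past the stopping point lets the
# y-component grow, and f keeps decreasing without bound.
x_more, y_more = gd_fixed_steps(x_stop, y_stop, steps=100)
print(f"after 100 more steps: f = {f(x_more, y_more):.2e}")
```

The first run stops close to the origin and reports convergence. The second run shows that the iterate was only passing through the saddle.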