Saddle Points
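A saddle point is a critical point that is neither a local minimum nor a local maximum. At a saddle point, the function increases in some directions and decreases in others, so the point looks like a minimum along some directions and a maximum along others. This is why it is also called a minimax point.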
The canonical example is $f(x, y) = x^2 - y^2$ at the origin. The gradient is $\nabla f = (2x, -2y)$, which is zero at $(0, 0)$. But $f$ increases along the $x$-axis (moving right) and decreases along the $y$-axis (moving up). The surface looks like a horse saddle, hence the name.
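The classification above can be checked numerically with the second-derivative test. The following is a minimal sketch, assuming the canonical example is $f(x, y) = x^2 - y^2$; the helper names `hessian_2d` and `classify_critical_point`, the step size `h`, and the tolerance `tol` are illustrative choices, not part of the original text.

```python
def f(x, y):
    # Assumed canonical saddle: f(x, y) = x^2 - y^2
    return x**2 - y**2

def hessian_2d(func, x, y, h=1e-4):
    # Central finite differences for the second partial derivatives
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    return fxx, fyy, fxy

def classify_critical_point(func, x, y, tol=1e-6):
    # Second-derivative test for a two-variable function at a critical point
    fxx, fyy, fxy = hessian_2d(func, x, y)
    det = fxx * fyy - fxy**2
    if det < -tol:
        return "saddle point"      # curvatures of opposite sign
    if det > tol:
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"          # test says nothing when det is ~0

print(classify_critical_point(f, 0.0, 0.0))
```

A negative Hessian determinant means the two principal curvatures have opposite signs, which is exactly the "up in one direction, down in another" behavior described above.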
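In high dimensions, saddle points become increasingly prevalent. Intuitively, a local minimum requires positive curvature in every direction, which becomes a more demanding condition as the number of dimensions grows. For neural networks with many parameters, most critical points are saddle points rather than local minima. Gradient descent can slow down near saddle points (the gradient is small but not zero), which is a practical challenge in deep learning.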
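The slowdown can be seen on a toy problem. This is a sketch under stated assumptions: the function $f(x, y) = x^2 - y^2$, the starting point, the learning rate, and the step count are all chosen for illustration and do not come from any particular training setup.

```python
import math

def grad_f(x, y):
    # Gradient of the assumed toy function f(x, y) = x^2 - y^2
    return 2 * x, -2 * y

def gradient_descent(x, y, lr=0.1, steps=100):
    # Plain gradient descent; records the gradient norm at every step
    norms = []
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        norms.append(math.hypot(gx, gy))
        x -= lr * gx
        y -= lr * gy
    return x, y, norms

# Start almost exactly on the x-axis, so the iterate first slides toward the saddle.
x_final, y_final, norms = gradient_descent(1.0, 1e-6)
print("initial gradient norm: ", norms[0])
print("smallest gradient norm:", min(norms))
print("final point:           ", (x_final, y_final))
```

The gradient norm shrinks by several orders of magnitude as the iterate approaches the origin, then grows again once the tiny $y$ component has been amplified enough to escape. During the stretch with a small gradient, progress is slow even though the point is not a minimum.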
Formal View
Why This Matters
Saddle points are ubiquitous in machine learning optimization and can dramatically slow training.
- Deep learning optimization landscapes are dominated by saddle points in high dimensions
- Game theory: Nash equilibria in zero-sum games are saddle points of the payoff function
- Minimax optimization (GANs) deliberately seeks saddle points
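To illustrate what deliberately seeking a saddle point means, here is a minimal sketch of gradient descent-ascent on a toy function. It assumes $f(x, y) = x^2 - y^2$, with one player minimizing over $x$ and the other maximizing over $y$; the function name `descent_ascent` and the hyperparameters are hypothetical. This is not a GAN, only the simplest setting in which the same idea appears.

```python
def descent_ascent(x, y, lr=0.1, steps=200):
    # Simultaneous updates on the assumed toy function f(x, y) = x^2 - y^2:
    # x takes a descent step (minimizer), y takes an ascent step (maximizer).
    for _ in range(steps):
        grad_x = 2 * x       # df/dx
        grad_y = -2 * y      # df/dy
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y

x_star, y_star = descent_ascent(1.0, 1.0)
print(x_star, y_star)  # both coordinates end up very close to 0
```

Here the saddle point at the origin is the solution: it is a minimum in $x$ and a maximum in $y$. On less well-behaved objectives, simultaneous descent-ascent can oscillate or diverge, so this sketch should not be read as a general-purpose method.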
Quiz
Which of the following best describes a saddle point?
At a saddle point, the gradient is zero.
Common Mistakes
- Confusing saddle points with inflection points (a one-variable concept).
- Thinking all non-minimum critical points are local maxima — saddle points are the third option.
- Mistaking a stall near a saddle point for convergence in gradient descent: the gradient may become very small near a saddle point, so the optimizer appears to have converged when it has not reached a minimum.
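One rough way to guard against the last mistake is to probe a few nearby points before declaring convergence. The sketch below is a heuristic for two-variable functions only; the name `has_descent_direction`, the probe radius, and the number of directions are assumptions made for illustration.

```python
import math

def has_descent_direction(func, point, radius=1e-3, n_dirs=16, tol=1e-12):
    # Evaluate func on a small circle around `point`; if any probe is lower
    # than the center value, the point cannot be a local minimum.
    x0, y0 = point
    base = func(x0, y0)
    for k in range(n_dirs):
        angle = 2 * math.pi * k / n_dirs
        probe = func(x0 + radius * math.cos(angle),
                     y0 + radius * math.sin(angle))
        if probe < base - tol:
            return True
    return False

saddle = lambda x, y: x**2 - y**2   # zero gradient at the origin, not a minimum
bowl = lambda x, y: x**2 + y**2     # zero gradient at the origin, a true minimum

print(has_descent_direction(saddle, (0.0, 0.0)))
print(has_descent_direction(bowl, (0.0, 0.0)))
```

A finite set of probe directions can miss a narrow descent direction, so a negative result is only suggestive. In higher dimensions, inspecting the smallest Hessian eigenvalue is the more common check.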