Steepest Descent Direction
For minimization, we want to decrease $f$ as rapidly as possible. The direction of steepest descent is $-\nabla f$, exactly opposite to the gradient.
Moving a small step $\alpha$ in the steepest descent direction gives: $f(x - \alpha \nabla f(x)) \approx f(x) - \alpha \|\nabla f(x)\|^2$. This decrease is guaranteed for sufficiently small $\alpha > 0$ (as long as the gradient is nonzero).
This is the mathematical justification for gradient descent: at each step, update $x_{k+1} = x_k - \alpha \nabla f(x_k)$. The parameter $\alpha$ is called the step size or learning rate. Too small: slow convergence. Too large: may overshoot and diverge.
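The decrease can be sketched with a first-order Taylor expansion. This assumes $f$ is differentiable at $x$ and uses $\alpha$ for the step length; the shorthand $g$ is introduced here only for illustration.

```latex
\begin{aligned}
f(x - \alpha g) &= f(x) + \nabla f(x)^{\top}(-\alpha g) + o(\alpha),
  \qquad g = \nabla f(x) \\
&= f(x) - \alpha \,\|\nabla f(x)\|^{2} + o(\alpha).
\end{aligned}
```

The $o(\alpha)$ remainder shrinks faster than $\alpha$, so the negative linear term dominates once $\alpha > 0$ is small enough, provided $\nabla f(x) \neq 0$.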
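A minimal sketch of how the step size affects the iteration. The quadratic test function, the starting point, and the three values of `alpha` are assumptions chosen for illustration, not part of the lesson.

```python
# Hypothetical test function: f(x, y) = 0.5 * (x**2 + 10 * y**2).
# Its gradient is (x, 10*y), and its unique minimizer is (0, 0).
def f(p):
    x, y = p
    return 0.5 * (x**2 + 10.0 * y**2)


def grad_f(p):
    x, y = p
    return (x, 10.0 * y)


def gradient_descent(start, alpha, steps):
    """Apply p <- p - alpha * grad_f(p) for a fixed number of steps."""
    p = start
    for _ in range(steps):
        g = grad_f(p)
        p = (p[0] - alpha * g[0], p[1] - alpha * g[1])
    return p


start = (1.0, 1.0)
# Illustrative step sizes: very small, moderate, and too large for this f.
for alpha in (0.001, 0.1, 0.25):
    end = gradient_descent(start, alpha, steps=50)
    print(f"alpha={alpha}: f(start)={f(start):.3e}, f(end)={f(end):.3e}")
```

For this particular function, the smallest step size makes only modest progress in 50 steps, the middle one gets close to the minimum, and the largest one makes the iterates grow instead of shrink.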
Formal View
Definition 12.3 — Steepest Descent Direction
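The steepest descent direction at $x$ (where $\nabla f(x) \neq 0$) is
$$d = -\frac{\nabla f(x)}{\|\nabla f(x)\|}.$$
This is the unit vector minimizing the directional derivative $D_u f(x) = \nabla f(x) \cdot u$ over all unit vectors $u$.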
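The definition can be checked numerically. The sketch below compares the normalized negative gradient against a grid of unit vectors in the plane. The gradient function, the evaluation point, and the helper names are all hypothetical, chosen only for illustration.

```python
import math


def grad_f(p):
    # Gradient of the hypothetical function f(x, y) = 0.5 * (x**2 + 10 * y**2).
    x, y = p
    return (x, 10.0 * y)


def steepest_descent_direction(g):
    """Return -g / ||g||; undefined when the gradient is zero."""
    norm = math.hypot(g[0], g[1])
    if norm == 0.0:
        raise ValueError("steepest descent direction is undefined at a zero gradient")
    return (-g[0] / norm, -g[1] / norm)


def directional_derivative(g, u):
    # For a unit vector u, the directional derivative is the dot product g . u.
    return g[0] * u[0] + g[1] * u[1]


g = grad_f((1.0, 1.0))
d = steepest_descent_direction(g)

# Sample 360 unit vectors, one per degree, and find the smallest slope among them.
candidates = [
    (math.cos(math.radians(k)), math.sin(math.radians(k))) for k in range(360)
]
best_sampled = min(directional_derivative(g, u) for u in candidates)

print("slope along d:       ", directional_derivative(g, d))
print("best sampled slope:  ", best_sampled)
print("-||grad f||:         ", -math.hypot(g[0], g[1]))
```

No sampled unit vector gives a smaller slope than `d`, and the slope along `d` equals $-\|\nabla f(x)\|$ up to rounding, as the Cauchy-Schwarz inequality predicts.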
Why This Matters
Steepest descent is the simplest effective optimization algorithm and forms the core of deep learning training.
- Gradient descent for training neural networks (the negative gradient gives the training signal)
- Physics-inspired methods: gradient flow equations
- Image denoising: iteratively moving in the negative gradient direction of a regularization functional
Quiz
Question 1
True or false: moving one step in the direction $-\nabla f$ (not normalized) always decreases $f$.
Question 2
In gradient descent, the update rule is $x_{k+1} = x_k - \alpha \nabla f(x_k)$. What is $\alpha$?
Common Mistakes
- Moving in the gradient direction (ascent) instead of the negative gradient (descent) when minimizing.
- Choosing a learning rate that is too large, causing divergence.
- Confusing steepest descent (a direction) with gradient descent (an algorithm/iteration).