Gradient Descent Algorithm
Gradient descent is an iterative algorithm for finding a local minimum of a differentiable function . Starting from an initial point , it repeatedly takes steps in the direction of steepest descent:
The step size (learning rate) can be fixed or adapted at each step. Common strategies include: (1) fixed step size: for all ; (2) line search: choose ; (3) diminishing step sizes: at an appropriate rate.
Gradient descent converges to a critical point (where ) under mild conditions. For convex functions, any critical point is a global minimum.
Formal View
This is an convergence rate. Strongly convex functions give exponential convergence.
Interactive Visualization
Gradient Descent Visualizer
Why This Matters
Gradient descent (and its variants) is the engine behind virtually all modern machine learning.
- Training neural networks via stochastic gradient descent (SGD)
- Solving large-scale linear regression, logistic regression, and SVMs
- Scientific computing: solving variational problems and PDE-constrained optimization
Quiz
Gradient descent terminates when:
For a non-convex function, gradient descent is guaranteed to find the global minimum.
Common Mistakes
- Setting the learning rate too large, causing oscillation or divergence.
- Stopping too early (gradient not small enough) and returning a point far from the minimum.
- Assuming gradient descent finds the global minimum for non-convex objectives.