12.188 min read

Strategies for Finding Global Minima

When a function may not be convex, finding the global minimum requires more sophisticated strategies. The main analytical approaches are: (1) find all critical points and compare their values, selecting the smallest; (2) check the boundary if the domain is bounded; (3) exploit problem structure (e.g., separability, periodicity) to reduce the search.

For practical problems, commonly used strategies include: random restarts (run gradient descent from many random starting points), simulated annealing (probabilistic jumps to escape local minima), evolutionary/genetic algorithms (population-based global search), and convex relaxation (replace the non-convex problem with a tractable convex approximation).

In machine learning, the loss landscape typically has many local minima of similar quality, so gradient descent often finds a good (if not globally optimal) solution. Stochastic gradient descent (using random subsets of data) adds noise that helps escape shallow local minima.

Formal View

Remark 12.1 — Global Optimization Hierarchy

Finding global minima is NP-hard in general. Tractable special cases: 1. Convex functions on convex domains 2. Quadratic with positive definite Hessian 3. Unimodal functions (one local min = global min) 4. Separable functions

f(\mathbf{x}) = \sum_i f_i(x_i)

Why This Matters

Choosing the right global optimization strategy is crucial for practical problem solving.

Hyperparameter tuning in machine learning uses random search and Bayesian optimization
Molecular dynamics simulations use simulated annealing to find ground states
Operations research uses branch-and-bound for combinatorial optimization

Learning Resources

Global Optimization Methods

MIT OpenCourseWare

Overview of global optimization strategies and their theoretical guarantees.

50 min

Stochastic Gradient Descent

StatQuest

How stochastic gradient descent helps with non-convex optimization in practice.

12 min

Quiz

Question 1

For a non-convex function with multiple local minima, what is the most rigorous analytical approach to finding the global minimum?

Question 2

Stochastic gradient descent is guaranteed to find the global minimum of a non-convex function.

Common Mistakes

Trusting the first local minimum found by gradient descent as the global minimum for non-convex problems.
Forgetting to evaluate the function at the boundary when the domain is compact.
Over-applying random restarts without considering whether the problem has structure that can be exploited.