Linear Algebra
12.188 min read

Strategies for Finding Global Minima

When a function may not be convex, finding the global minimum requires more sophisticated strategies. The main analytical approaches are: (1) find all critical points and compare their values, selecting the smallest; (2) check the boundary if the domain is bounded; (3) exploit problem structure (e.g., separability, periodicity) to reduce the search.

For practical problems, commonly used strategies include: random restarts (run gradient descent from many random starting points), simulated annealing (probabilistic jumps to escape local minima), evolutionary/genetic algorithms (population-based global search), and convex relaxation (replace the non-convex problem with a tractable convex approximation).

In machine learning, the loss landscape typically has many local minima of similar quality, so gradient descent often finds a good (if not globally optimal) solution. Stochastic gradient descent (using random subsets of data) adds noise that helps escape shallow local minima.

Formal View

Remark 12.1 — Global Optimization Hierarchy
Finding global minima is NP-hard in general. Tractable special cases: 1. Convex functions on convex domains 2. Quadratic with positive definite Hessian 3. Unimodal functions (one local min = global min) 4. Separable functions f(x)=ifi(xi)f(\mathbf{x}) = \sum_i f_i(x_i)

Why This Matters

Choosing the right global optimization strategy is crucial for practical problem solving.

  • Hyperparameter tuning in machine learning uses random search and Bayesian optimization
  • Molecular dynamics simulations use simulated annealing to find ground states
  • Operations research uses branch-and-bound for combinatorial optimization

Quiz

Question 1

For a non-convex function with multiple local minima, what is the most rigorous analytical approach to finding the global minimum?

Question 2

Stochastic gradient descent is guaranteed to find the global minimum of a non-convex function.

Common Mistakes

  • Trusting the first local minimum found by gradient descent as the global minimum for non-convex problems.
  • Forgetting to evaluate the function at the boundary when the domain is compact.
  • Over-applying random restarts without considering whether the problem has structure that can be exploited.