Linear Algebra
10.18 min read

Optimization Problems

An optimization problem asks: find a vector xRn\mathbf{x}^* \in \mathbb{R}^n that minimizes (or maximizes) a scalar-valued function f(x)f(\mathbf{x}). In machine learning, ff might be the training loss; in engineering, the potential energy; in finance, the portfolio risk. The goal is always to find the input that makes the output as small as possible.

Why is this hard? For a general function ff, the minimizer x\mathbf{x}^* could be anywhere — we have no idea where to look. Calculus provides the tools to narrow the search: derivatives tell us which direction is "downhill," critical points are where the gradient is zero, and second-order conditions distinguish minima from maxima and saddle points.

The iterative strategy called gradient descent starts from an initial guess and repeatedly steps in the direction of steepest descent. Understanding this algorithm deeply requires understanding derivatives — which is the subject of this and the next several chapters.

Formal View

Definition 10.1 — Optimization Problem
Given a function f:RnRf: \mathbb{R}^n \to \mathbb{R}, the minimization problem is
x=argminxRnf(x).\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}).
A point x\mathbf{x}^* is a global minimum if f(x)f(x)f(\mathbf{x}^*) \leq f(\mathbf{x}) for all x\mathbf{x}, and a local minimum if f(x)f(x)f(\mathbf{x}^*) \leq f(\mathbf{x}) for all x\mathbf{x} near x\mathbf{x}^*.

Why This Matters

Optimization is the engine of machine learning, operations research, and engineering design.

  • Neural network training: minimize the loss function L(w)\mathcal{L}(\mathbf{w}) over billions of parameters w\mathbf{w}.
  • Supply chain: minimize cost subject to delivery constraints.
  • Antenna design: maximize signal strength subject to power constraints.

Quiz

Question 1

A local minimum of ff is always a global minimum.

Question 2

Gradient descent finds a minimum by:

Common Mistakes

  • Confusing local and global minima — gradient descent typically finds local minima, which may not be global.
  • Assuming every function has a global minimum — unbounded or oscillating functions may not achieve their infimum.