10.18 min read

Optimization Problems

An optimization problem asks: find a vector $\mathbf{x}^* \in \mathbb{R}^n$ that minimizes (or maximizes) a scalar-valued function $f(\mathbf{x})$ . In machine learning, $f$ might be the training loss; in engineering, the potential energy; in finance, the portfolio risk. The goal is always to find the input that makes the output as small as possible.

Why is this hard? For a general function $f$ , the minimizer $\mathbf{x}^*$ could be anywhere — we have no idea where to look. Calculus provides the tools to narrow the search: derivatives tell us which direction is "downhill," critical points are where the gradient is zero, and second-order conditions distinguish minima from maxima and saddle points.

The iterative strategy called gradient descent starts from an initial guess and repeatedly steps in the direction of steepest descent. Understanding this algorithm deeply requires understanding derivatives — which is the subject of this and the next several chapters.

Formal View

Definition 10.1 — Optimization Problem

Given a function

f: \mathbb{R}^n \to \mathbb{R}

, the minimization problem is

\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}).

A point

\mathbf{x}^*

is a global minimum if

f(\mathbf{x}^*) \leq f(\mathbf{x})

for all

\mathbf{x}

, and a local minimum if

f(\mathbf{x}^*) \leq f(\mathbf{x})

for all

\mathbf{x}

near

\mathbf{x}^*

Why This Matters

Optimization is the engine of machine learning, operations research, and engineering design.

Neural network training: minimize the loss function $\mathcal{L}(\mathbf{w})$ over billions of parameters $\mathbf{w}$ .
Supply chain: minimize cost subject to delivery constraints.
Antenna design: maximize signal strength subject to power constraints.

Learning Resources

Gradient descent — how neural networks learn

3Blue1Brown

Intuitive explanation of optimization and gradient descent in neural networks.

21 min

Optimization problems in calculus

Khan Academy

Introduction to optimization: finding minima and maxima.

10 min

Quiz

Question 1

A local minimum of $f$ is always a global minimum.

Question 2

Gradient descent finds a minimum by:

Common Mistakes

Confusing local and global minima — gradient descent typically finds local minima, which may not be global.
Assuming every function has a global minimum — unbounded or oscillating functions may not achieve their infimum.