12.16 · 10 min read

Least Squares as Optimization

The least squares problem, minimizing $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$ over $\mathbf{x} \in \mathbb{R}^n$ for a given $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, is a standard example of unconstrained quadratic optimization. Expanding the squared norm gives $f(\mathbf{x}) = \mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|^2$.
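As a worked step, the expansion can be derived by writing the squared norm as an inner product. The two cross terms combine because $\mathbf{x}^T A^T \mathbf{b}$ is a scalar and therefore equals its own transpose $\mathbf{b}^T A \mathbf{x}$:

```latex
\begin{aligned}
f(\mathbf{x}) &= (A\mathbf{x}-\mathbf{b})^T (A\mathbf{x}-\mathbf{b}) \\
              &= \mathbf{x}^T A^T A \mathbf{x} - \mathbf{x}^T A^T \mathbf{b} - \mathbf{b}^T A \mathbf{x} + \mathbf{b}^T \mathbf{b} \\
              &= \mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|^2 .
\end{aligned}
```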

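The matrix $A^T A$ is always positive semidefinite (PSD), since $\mathbf{x}^T A^T A \mathbf{x} = \|A\mathbf{x}\|^2 \ge 0$; hence $f$ is convex and every stationary point is a global minimizer. The gradient of $f$ is $\nabla f(\mathbf{x}) = 2A^T A \mathbf{x} - 2A^T \mathbf{b}$. Setting it to zero gives the normal equations: $A^T A \mathbf{x} = A^T \mathbf{b}$.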
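The gradient formula and the normal equations can be checked numerically. The following is a minimal sketch using NumPy; the random test data and the helper names `f`, `grad_f`, and `numerical_grad` are illustrative assumptions, not part of the lesson.

```python
import numpy as np

def f(A, b, x):
    """Least squares objective ||Ax - b||^2."""
    r = A @ x - b
    return float(r @ r)

def grad_f(A, b, x):
    """Analytic gradient 2 A^T A x - 2 A^T b."""
    return 2 * A.T @ (A @ x) - 2 * A.T @ b

def numerical_grad(A, b, x, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(A, b, x + e) - f(A, b, x - e)) / (2 * h)
    return g

# Hypothetical random problem; a Gaussian 6x3 matrix has full column rank
# with probability 1, so A^T A is invertible here.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
x0 = rng.normal(size=3)

# The analytic and finite-difference gradients should agree closely.
grad_error = np.max(np.abs(grad_f(A, b, x0) - numerical_grad(A, b, x0)))

# Solve the normal equations A^T A x = A^T b and confirm the gradient vanishes.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
grad_at_min = grad_f(A, b, x_hat)

print("max gradient mismatch:", grad_error)
print("gradient at x_hat:", grad_at_min)
```

Solving with `np.linalg.solve` rather than forming an explicit inverse is a common design choice; it computes the same solution with less work and typically better numerical behavior.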

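When $A$ has full column rank, $A^T A$ is positive definite and therefore invertible, and the unique minimizer is $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$. When $A$ does not have full column rank, $A^T A$ is singular and the minimizer is no longer unique. In that case we either use the pseudoinverse, which selects the minimum-norm minimizer $A^+ \mathbf{b}$, or add regularization, which restores a unique solution.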
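The rank-deficient case can be illustrated with a small example. This is a sketch under stated assumptions: the matrix below is made up so that its third column is the sum of the first two, and the ridge parameter `lam` is an arbitrary small value chosen for illustration.

```python
import numpy as np

# Hypothetical data: column 3 = column 1 + column 2, so rank(A) = 2
# and A^T A is singular.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

rank = np.linalg.matrix_rank(A)

# Option 1: pseudoinverse, which returns the minimum-norm minimizer.
x_pinv = np.linalg.pinv(A) @ b

# np.linalg.lstsq also returns the minimum-norm solution when A is rank deficient.
x_lstsq, _, lstsq_rank, _ = np.linalg.lstsq(A, b, rcond=None)

# Option 2: ridge (Tikhonov) regularization makes the system matrix
# A^T A + lam * I positive definite, so the solution is unique.
lam = 1e-3
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

# Shifting along a null-space direction leaves the residual unchanged
# but increases the norm of the solution.
null_vec = np.array([1.0, 1.0, -1.0])  # satisfies A @ null_vec = 0
x_other = x_pinv + null_vec

print("rank:", rank)
print("pinv solution:", x_pinv)
print("ridge solution:", x_ridge)
print("norms:", np.linalg.norm(x_pinv), np.linalg.norm(x_other))
```

Both `x_pinv` and `x_other` satisfy the normal equations, which shows the non-uniqueness directly; the pseudoinverse simply picks the shortest of these solutions.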

Formal View

Theorem 12.10 — Least Squares Solution

The minimum of $f(\mathbf{x}) = \|A\mathbf{x}-\mathbf{b}\|^2$ is attained at the solutions of the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$. When $A$ has full column rank, the unique minimizer is $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$.

The normal equations always have at least one solution since $A^T\mathbf{b} \in \text{col}(A^T A) = \text{col}(A^T)$.
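One way to see the identity $\text{col}(A^T A) = \text{col}(A^T)$ is through null spaces. The following is a short sketch of the argument, using the standard facts that $A^T A$ is symmetric and that the column space of a matrix's transpose is the orthogonal complement of its null space:

```latex
\begin{aligned}
A^T A \mathbf{x} = \mathbf{0}
  &\;\Longrightarrow\; \mathbf{x}^T A^T A \mathbf{x} = \|A\mathbf{x}\|^2 = 0
  \;\Longrightarrow\; A\mathbf{x} = \mathbf{0},
  \quad\text{so } \text{null}(A^T A) = \text{null}(A), \\
\text{col}(A^T A) &= \text{null}(A^T A)^{\perp} = \text{null}(A)^{\perp} = \text{col}(A^T).
\end{aligned}
```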

Why This Matters

Least squares gives statistics, signal processing, and data science a shared mathematical framework: in each field, a model is fitted by minimizing a sum of squared errors.

Linear regression: fitting a line or plane to data by minimizing the sum of squared residuals
Signal reconstruction: finding the best low-noise estimate from noisy measurements
Curve fitting: approximating experimental data with polynomial models (which are linear in their coefficients) or, through iterative variants, nonlinear models
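The linear regression case can be sketched in a few lines. The data points below are made up for illustration (roughly following $y = 2 + 0.5t$), and the variable names are assumptions rather than anything prescribed by the lesson.

```python
import numpy as np

# Hypothetical measurements, roughly following y = 2 + 0.5 t.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 2.4, 3.1, 3.4, 4.1])

# Design matrix: a column of ones (intercept) and a column of t (slope).
A = np.column_stack([np.ones_like(t), t])

# Fit by least squares; lstsq minimizes ||A c - y||^2 over c.
coef, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# The same coefficients come from the normal equations A^T A c = A^T y,
# since A has full column rank here.
coef_normal = np.linalg.solve(A.T @ A, A.T @ y)

# At the minimizer, the residual is orthogonal to the columns of A.
residuals = y - A @ coef

print("intercept:", intercept, "slope:", slope)  # approximately 2.02 and 0.5
print("A^T residuals:", A.T @ residuals)
```

The orthogonality check in the last line is just the normal equations rewritten as $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$.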

Learning Resources

Least Squares Optimization

MIT OpenCourseWare

Gilbert Strang on least squares from the optimization perspective.

48 min

Normal Equations Derivation

Steve Brunton

Deriving the normal equations from the gradient of the least squares objective.

14 min

Quiz

Question 1

The normal equations for $\min_\mathbf{x} \|A\mathbf{x}-\mathbf{b}\|^2$ are:

Question 2

The matrix $A^T A$ is always positive definite, regardless of $A$.

Common Mistakes

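Confusing the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$ with the original system $A\mathbf{x} = \mathbf{b}$, which may have no exact solution.
Assuming $A^T A$ is always invertible; it is invertible only when $A$ has full column rank.
Forgetting that the least squares solution minimizes the sum of squared residuals $\|A\mathbf{x}-\mathbf{b}\|^2$, not another measure of error such as the sum of absolute residuals, so large residuals are penalized heavily.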