Linear Algebra
12.16

Least Squares as Optimization

The least squares problem, minimizing $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$ over $\mathbf{x} \in \mathbb{R}^n$, is a prime example of quadratic optimization. Expanding the squared norm gives $f(\mathbf{x}) = \mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|^2$.
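The expansion can be sanity-checked numerically. This is a minimal sketch assuming NumPy is available. The matrix sizes, the seed, and the helper names `f_direct` and `f_expanded` are illustrative choices, not part of the text. It evaluates $\|A\mathbf{x} - \mathbf{b}\|^2$ directly and through the expanded quadratic form, then compares the two.

```python
import numpy as np

# Illustrative random data (assumed sizes): a tall 6x3 matrix A and a vector b.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

def f_direct(x):
    """Squared residual norm ||Ax - b||^2, computed directly."""
    r = A @ x - b
    return r @ r

def f_expanded(x):
    """Expanded form x^T A^T A x - 2 b^T A x + ||b||^2."""
    return x @ (A.T @ A) @ x - 2 * (b @ A @ x) + b @ b

x = rng.standard_normal(3)
print(np.isclose(f_direct(x), f_expanded(x)))  # True
```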

The matrix $A^T A$ is always positive semidefinite (PSD), since $\mathbf{x}^T A^T A \mathbf{x} = \|A\mathbf{x}\|^2 \ge 0$ for every $\mathbf{x}$. The gradient of $f$ is $\nabla f(\mathbf{x}) = 2A^T A \mathbf{x} - 2A^T \mathbf{b}$. Setting it to zero gives the normal equations: $A^T A \mathbf{x} = A^T \mathbf{b}$.
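A minimal sketch of these three facts, assuming NumPy and illustrative random data. It checks the analytic gradient against central finite differences (the helper `numerical_grad` and its step `h` are hypothetical choices). It then confirms that the eigenvalues of $A^T A$ are nonnegative and verifies that solving the normal equations drives the gradient to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))   # illustrative tall matrix
b = rng.standard_normal(8)

def f(x):
    r = A @ x - b
    return r @ r

def grad_f(x):
    # Analytic gradient from the text: 2 A^T A x - 2 A^T b
    return 2 * A.T @ A @ x - 2 * A.T @ b

def numerical_grad(func, x, h=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

x = rng.standard_normal(4)
print(np.allclose(grad_f(x), numerical_grad(f, x), atol=1e-5))  # True

# A^T A is symmetric, so eigvalsh applies; PSD means no negative eigenvalues
# (up to rounding).
eigs = np.linalg.eigvalsh(A.T @ A)
print(eigs.min() >= -1e-12)  # True

# Zero gradient <=> normal equations A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(grad_f(x_hat), 0))  # True
```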

When $A$ has full column rank, $A^T A$ is positive definite (hence invertible), and the unique minimizer is $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$. When $A$ does not have full column rank, $A^T A$ is singular and the minimizer is not unique. In that case we use the pseudoinverse, which selects the minimizer of smallest norm, or add regularization.
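The two cases can be contrasted in a short sketch, assuming NumPy. In the rank-deficient case one column is duplicated on purpose. The `rcond=1e-10` cutoff passed to `np.linalg.pinv` and the ridge weight `lam` are illustrative values, not prescribed ones. The sketch also shows that shifting the pseudoinverse solution along the null space of $B$ gives another minimizer with a larger norm.

```python
import numpy as np

rng = np.random.default_rng(2)
b = rng.standard_normal(5)

# Full column rank (almost surely for random data): the normal-equations
# solution agrees with NumPy's least squares routine.
A = rng.standard_normal((5, 3))
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))  # True

# Rank-deficient: column 2 copies column 0, so B^T B is singular.
B = A.copy()
B[:, 2] = B[:, 0]
print(np.linalg.eigvalsh(B.T @ B).min())  # ~0 up to rounding

# Pseudoinverse solution; rcond=1e-10 (illustrative) zeroes the
# rounding-level singular value.
x_pinv = np.linalg.pinv(B, rcond=1e-10) @ b
print(np.allclose(B.T @ B @ x_pinv, B.T @ b))  # True: solves the normal equations

# (1, 0, -1) spans null(B); adding it gives another solution with a larger norm.
null_dir = np.array([1.0, 0.0, -1.0])
x_other = x_pinv + 0.5 * null_dir
print(np.allclose(B.T @ B @ x_other, B.T @ b))          # True
print(np.linalg.norm(x_pinv) < np.linalg.norm(x_other))  # True

# Ridge regularization with an illustrative lam > 0: B^T B + lam*I is
# positive definite, so this system has a unique solution.
lam = 1e-3
x_ridge = np.linalg.solve(B.T @ B + lam * np.eye(3), B.T @ b)
```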

Formal View

Theorem 12.10 — Least Squares Solution
The minimum of $f(\mathbf{x}) = \|A\mathbf{x}-\mathbf{b}\|^2$ is achieved at the solutions of the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$. When $A$ has full column rank, the unique minimizer is $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$.
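One way to see why the theorem holds uses only the expansion of a squared norm. Take any $\hat{\mathbf{x}}$ that satisfies the normal equations and any direction $\mathbf{d}$:

```latex
\begin{aligned}
f(\hat{\mathbf{x}} + \mathbf{d})
  &= \|(A\hat{\mathbf{x}} - \mathbf{b}) + A\mathbf{d}\|^2 \\
  &= \|A\hat{\mathbf{x}} - \mathbf{b}\|^2
     + 2\,\mathbf{d}^T A^T (A\hat{\mathbf{x}} - \mathbf{b})
     + \|A\mathbf{d}\|^2 \\
  &= f(\hat{\mathbf{x}}) + \|A\mathbf{d}\|^2
     \qquad \text{since } A^T A\hat{\mathbf{x}} = A^T\mathbf{b} \\
  &\ge f(\hat{\mathbf{x}}).
\end{aligned}
```

Equality holds exactly when $A\mathbf{d} = \mathbf{0}$. With full column rank the null space of $A$ is $\{\mathbf{0}\}$, so $\mathbf{d} = \mathbf{0}$ and $\hat{\mathbf{x}}$ is the unique minimizer.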

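The normal equations always have at least one solution, since $A^T\mathbf{b} \in \text{col}(A^T) = \text{col}(A^T A)$. The two column spaces coincide because $\text{col}(A^T A) \subseteq \text{col}(A^T)$ and both matrices have the same rank.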
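The consistency argument can be checked on a rank-deficient example. This sketch assumes NumPy and random data in which one column is duplicated on purpose, so $\text{null}(A)$ is spanned by $(1, 0, -1)$. The `rcond=1e-10` cutoff is an illustrative choice. Because $A^T A$ is symmetric, its column space is the orthogonal complement of its null space. So $A^T\mathbf{b}$ is orthogonal to that null vector and the system is solvable. A right-hand side lying in the null space has no solution.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
A[:, 2] = A[:, 0]           # rank-deficient: column 2 duplicates column 0
b = rng.standard_normal(6)  # arbitrary right-hand side

M = A.T @ A                 # singular 3x3 matrix
c = A.T @ b                 # right-hand side of the normal equations

# null(M) = null(A) is spanned by n = (1, 0, -1); c is orthogonal to it.
n = np.array([1.0, 0.0, -1.0])
print(np.isclose(c @ n, 0.0))  # True

# So M x = c is solvable: a least-squares solve of it leaves no residual.
x, *_ = np.linalg.lstsq(M, c, rcond=1e-10)
print(np.allclose(M @ x, c))   # True

# A right-hand side lying in null(M) is outside col(M): M y = n has no solution.
y, *_ = np.linalg.lstsq(M, n, rcond=1e-10)
print(np.allclose(M @ y, n))   # False
```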

Why This Matters

Least squares optimization unifies statistics, signal processing, and data science under a single mathematical framework.

  • Linear regression: fitting a line/plane to data by minimizing sum of squared residuals
  • Signal reconstruction: finding the best low-noise estimate from noisy measurements
  • Curve fitting: approximating experimental data with polynomial or nonlinear models
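Linear regression is the most direct instance. The sketch below assumes NumPy. The four data points are made up for illustration. It builds the design matrix for the model $y \approx c_0 + c_1 t$ and solves the least squares problem with `np.linalg.lstsq`.

```python
import numpy as np

# Illustrative data (hypothetical measurements)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Design matrix for y ≈ c0 + c1 * t: a column of ones and a column of t
A = np.column_stack([np.ones_like(t), t])

coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)                   # approximately [1.1 1.1]: intercept 1.1, slope 1.1

residuals = y - A @ coef
print(residuals @ residuals)  # sum of squared residuals, approximately 2.7
```

Written out, the normal equations for this data are $4c_0 + 6c_1 = 11$ and $6c_0 + 14c_1 = 22$, whose solution is $c_0 = c_1 = 1.1$.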

Quiz

Question 1

The normal equations for $\min_{\mathbf{x}} \|A\mathbf{x}-\mathbf{b}\|^2$ are:

Question 2

The matrix $A^T A$ is always positive definite, regardless of $A$.

Common Mistakes

  • Confusing the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$ with $A\mathbf{x} = \mathbf{b}$.
  • Assuming $A^T A$ is always invertible: it is invertible only when $A$ has full column rank.
  • Forgetting that least squares minimizes the sum of squared residuals, so large residuals are penalized disproportionately and a single outlier can dominate the fit.
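The first mistake is easiest to see with one unknown and two conflicting measurements. This is a minimal sketch assuming NumPy, with made-up numbers. $A\mathbf{x} = \mathbf{b}$ asks for $x = 0$ and $x = 2$ at once, and NumPy's `solve` rejects the non-square system outright. The normal equations reduce it to the square system $2x = 2$.

```python
import numpy as np

# Two measurements of one unknown that disagree: x = 0 and x = 2.
A = np.array([[1.0], [1.0]])
b = np.array([0.0, 2.0])

try:
    np.linalg.solve(A, b)  # A is 2x1, not square
except np.linalg.LinAlgError as err:
    print("A x = b is not a square system:", err)

# Normal equations: A^T A = [[2]], A^T b = [2], so 2x = 2.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)          # [1.]
print(A @ x_hat - b)  # [ 1. -1.]: x_hat minimizes the error but does not satisfy A x = b
```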