Linear Algebra
7.812 min read

The Least Squares Problem

Most real-world systems are overdetermined: you have more equations than unknowns. A GPS receiver solves ~30 satellite equations for 4 unknowns; a data scientist fits a line through thousands of points. There is usually no exact solution Ax=bAx = b, so we ask: what xx makes AxAx as close as possible to bb?

We measure closeness with the Euclidean norm: the residual is r=bAxr = b - Ax, and its length r\|r\| is the error. The least squares problem asks us to minimize bAx2\|b - Ax\|^2 over all xRnx \in \mathbb{R}^n.

Geometrically, AxAx can only reach the column space of AA. The closest point in the column space to bb is the orthogonal projection b^=projcol(A)b\hat{b} = \text{proj}_{\text{col}(A)} b. The optimal x^\hat{x} satisfies Ax^=b^A\hat{x} = \hat{b}.

Since bAx^b - A\hat{x} must be perpendicular to every column of AA, we get the condition AT(bAx^)=0A^T(b - A\hat{x}) = 0, which rearranges to the normal equations ATAx^=ATbA^T A \hat{x} = A^T b. We will solve these in section 7.9.

Formal View

Definition 7.8 — Least Squares Solution
Given ARm×nA \in \mathbb{R}^{m \times n} with m>nm > n and bRmb \in \mathbb{R}^m, the least squares solution x^\hat{x} minimizes
bAx2=i=1m(bi(Ax)i)2.\|b - Ax\|^2 = \sum_{i=1}^m (b_i - (Ax)_i)^2.
It satisfies ATAx^=ATbA^T A \hat{x} = A^T b (the normal equations).
Theorem 7.8 — Existence and Uniqueness
A least squares solution always exists. It is unique if and only if AA has full column rank (i.e., the columns of AA are linearly independent), in which case ATAA^T A is invertible and x^=(ATA)1ATb\hat{x} = (A^T A)^{-1} A^T b.

Why This Matters

Least squares is the backbone of data fitting and statistical estimation, appearing everywhere from GPS to machine learning.

  • Linear regression: fitting the best line or plane through data
  • Navigation systems: estimating position from overdetermined satellite equations
  • Signal processing: filtering and denoising noisy measurements
  • Computer vision: fitting geometric models to point clouds

Quiz

Question 1

The least squares solution x^\hat{x} minimizes which quantity?

Question 2

A least squares solution exists for every matrix AA and every vector bb.

Question 3

If ARm×nA \in \mathbb{R}^{m \times n} with m>nm > n has full column rank, the unique least squares solution is:

Common Mistakes

  • Trying to solve Ax=bAx = b directly when m>nm > n — this system is generally inconsistent; use the normal equations instead.
  • Forgetting that the normal equations require AA to have full column rank; if columns are dependent, ATAA^T A is singular.
  • Confusing minimizing bAx\|b - Ax\| with minimizing Axb2\|Ax - b\|^2 — they give the same answer, but squaring makes the algebra smoother.