Linear Algebra
7.9

The Normal Equations

The normal equations $A^T A \hat{x} = A^T b$ turn the overdetermined system $Ax = b$, which generally has no exact solution, into a square system that always has a solution. The name "normal" comes from a geometric fact: the residual $b - A\hat{x}$ is normal (perpendicular) to the column space of $A$.

To verify: if $\hat{x}$ solves $A^T A \hat{x} = A^T b$, then $A^T(b - A\hat{x}) = 0$, meaning $(b - A\hat{x}) \perp \text{col}(A)$. This is exactly the condition for $A\hat{x}$ to be the orthogonal projection of $b$ onto $\text{col}(A)$.
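The orthogonality condition can be checked numerically. The sketch below uses NumPy and a small made-up data set (the matrix `A` and vector `b` are illustrative assumptions, not taken from the text); it solves the normal equations and then computes $A^T(b - A\hat{x})$, which should vanish up to rounding error.

```python
import numpy as np

# Hypothetical example data: 5 equations, 2 unknowns (overdetermined).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.2, 2.9, 4.1, 5.2])

# Solve the square system (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual should be orthogonal to every column of A,
# so A^T @ residual should be zero up to floating-point rounding.
residual = b - A @ x_hat
orthogonality = A.T @ residual
print(orthogonality)
```

Each entry of `orthogonality` is the dot product of the residual with one column of `A`.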

To solve the normal equations in practice, you can apply Gaussian elimination directly to the $n \times n$ system $A^T A \hat{x} = A^T b$. However, $A^T A$ can be ill-conditioned (near-singular) when columns of $A$ are nearly linearly dependent: forming $A^T A$ squares the 2-norm condition number of $A$. This makes QR factorization (section 7.10) the preferred numerical method.
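To make the conditioning issue concrete, here is a rough comparison of the two routes on a matrix with nearly parallel columns. The matrix, the value of `eps`, and the variable names are assumptions chosen for illustration. The right-hand side is built from a known `x_true` so that both errors can be measured.

```python
import numpy as np

# Hypothetical test matrix: two nearly parallel columns.
eps = 1e-4
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps],
              [1.0, 1.0 - eps]])
x_true = np.array([2.0, -1.0])
b = A @ x_true  # consistent right-hand side, so the exact answer is x_true

# Route 1: form and solve the normal equations.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: reduced QR factorization A = QR, then solve R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Forming A^T A roughly squares the condition number.
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)

err_normal = np.linalg.norm(x_normal - x_true)
err_qr = np.linalg.norm(x_qr - x_true)
print(cond_A, cond_AtA, err_normal, err_qr)
```

On inputs like this, the QR route typically loses far fewer digits, because it works with $A$ directly and never forms $A^T A$.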

The hat matrix $H = A(A^T A)^{-1} A^T$ projects any vector $b$ onto $\text{col}(A)$: $\hat{b} = Hb$. Note that $H^2 = H$ (it is idempotent) and $H = H^T$ (it is symmetric).
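Both properties follow directly from the formula for $H$, assuming $A$ has full column rank so that $(A^T A)^{-1}$ exists. A short derivation:

```latex
\begin{aligned}
H^2 &= A(A^T A)^{-1} A^T \, A(A^T A)^{-1} A^T
     = A(A^T A)^{-1} (A^T A)(A^T A)^{-1} A^T
     = A(A^T A)^{-1} A^T = H, \\
H^T &= \bigl(A(A^T A)^{-1} A^T\bigr)^T
     = A \bigl((A^T A)^{-1}\bigr)^T A^T
     = A \bigl((A^T A)^T\bigr)^{-1} A^T
     = A(A^T A)^{-1} A^T = H.
\end{aligned}
```

The second line uses the facts that the transpose of an inverse is the inverse of the transpose and that $A^T A$ is symmetric.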

Formal View

Theorem 7.9 — Normal Equations
Let $A \in \mathbb{R}^{m \times n}$ with $m \geq n$. The vector $\hat{x}$ minimizes $\|b - Ax\|$ if and only if it satisfies
$$A^T A \hat{x} = A^T b.$$
If $A$ has full column rank, then $A^T A$ is symmetric positive definite, so this system has a unique solution $\hat{x} = (A^T A)^{-1} A^T b$.
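One way to see why the minimizer must satisfy this system is to expand the squared residual norm and set its gradient to zero. This is a sketch of the standard calculus argument; the function name $f$ is introduced here only for illustration:

```latex
\begin{aligned}
f(x) &= \|b - Ax\|^2 = b^T b - 2\, x^T A^T b + x^T A^T A\, x, \\
\nabla f(x) &= 2 A^T A\, x - 2 A^T b .
\end{aligned}
```

Setting $\nabla f(\hat{x}) = 0$ gives $A^T A \hat{x} = A^T b$. Because the Hessian $2 A^T A$ is positive semidefinite, $f$ is convex, so every stationary point is a global minimizer.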

The matrix $A^T A$ is always symmetric and positive semidefinite. It is positive definite (hence invertible) if and only if $A$ has full column rank.
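Both claims come from a one-line computation, shown here for an arbitrary vector $x \in \mathbb{R}^n$:

```latex
(A^T A)^T = A^T (A^T)^T = A^T A,
\qquad
x^T (A^T A)\, x = (Ax)^T (Ax) = \|Ax\|^2 \ge 0 .
```

Equality holds exactly when $Ax = 0$. So $A^T A$ is positive definite precisely when $Ax = 0$ forces $x = 0$, which is the statement that the columns of $A$ are linearly independent.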

Definition 7.9 — Projection Matrix
The orthogonal projector onto $\text{col}(A)$ is
$$H = A(A^T A)^{-1} A^T.$$
It satisfies $H^2 = H$ and $H^T = H$. The projected vector is $\hat{b} = Hb = A\hat{x}$, and the residual $b - \hat{b}$ lies in $\text{col}(A)^\perp = \text{null}(A^T)$.
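These identities are easy to check numerically. The following is a minimal sketch using NumPy with a random tall matrix; the sizes, the seed, and the variable names are arbitrary assumptions. It builds $H$ with `np.linalg.solve` rather than an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, arbitrary choice
A = rng.standard_normal((6, 3))   # random tall matrix (full column rank with probability 1)
b = rng.standard_normal(6)

# H = A (A^T A)^{-1} A^T, computed by solving (A^T A) Z = A^T for Z.
H = A @ np.linalg.solve(A.T @ A, A.T)

# Least-squares solution from the normal equations.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

b_hat = H @ b            # projection of b onto col(A)
residual = b - b_hat     # should lie in null(A^T)

print(np.allclose(H @ H, H))            # idempotent
print(np.allclose(H.T, H))              # symmetric
print(np.allclose(b_hat, A @ x_hat))    # Hb equals A x_hat
print(np.allclose(A.T @ residual, 0))   # residual is orthogonal to col(A)
```

For large problems, $H$ is rarely formed explicitly, since it is an $m \times m$ matrix. The explicit form is used here only to test the identities.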

Why This Matters

The normal equations connect linear algebra to statistics: solving them is exactly ordinary least squares (OLS) regression.

  • Linear regression: the OLS estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ is the normal equations solution
  • Gram matrices: $A^T A$ appears in kernel methods and covariance estimation
  • Weighted least squares: replace $A^T A$ with $A^T W A$ and $A^T b$ with $A^T W b$ to account for non-uniform measurement noise
  • Control theory: projection plays a role in state estimation (Kalman filter)
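As an illustration of the weighted variant, here is a small sketch that fits a line to made-up data in which later measurements are assumed to be noisier. The data, the noise levels `sigma`, and the inverse-variance choice of $W$ are assumptions for the example; inverse-variance weighting is a common choice, not the only one.

```python
import numpy as np

# Hypothetical data: fit y ≈ c0 + c1 * t, with noisier measurements at the end.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 2.9, 4.3, 4.6])
sigma = np.array([0.1, 0.1, 0.1, 0.5, 0.5])  # assumed per-measurement noise std

A = np.column_stack([np.ones_like(t), t])    # design matrix with columns [1, t]
W = np.diag(1.0 / sigma**2)                  # inverse-variance weights

# Weighted normal equations: (A^T W A) x = A^T W y.
x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

# Unweighted (OLS) solution for comparison.
x_ols = np.linalg.solve(A.T @ A, A.T @ y)

print(x_wls, x_ols)
```

Scaling each row of `A` and each entry of `y` by `1 / sigma` and then solving an ordinary least squares problem gives the same `x_wls`, which is how the weighted problem reduces to the unweighted one.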

Quiz

Question 1

The normal equations $A^T A \hat{x} = A^T b$ have a unique solution when:

Question 2

The matrix $A^T A$ is always symmetric.

Question 3

After solving the normal equations for $\hat{x}$, the residual $b - A\hat{x}$ is:

Common Mistakes

  • Solving $Ax = b$ directly instead of $A^T A x = A^T b$: the original system usually has no solution when $m > n$.
  • Assuming $A^T A$ is always invertible: it fails when columns of $A$ are linearly dependent (rank deficient).
  • Confusing the projection $H = A(A^T A)^{-1} A^T$ with the identity: $H$ only fixes vectors in $\text{col}(A)$, zeroing the perpendicular component.