
The Normal Equations

The normal equations $A^T A \hat{x} = A^T b$ replace the overdetermined system $Ax = b$, which usually has no exact solution, with a square system that is always consistent. The name "normal" comes from a geometric fact: the residual $b - A\hat{x}$ is normal (perpendicular) to the column space of $A$.

To verify: if $\hat{x}$ solves $A^T A \hat{x} = A^T b$, then rearranging gives $A^T(b - A\hat{x}) = 0$. Each row of $A^T$ is a column of $A$, so the residual is orthogonal to every column of $A$, that is, $(b - A\hat{x}) \perp \text{col}(A)$. This is exactly the condition for $A\hat{x}$ to be the orthogonal projection of $b$ onto $\text{col}(A)$.
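The orthogonality check can be tried on a small example. The sketch below uses exact rational arithmetic and a hand-written Gaussian elimination; the data (fitting a line to three points) and the helper names `transpose`, `matmul`, and `solve` are assumptions made for illustration and are not taken from the text.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]


def solve(M, rhs):
    """Solve the square system M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]  # augmented matrix [M | rhs]
    for k in range(n):
        # Bring the row with the largest entry in column k into pivot position.
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        if aug[p][k] == 0:
            raise ValueError("matrix is singular")
        aug[k], aug[p] = aug[p], aug[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            aug[i] = [a - factor * c for a, c in zip(aug[i], aug[k])]
    # Back substitution on the upper triangular system.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Illustrative data: fit y = c + d*t to the points (0, 6), (1, 0), (2, 0).
A = [[Fraction(v) for v in row] for row in [[1, 0], [1, 1], [1, 2]]]
b = [Fraction(v) for v in [6, 0, 0]]

At = transpose(A)
AtA = matmul(At, A)                                        # 2x2 matrix A^T A
Atb = [sum(a * y for a, y in zip(row, b)) for row in At]   # vector A^T b

x_hat = solve(AtA, Atb)
residual = [y - sum(a * x for a, x in zip(row, x_hat)) for row, y in zip(A, b)]
At_residual = [sum(a * r for a, r in zip(row, residual)) for row in At]

print("x_hat       =", [str(v) for v in x_hat])
print("residual    =", [str(v) for v in residual])
print("A^T residual =", [str(v) for v in At_residual])
```

Because the arithmetic is exact, $A^T(b - A\hat{x})$ comes out as exactly zero here; with floating-point numbers one would expect values near machine precision instead.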

To solve the normal equations in practice, you can apply Gaussian elimination directly to the $n \times n$ system $A^T A \hat{x} = A^T b$. However, $A^T A$ can be ill-conditioned (nearly singular) when the columns of $A$ are nearly linearly dependent: in the 2-norm, the condition number of $A^T A$ is the square of the condition number of $A$. For this reason, QR factorization (section 7.10) is the preferred numerical method.
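The conditioning issue can be observed numerically. The following is a minimal NumPy sketch, assuming a hypothetical test matrix with two nearly parallel columns and a right-hand side built from a known solution `x_true`; it compares solving the normal equations with solving via a reduced QR factorization.

```python
import numpy as np

# Hypothetical test matrix: the two columns are nearly parallel.
eps = 1e-6
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps],
              [1.0, 1.0 - eps]])
x_true = np.array([1.0, 2.0])
b = A @ x_true  # consistent right-hand side, so the least squares solution is x_true

# Route 1: form and solve the normal equations A^T A x = A^T b.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: reduced QR factorization A = QR, then solve R x = Q^T b.
Q, R = np.linalg.qr(A)              # Q is 3x2, R is 2x2 upper triangular
x_qr = np.linalg.solve(R, Q.T @ b)

cond_A = np.linalg.cond(A)          # 2-norm condition number of A
cond_AtA = np.linalg.cond(A.T @ A)  # roughly cond_A squared

print("cond(A)     =", cond_A)
print("cond(A^T A) =", cond_AtA)
print("error, normal equations:", np.linalg.norm(x_ne - x_true))
print("error, QR:              ", np.linalg.norm(x_qr - x_true))
```

On a matrix like this, the normal-equations route typically loses noticeably more digits than the QR route, because rounding errors are amplified by the larger condition number of $A^T A$.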

When $A$ has full column rank, the hat matrix $H = A(A^T A)^{-1} A^T$ projects any vector $b$ onto $\text{col}(A)$: $\hat{b} = Hb$. Note that $H^2 = H$ (it is idempotent) and $H = H^T$ (it is symmetric).
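These properties of $H$ are easy to check numerically. The sketch below uses NumPy and a small illustrative matrix and vector (assumed data, not from the text); it builds $H$ with a linear solve rather than an explicit inverse.

```python
import numpy as np

# Illustrative data: a 3x2 matrix with full column rank and a vector to project.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# H = A (A^T A)^{-1} A^T, computed by solving (A^T A) X = A^T for X.
H = A @ np.linalg.solve(A.T @ A, A.T)

b_hat = H @ b            # projection of b onto col(A)
residual = b - b_hat     # component of b perpendicular to col(A)

print("idempotent (H @ H == H):", np.allclose(H @ H, H))
print("symmetric  (H == H.T):  ", np.allclose(H, H.T))
print("A^T residual is zero:   ", np.allclose(A.T @ residual, 0.0))
print("H fixes col(A):         ", np.allclose(H @ A, A))
```

Forming $H$ explicitly is mainly useful for illustration; for large $m$ it is an $m \times m$ dense matrix, so in practice one would usually compute $A\hat{x}$ directly.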

Formal View

Theorem 7.9 — Normal Equations

Let $A \in \mathbb{R}^{m \times n}$ with $m \geq n$. The vector $\hat{x}$ minimizes $\|b - Ax\|$ if and only if it satisfies $A^T A \hat{x} = A^T b$. If $A$ has full column rank, then $A^T A$ is symmetric positive definite, so this system has a unique solution $\hat{x} = (A^T A)^{-1} A^T b$.

The matrix $A^T A$ is always symmetric and positive semidefinite. It is positive definite (hence invertible) if and only if $A$ has full column rank.
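The link between column rank and definiteness can be illustrated with the eigenvalues of $A^T A$. This is a minimal NumPy sketch; the helper `gram_report` and the two example matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def gram_report(A):
    """Return (rank of A, eigenvalues of A^T A in ascending order)."""
    G = A.T @ A
    # eigvalsh is intended for symmetric matrices, which A^T A always is.
    return np.linalg.matrix_rank(A), np.linalg.eigvalsh(G)


full_rank = np.array([[1.0, 0.0],
                      [1.0, 1.0],
                      [1.0, 2.0]])
deficient = np.array([[1.0, 2.0],
                      [1.0, 2.0],
                      [1.0, 2.0]])   # second column is twice the first

rank_f, eig_f = gram_report(full_rank)
rank_d, eig_d = gram_report(deficient)

print("full rank: rank =", rank_f, "eigenvalues =", eig_f)
print("deficient: rank =", rank_d, "eigenvalues =", eig_d)
```

For the full-rank matrix all eigenvalues are strictly positive, so $A^T A$ is positive definite. For the rank-deficient matrix the smallest eigenvalue is zero up to rounding, so $A^T A$ is only positive semidefinite and has no inverse.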

Definition 7.9 — Projection Matrix

The orthogonal projector onto $\text{col}(A)$ is $H = A(A^T A)^{-1} A^T$. It satisfies $H^2 = H$ and $H^T = H$. The projected vector is $\hat{b} = Hb = A\hat{x}$, and the residual $b - \hat{b}$ lies in $\text{col}(A)^\perp = \text{null}(A^T)$.

Why This Matters

The normal equations connect linear algebra to statistics: solving them is exactly ordinary least squares (OLS) regression.

Linear regression: the OLS estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ is the solution of the normal equations with $A = X$ and $b = y$
Gram matrices: $A^T A$ appears in kernel methods and covariance estimation
Weighted least squares: replace $A^T A$ with $A^T W A$ and $A^T b$ with $A^T W b$ for non-uniform measurement noise
Control theory: projection plays a role in state estimation (Kalman filter)
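For the weighted least squares item above, the weighted normal equations are $A^T W A \hat{x} = A^T W b$. The sketch below is one way to set this up in NumPy, assuming a diagonal $W$ whose entries are inverse noise variances; the data and the noise levels in `sigma` are made up for illustration.

```python
import numpy as np

# Illustrative data: four measurements of a line y = c + d*t.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

# Hypothetical noise levels: the last two measurements are less reliable.
sigma = np.array([0.1, 0.1, 0.5, 0.5])
w = 1.0 / sigma**2                 # diagonal entries of W (assumed inverse variances)

# Weighted normal equations: (A^T W A) x = A^T W b.
AtWA = A.T @ (w[:, None] * A)      # scales row i of A by w[i], then multiplies by A^T
AtWb = A.T @ (w * b)
x_wls = np.linalg.solve(AtWA, AtWb)

# Cross-check: scaling each row by sqrt(w[i]) gives an ordinary least squares problem.
sw = np.sqrt(w)
x_check, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)

print("weighted normal equations:", x_wls)
print("scaled lstsq cross-check: ", x_check)
```

The cross-check uses the fact that minimizing $\sum_i w_i (b_i - (Ax)_i)^2$ is the same as ordinary least squares on the system with rows scaled by $\sqrt{w_i}$.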

Learning Resources

Normal Equations and Projection

MIT OpenCourseWare

Strang covers the normal equations and the projection matrix in detail.

48 min

Least Squares via Normal Equations

Professor Leonard

Step-by-step worked examples solving normal equations by hand.

30 min

Quiz

Question 1

The normal equations $A^T A \hat{x} = A^T b$ have a unique solution when:

Question 2

The matrix $A^T A$ is always symmetric.

Question 3

After solving the normal equations for $\hat{x}$, the residual $b - A\hat{x}$ is:

Common Mistakes

Solving $Ax = b$ directly instead of $A^T A \hat{x} = A^T b$: the original system usually has no solution when $m > n$.
Assuming $A^T A$ is always invertible: it is singular when the columns of $A$ are linearly dependent (rank deficient).
Confusing the projection $H = A(A^T A)^{-1} A^T$ with the identity: $H$ leaves vectors in $\text{col}(A)$ unchanged and sends the component perpendicular to $\text{col}(A)$ to zero.