The Normal Equations
The normal equations transform the overdetermined system $Ax = b$, which generally has no exact solution, into a square, solvable system $A^T A \hat{x} = A^T b$. The name "normal" comes from a geometric fact: the residual $b - A\hat{x}$ is normal (perpendicular) to the column space of $A$.
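To verify: if $\hat{x}$ solves $A^T A \hat{x} = A^T b$, then $A^T (b - A\hat{x}) = 0$, meaning $b - A\hat{x} \perp \operatorname{col}(A)$. This is exactly the condition for $A\hat{x}$ to be the orthogonal projection of $b$ onto $\operatorname{col}(A)$.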
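The verification above can be checked on a small example. The sketch below fits a line $c + dt$ to three illustrative data points, $(0, 6)$, $(1, 0)$, $(2, 0)$, chosen here for convenience rather than taken from the text. It uses exact fractions so every number can be confirmed by hand; the helper names (`transpose`, `matmul`, `matvec`, `solve`) are ad hoc, not a library API.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def solve(M, rhs):
    """Gaussian elimination with row swaps on a square, invertible system."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        # Bring a nonzero pivot into row k, then clear the entries below it.
        pivot = next(i for i in range(k, n) if aug[i][k] != 0)
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            aug[i] = [a - f * p for a, p in zip(aug[i], aug[k])]
    # Back substitution.
    x = [Fraction(0)] * n
    for k in reversed(range(n)):
        s = sum(aug[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x

# Rows of A are (1, t) for t = 0, 1, 2; b holds the measured values.
A = [[Fraction(1), Fraction(t)] for t in (0, 1, 2)]
b = [Fraction(6), Fraction(0), Fraction(0)]

At = transpose(A)
x_hat = solve(matmul(At, A), matvec(At, b))        # solves A^T A x = A^T b
projection = matvec(A, x_hat)                      # A x_hat
residual = [bi - pi for bi, pi in zip(b, projection)]

print(x_hat)                 # coefficients (5, -3)
print(residual)              # residual (1, -2, 1)
print(matvec(At, residual))  # A^T (b - A x_hat) = (0, 0)
```

The three equations cannot all hold at once, yet the 2-by-2 system $A^T A \hat{x} = A^T b$ has the unique solution $\hat{x} = (5, -3)$, and the residual is orthogonal to both columns of $A$.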
To solve the normal equations in practice, you can apply Gaussian elimination directly to the system $A^T A \hat{x} = A^T b$. However, $A^T A$ can be ill-conditioned (near-singular) when the columns of $A$ are nearly linearly dependent, which makes QR factorization (section 7.10) the preferred numerical method.
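A small floating-point experiment illustrates the conditioning problem. In the 2-norm, the condition number of $A^T A$ is the square of the condition number of $A$, so forming the product can destroy information that $A$ itself still carries. The matrix below is a standard textbook-style illustration (not taken from this text): its columns are linearly independent, but in double precision the computed $A^T A$ is exactly singular.

```python
eps = 1e-8  # the two columns of A differ only at the 1e-8 level

A = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

# Form the 2x2 matrix A^T A entry by entry from the columns of A.
cols = list(zip(*A))
G = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]

# In exact arithmetic the diagonal entries are 1 + eps**2, but eps**2 = 1e-16
# is below double-precision resolution near 1.0, so they round to 1.0.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]

print(G)    # [[1.0, 1.0], [1.0, 1.0]]
print(det)  # 0.0
```

Gaussian elimination on this computed $A^T A$ breaks down at the second pivot, even though the original least-squares problem has a unique solution. Methods that work on $A$ directly, such as QR, avoid squaring the condition number.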
The hat matrix $H = A(A^T A)^{-1}A^T$ projects any vector onto $\operatorname{col}(A)$: $Hb = A\hat{x}$. Note that $H^2 = H$ (it is idempotent) and $H^T = H$ (it is symmetric).
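Both properties can be confirmed numerically. The sketch below builds the hat matrix for an illustrative 3-by-2 matrix (rows $(1, t)$ for $t = 0, 1, 2$, an assumption made for this example) using exact fractions. The helper `inv2` is an ad hoc 2-by-2 inverse, not a library function.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (assumes det != 0)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[F(1), F(t)] for t in (0, 1, 2)]
At = transpose(A)

# H = A (A^T A)^{-1} A^T
H = matmul(matmul(A, inv2(matmul(At, A))), At)

b = [[F(6)], [F(0)], [F(0)]]          # a column vector outside col(A)
Hb = matmul(H, b)                     # its projection onto col(A)

print(matmul(H, H) == H)              # True: idempotent
print(transpose(H) == H)              # True: symmetric
print(matmul(H, Hb) == Hb)            # True: projecting twice changes nothing
```

Here $H = \tfrac{1}{6}\begin{pmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{pmatrix}$, and $Hb = (5, 2, -1)$ for $b = (6, 0, 0)$.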
Formal View
The matrix $A^T A$ is always symmetric and positive semidefinite. It is positive definite (hence invertible) if and only if $A$ has full column rank.
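These claims follow from a short calculation, sketched here for any real matrix $A$ and any vector $x$ of compatible size:

```latex
\begin{align*}
(A^T A)^T &= A^T (A^T)^T = A^T A, \\
x^T (A^T A)\, x &= (Ax)^T (Ax) = \|Ax\|^2 \ge 0, \\
x^T (A^T A)\, x = 0 &\iff Ax = 0.
\end{align*}
```

If $A$ has full column rank, $Ax = 0$ forces $x = 0$, so the quadratic form is strictly positive for every $x \ne 0$ and $A^T A$ is positive definite. If the columns are dependent, some $x \ne 0$ satisfies $Ax = 0$, hence $A^T A x = 0$ and $A^T A$ is singular.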
Why This Matters
The normal equations connect linear algebra to statistics: solving them is exactly ordinary least squares (OLS) regression.
- Linear regression: the OLS estimator is the normal equations solution
- Gram matrices: $A^T A$ appears in kernel methods and covariance estimation
- Weighted least squares: replace $A^T A$ with $A^T W A$ (and $A^T b$ with $A^T W b$) for non-uniform measurement noise
- Control theory: projection plays a role in state estimation (Kalman filter)
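As a minimal sketch of the weighted least squares item, the code below solves $A^T W A \hat{x} = A^T W b$ for a diagonal weight matrix $W$. The data and the weights are hypothetical, chosen only so the arithmetic stays exact; in practice the weights are typically the reciprocals of the measurement variances.

```python
from fractions import Fraction as F

A = [[F(1), F(t)] for t in (0, 1, 2)]   # rows (1, t): fit a line c + d*t
b = [F(6), F(0), F(0)]
w = [F(1), F(1), F(4)]                  # hypothetical weights: trust the last point most

# Entries of A^T W A and A^T W b for W = diag(w), without forming W explicitly.
n = len(A[0])
G = [[sum(wi * row[i] * row[j] for wi, row in zip(w, A)) for j in range(n)]
     for i in range(n)]
rhs = [sum(wi * row[i] * bi for wi, row, bi in zip(w, A, b)) for i in range(n)]

# Cramer's rule is enough for a 2x2 system.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x_hat = [(rhs[0] * G[1][1] - G[0][1] * rhs[1]) / det,
         (G[0][0] * rhs[1] - G[1][0] * rhs[0]) / det]

# The weighted residual is orthogonal to the columns of A: A^T W (b - A x_hat) = 0.
residual = [bi - sum(a * x for a, x in zip(row, x_hat)) for row, bi in zip(A, b)]
check = [sum(wi * row[i] * ri for wi, row, ri in zip(w, A, residual)) for i in range(n)]

print(x_hat)   # coefficients (34/7, -18/7)
print(check)   # (0, 0)
```

With equal weights the same data gives $\hat{x} = (5, -3)$ and a residual of $1$ at $t = 2$. Weighting that point four times as heavily shrinks its residual to $2/7$, at the cost of larger residuals elsewhere.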
Quiz
The normal equations have a unique solution when:
The matrix $A^T A$ is always symmetric.
After solving the normal equations for $\hat{x}$, the residual $b - A\hat{x}$ is:
Common Mistakes
- Solving $Ax = b$ directly instead of $A^T A \hat{x} = A^T b$: the original system usually has no solution when $m > n$ (more equations than unknowns).
- Assuming $A^T A$ is always invertible: it fails when the columns of $A$ are linearly dependent (rank deficient).
- Confusing the projection $H$ with the identity: $H$ only fixes vectors in $\operatorname{col}(A)$, zeroing the perpendicular component.
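The rank-deficient case can be seen in a few lines. In this illustrative example (the matrix and data are made up for the purpose), the second column of $A$ is twice the first, so $A^T A$ is singular and the normal equations have more than one solution, although every solution yields the same projection $A\hat{x}$.

```python
A = [[1, 2],
     [1, 2],
     [1, 2]]   # second column is twice the first: rank 1, not 2
b = [1, 2, 3]

cols = list(zip(*A))
G = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]  # A^T A
rhs = [sum(p * q for p, q in zip(c, b)) for c in cols]                    # A^T b
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]

def satisfies_normal_equations(x):
    return [sum(g * xi for g, xi in zip(row, x)) for row in G] == rhs

def project(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

print(det)                                   # 0: A^T A is not invertible
print(satisfies_normal_equations([2, 0]))    # True
print(satisfies_normal_equations([0, 1]))    # True
print(project([2, 0]) == project([0, 1]))    # True: same projection (2, 2, 2)
```

The normal equations remain consistent here; what is lost is uniqueness of $\hat{x}$, so a formula that relies on $(A^T A)^{-1}$ no longer applies.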