12.16 · 10 min read
Least Squares as Optimization
The least squares problem, minimize $\|Ax - b\|^2$ over $x$, is a prime example of quadratic optimization. Expanding: $\|Ax - b\|^2 = x^T A^T A x - 2 b^T A x + b^T b$.
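The matrix $A^T A$ is always positive semidefinite (PSD), since $x^T A^T A x = \|Ax\|^2 \ge 0$ for every $x$, so the objective is convex. The gradient of the objective is $\nabla f(x) = 2 A^T A x - 2 A^T b$. Setting it to zero gives the normal equations: $A^T A x = A^T b$.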
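The closed-form gradient $2 A^T A x - 2 A^T b$ and the expanded quadratic form can be sanity-checked numerically. The following is a minimal sketch using NumPy; the matrix `A`, vector `b`, test point `x`, and step size `h` are arbitrary values chosen for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # hypothetical 6x3 data matrix
b = rng.standard_normal(6)        # hypothetical right-hand side
x = rng.standard_normal(3)        # arbitrary point at which to test


def objective(x):
    """f(x) = ||Ax - b||^2."""
    r = A @ x - b
    return r @ r


def gradient(x):
    """Closed-form gradient 2 A^T A x - 2 A^T b."""
    return 2 * A.T @ (A @ x - b)


# Expanded form: x^T A^T A x - 2 b^T A x + b^T b.
expanded = x @ A.T @ A @ x - 2 * b @ A @ x + b @ b

# Central finite differences, one coordinate direction at a time.
h = 1e-6
numeric = np.array([
    (objective(x + h * e) - objective(x - h * e)) / (2 * h)
    for e in np.eye(3)
])

# Largest disagreement between the numeric and closed-form gradients.
max_gap = np.max(np.abs(numeric - gradient(x)))
print(max_gap)
```

Because the objective is quadratic, the central difference agrees with the closed form up to floating-point rounding, so `max_gap` should be tiny.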
When $A$ has full column rank, $A^T A$ is positive definite (invertible), and the unique minimizer is $x^* = (A^T A)^{-1} A^T b$. When $A$ does not have full column rank, $A^T A$ is singular, and we use the pseudoinverse or add regularization.
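Both cases can be illustrated on small matrices. This is a sketch under stated assumptions: the matrices and the regularization weight `lam` are made-up examples, and ridge (Tikhonov) regularization is used as one common way to "add regularization", which the text does not specify.

```python
import numpy as np

# Full column rank: the two columns are linearly independent.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Solve the normal equations A^T A x = A^T b directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq solves the same problem without forming A^T A,
# which is usually better conditioned in practice.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Rank-deficient case: the second column duplicates the first,
# so A_def^T A_def is singular.
A_def = np.array([[1.0, 1.0],
                  [2.0, 2.0],
                  [3.0, 3.0]])
b_def = np.array([1.0, 2.0, 2.0])

# The pseudoinverse picks the minimum-norm least squares solution.
x_pinv = np.linalg.pinv(A_def) @ b_def

# Ridge regularization: A^T A + lam * I is positive definite for lam > 0.
lam = 1e-3  # hypothetical regularization weight
x_ridge = np.linalg.solve(A_def.T @ A_def + lam * np.eye(2), A_def.T @ b_def)

print(x_normal, x_lstsq, x_pinv, x_ridge)
```

In the full-rank case both solvers return $x^* = (7/6,\ 1/2)$. In the rank-deficient case the pseudoinverse returns $(11/28,\ 11/28)$, and the ridge solution approaches it as `lam` shrinks.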
Formal View
Theorem 12.10 — Least Squares Solution
The minimum of $\|Ax - b\|^2$ is achieved at solutions of the normal equations $A^T A x = A^T b$. When $A$ has full column rank, the unique minimizer is $x^* = (A^T A)^{-1} A^T b$.
The normal equations always have at least one solution since $A^T b \in \operatorname{range}(A^T) = \operatorname{range}(A^T A)$.
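The existence claim rests on the fact that $A^T$ and $A^T A$ have the same range. One standard argument is sketched below; it is not necessarily the proof the text has in mind, and the notation $N(\cdot)$ for the null space and $R(\cdot)$ for the range is introduced here for illustration.

```latex
\begin{aligned}
A^T A x = 0 \;&\Rightarrow\; x^T A^T A x = \|Ax\|^2 = 0 \;\Rightarrow\; Ax = 0,
  \quad\text{so } N(A^T A) = N(A), \\
R(A^T A) &= N(A^T A)^{\perp} = N(A)^{\perp} = R(A^T)
  \quad\text{(using that } A^T A \text{ is symmetric)}, \\
A^T b \in R(A^T) &= R(A^T A)
  \;\Rightarrow\; \text{there exists } x \text{ with } A^T A x = A^T b .
\end{aligned}
```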
Why This Matters
Least squares optimization unifies statistics, signal processing, and data science under a single mathematical framework.
- Linear regression: fitting a line or plane to data by minimizing the sum of squared residuals
- Signal reconstruction: finding the best low-noise estimate from noisy measurements
- Curve fitting: approximating experimental data with polynomial models (which are linear in their coefficients) or, via nonlinear least squares, more general models
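The linear regression case can be sketched in a few lines. The data below are made-up measurements roughly following $y = 2t + 1$, and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical noisy measurements of y ≈ 2t + 1.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix: one column per coefficient (intercept, slope).
A = np.column_stack([np.ones_like(t), t])

# Least squares fit: minimizes ||A @ coef - y||^2 over coef.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# At the minimizer the residual is orthogonal to the columns of A,
# which is exactly the normal equations A^T (y - A coef) = 0.
residuals = y - A @ coef
orthogonality = A.T @ residuals

print(intercept, slope)
```

For this data the fit gives an intercept of about 1.04 and a slope of about 1.99. A polynomial fit works the same way with extra columns such as `t**2` in the design matrix, since the model stays linear in its coefficients.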
Quiz
Question 1
The normal equations for $\min_x \|Ax - b\|^2$ are:
Question 2
The matrix $A^T A$ is always positive definite, regardless of $A$.
Common Mistakes
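- Confusing the normal equations $A^T A x = A^T b$ with the original system $Ax = b$, which generally has no exact solution when it is overdetermined.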
- Assuming $A^T A$ is always invertible; it is invertible only when $A$ has full column rank.
- Forgetting that the least squares solution minimizes the sum of squared residuals, not the sum of absolute residuals, so large residuals (outliers) are penalized disproportionately.