Best Approximation and Orthogonal Matrices

The orthogonal projection $\text{Proj}_V(\mathbf{b})$ is not just any vector in $V$: it is the closest vector in $V$ to $\mathbf{b}$. This is the Best Approximation Theorem. The proof uses the Pythagorean theorem: for any other $\mathbf{v}' \in V$, the vectors $\mathbf{b} - \text{Proj}_V(\mathbf{b})$ and $\text{Proj}_V(\mathbf{b}) - \mathbf{v}'$ are the perpendicular legs of a right triangle whose hypotenuse is $\mathbf{b} - \mathbf{v}'$, so $\mathbf{b} - \mathbf{v}'$ is strictly longer than $\mathbf{b} - \text{Proj}_V(\mathbf{b})$.
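The claim can be checked numerically. The following is a minimal NumPy sketch rather than part of the lesson itself: the matrix `A` (whose columns span a hypothetical plane $V$ in $\mathbb{R}^3$), the vector `b`, and the helper name `project_onto_columns` are all assumptions chosen for illustration. It computes the projection with a least-squares solve and compares its distance to $\mathbf{b}$ against randomly sampled competitors in $V$.

```python
import numpy as np

def project_onto_columns(A, b):
    """Orthogonal projection of b onto the column space of A (illustrative helper)."""
    # The least-squares coefficients x minimize ||A x - b||, so A x is the projection.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return A @ x

# Hypothetical example: V is the plane in R^3 spanned by the columns of A.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

v = project_onto_columns(A, b)
best_distance = np.linalg.norm(b - v)

# Sample other vectors v' = A c in V and record their distances to b.
rng = np.random.default_rng(0)
other_distances = [np.linalg.norm(b - A @ rng.normal(size=2)) for _ in range(1000)]

# The residual b - v should be orthogonal to V, i.e. to every column of A.
residual_is_orthogonal = np.allclose(A.T @ (b - v), 0.0)

print(v, best_distance, min(other_distances), residual_is_orthogonal)
```

Random sampling only illustrates the theorem; the Pythagorean argument is what shows that no vector in $V$ can do better.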

When $U$ is a square $m \times m$ matrix with orthonormal columns, it is called an orthogonal matrix. In this case, $U^t U = I$ and $U U^t = I$, so the transpose is both the left and right inverse: $U^{-1} = U^t$.

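Orthogonal matrices preserve lengths, distances, and angles. In $\mathbb{R}^2$ they are exactly the rotations about the origin and the reflections across lines through the origin. In $\mathbb{R}^3$ they are rotations, reflections, and compositions thereof. Every rigid motion of space that fixes the origin is represented by an orthogonal matrix; translations are also rigid, but they are not linear and so are not given by a matrix of this kind.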
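As a concrete illustration of the $\mathbb{R}^2$ case, the sketch below builds one rotation and one reflection and checks the defining property $U^t U = I$. The helper names `rotation`, `reflection`, and `is_orthogonal` are hypothetical, introduced only for this example.

```python
import numpy as np

def rotation(theta):
    """2x2 counterclockwise rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def reflection(theta):
    """2x2 reflection across the line through the origin at angle theta / 2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  s],
                     [s, -c]])

def is_orthogonal(U, tol=1e-12):
    """True if U is square and U^T U = I up to rounding error."""
    m, n = U.shape
    return m == n and np.allclose(U.T @ U, np.eye(n), atol=tol)

R = rotation(np.pi / 6)
F = reflection(np.pi / 6)

print(is_orthogonal(R), is_orthogonal(F))    # both satisfy U^T U = I
print(np.linalg.det(R), np.linalg.det(F))    # approximately +1 and -1
print(np.allclose(np.linalg.inv(R), R.T))    # the inverse equals the transpose
```

The sign of the determinant separates the two families: rotations have determinant $+1$ and reflections have determinant $-1$.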

Formal View

Theorem 6.16 — Best Approximation Theorem

Let $V$ be a subspace of $\mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^m$. The orthogonal projection $\mathbf{v} = \text{Proj}_V(\mathbf{b})$ is the unique vector in $V$ minimizing distance to $\mathbf{b}$:

$$\|\mathbf{b} - \mathbf{v}\| < \|\mathbf{b} - \mathbf{v}'\| \quad \text{for all } \mathbf{v}' \in V,\, \mathbf{v}' \neq \mathbf{v}.$$

Proof sketch: $\mathbf{b} - \mathbf{v}' = (\mathbf{b} - \mathbf{v}) + (\mathbf{v} - \mathbf{v}')$. Since $\mathbf{b} - \mathbf{v} \in V^\perp$ and $\mathbf{v} - \mathbf{v}' \in V$, the two summands are orthogonal, and the Pythagorean theorem gives $\|\mathbf{b}-\mathbf{v}'\|^2 = \|\mathbf{b}-\mathbf{v}\|^2 + \|\mathbf{v}-\mathbf{v}'\|^2 > \|\mathbf{b}-\mathbf{v}\|^2$. The inequality is strict because $\mathbf{v}' \neq \mathbf{v}$ forces $\|\mathbf{v}-\mathbf{v}'\|^2 > 0$.
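A small worked example may make the decomposition concrete. The numbers are chosen purely for illustration: take $V$ to be the $xy$-plane in $\mathbb{R}^3$, $\mathbf{b} = (1, 2, 3)$, and a competitor $\mathbf{v}' = (3, 1, 0) \in V$. Then $\mathbf{v} = \text{Proj}_V(\mathbf{b}) = (1, 2, 0)$, and

$$\mathbf{b} - \mathbf{v} = (0, 0, 3) \in V^\perp, \qquad \mathbf{v} - \mathbf{v}' = (-2, 1, 0) \in V, \qquad \mathbf{b} - \mathbf{v}' = (-2, 1, 3).$$

$$\|\mathbf{b} - \mathbf{v}'\|^2 = 4 + 1 + 9 = 14 = 9 + 5 = \|\mathbf{b} - \mathbf{v}\|^2 + \|\mathbf{v} - \mathbf{v}'\|^2 > 9 = \|\mathbf{b} - \mathbf{v}\|^2.$$

So $\|\mathbf{b} - \mathbf{v}\| = 3 < \sqrt{14} = \|\mathbf{b} - \mathbf{v}'\|$, as the theorem predicts.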

Definition 6.17 — Orthogonal Matrix

A square $m \times m$ matrix $U$ is orthogonal if its columns form an orthonormal basis for $\mathbb{R}^m$. Equivalently, $U^t U = U U^t = I_m$, so $U^{-1} = U^t$.

Theorem 6.18

If $U$ is an orthogonal matrix, then for all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^m$:

$$\|U\mathbf{v}\| = \|\mathbf{v}\|, \quad \text{Dist}(U\mathbf{u}, U\mathbf{v}) = \text{Dist}(\mathbf{u}, \mathbf{v}), \quad (U\mathbf{u}) \cdot (U\mathbf{v}) = \mathbf{u} \cdot \mathbf{v}.$$

Orthogonal matrices are exactly the length-preserving (isometric) linear maps.
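The three identities can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not a proof: it obtains an orthogonal matrix as the `Q` factor returned by `np.linalg.qr` for a random square matrix, and the dimension, seed, and shear matrix `S` are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 4

# Build an orthogonal matrix: the Q factor of a random m x m matrix.
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))

u = rng.normal(size=m)
v = rng.normal(size=m)

length_preserved = np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v))
distance_preserved = np.isclose(np.linalg.norm(Q @ u - Q @ v), np.linalg.norm(u - v))
dot_preserved = np.isclose((Q @ u) @ (Q @ v), u @ v)

# For contrast, a shear is linear but not orthogonal, and it changes lengths.
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
w = np.array([0.0, 1.0])
shear_changes_length = not np.isclose(np.linalg.norm(S @ w), np.linalg.norm(w))

print(length_preserved, distance_preserved, dot_preserved, shear_changes_length)
```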

Why This Matters

The Best Approximation Theorem and orthogonal matrices together underpin least squares, PCA, and many other methods in modern data science.

Ordinary least squares regression finds the projection of the response vector onto the column space of the design matrix, which is the best approximation to the response in the least-squares sense.
Rotation matrices in robotics and 3D graphics are orthogonal, so they preserve the lengths and angles of the object being transformed.
The QR decomposition writes any invertible matrix as $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular; it can be computed in a numerically stable way and underlies the QR algorithm for eigenvalues.
The singular value decomposition expresses every matrix as $A = U \Sigma V^t$, where $U$ and $V$ are orthogonal and $\Sigma$ is a (rectangular) diagonal matrix with nonnegative entries; it is widely regarded as one of the most important factorizations in linear algebra.
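The first and third items can be tied together in a short sketch: solving a least-squares problem through a QR factorization. The data points and variable names below are hypothetical. Note that the design matrix here is $4 \times 2$, so NumPy's reduced QR returns a `Q` with orthonormal columns that is not square, and therefore not an orthogonal matrix in the sense of Definition 6.17.

```python
import numpy as np

# Hypothetical data: fit y ≈ c0 + c1 * t by least squares.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])
A = np.column_stack([np.ones_like(t), t])   # design matrix, 4 x 2

# Reduced QR: Q is 4 x 2 with orthonormal columns, R is 2 x 2 upper triangular.
Q, R = np.linalg.qr(A)

# The projection of y onto Col(A) is Q Q^T y; solving R c = Q^T y gives the coefficients.
coeffs = np.linalg.solve(R, Q.T @ y)
fitted = Q @ (Q.T @ y)

# Cross-check against NumPy's built-in least-squares solver.
coeffs_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coeffs, coeffs_ref)
print(np.allclose(A @ coeffs, fitted))
```

Working with $R\mathbf{c} = Q^t\mathbf{y}$ avoids forming $A^t A$ explicitly, which is one common reason QR is preferred over the normal equations in practice.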

Learning Resources

Least squares approximations

MIT OpenCourseWare

Strang on least squares, normal equations, and the best approximation theorem.

49 min

Orthogonal matrices and Gram-Schmidt

MIT OpenCourseWare

The QR factorization and orthogonal matrices as the tool for stable computation.

49 min

Quiz

Question 1

For an orthogonal matrix $U$, we have $U^{-1} = U^t$.

Question 2

If $P = UU^t$ is a projection matrix onto $V$ and $\mathbf{b} \notin V$, which statement is true?

Question 3

Every orthogonal matrix in $\mathbb{R}^{2 \times 2}$ represents either a rotation or a reflection.

Common Mistakes

Thinking $P = UU^t$ gives the identity when $U$ is not square: $UU^t = I$ requires $U$ to be a square orthogonal matrix. For a non-square $U$ with orthonormal columns, only $U^t U = I$ holds.
Applying the projection formula to a matrix $U$ whose columns are not orthonormal: the formula $\text{Proj}_V(\mathbf{b}) = UU^t\mathbf{b}$ is valid only when the columns of $U$ form an orthonormal basis for $V$.
Forgetting that orthogonal matrices preserve the dot product: $(U\mathbf{u}) \cdot (U\mathbf{v}) = \mathbf{u} \cdot \mathbf{v}$, so angles and lengths are unchanged.
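The first two mistakes are easy to reproduce numerically. In the sketch below, the matrices `U` and `A` and the vector `b` are hypothetical examples; both matrices have the $xy$-plane in $\mathbb{R}^3$ as their column space, but only `U` has orthonormal columns. The general formula $A(A^tA)^{-1}A^t\mathbf{b}$ used for comparison is a standard result that this section does not state; it applies whenever the columns of $A$ are linearly independent.

```python
import numpy as np

# U has orthonormal columns but is 3 x 2, so it is not a square orthogonal matrix.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
print(np.allclose(U.T @ U, np.eye(2)))   # U^T U is the 2 x 2 identity
print(np.allclose(U @ U.T, np.eye(3)))   # U U^T is a projection, not the 3 x 3 identity

# A spans the same plane, but its columns are not orthonormal.
A = np.array([[1.0, 1.0],
              [0.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

correct = U @ (U.T @ b)                           # valid: U has orthonormal columns
naive = A @ (A.T @ b)                             # invalid: A A^T b is not the projection
general = A @ np.linalg.solve(A.T @ A, A.T @ b)   # A (A^T A)^{-1} A^T b

print(correct, naive, general)
```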