Linear Algebra
6.9 · 10 min read

Best Approximation and Orthogonal Matrices

The orthogonal projection $\text{Proj}_V(\mathbf{b})$ is not just any vector in $V$: it is the closest vector in $V$ to $\mathbf{b}$. This is the Best Approximation Theorem. The proof uses the Pythagorean theorem. For any other $\mathbf{v}' \in V$, the vector $\mathbf{b} - \mathbf{v}'$ is the hypotenuse of a right triangle whose legs are $\mathbf{b} - \text{Proj}_V(\mathbf{b})$ (perpendicular to $V$) and $\text{Proj}_V(\mathbf{b}) - \mathbf{v}'$ (lying in $V$). The hypotenuse is therefore strictly longer than $\mathbf{b} - \text{Proj}_V(\mathbf{b})$.

When $U$ is a square $m \times m$ matrix with orthonormal columns, it is called an orthogonal matrix. In this case $U^t U = I$ and $U U^t = I$, so the transpose is both the left and the right inverse: $U^{-1} = U^t$.

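Orthogonal matrices preserve lengths, distances, and angles. In $\mathbb{R}^2$ they are exactly the rotations about the origin (determinant $1$) and the reflections across lines through the origin (determinant $-1$). In $\mathbb{R}^3$ they are the rotations, the reflections, and their compositions. Every rigid motion of space that fixes the origin is represented by an orthogonal matrix; a general rigid motion is an orthogonal map followed by a translation.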
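As a quick numerical sanity check of the $\mathbb{R}^2$ claim, the sketch below builds a rotation and a reflection, confirms that both satisfy $U^tU = I$, and uses the determinant to tell them apart. It is written in plain Python with no libraries; the helper names (`transpose`, `matmul`, `is_orthogonal`, `det2`) and the angle are our own illustrative choices, not taken from the text. A shear is included as an invertible matrix that is not orthogonal.

```python
import math

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def is_orthogonal(U, tol=1e-12):
    """Check U^t U = I entrywise, up to a floating-point tolerance."""
    G = matmul(transpose(U), U)
    n = len(U)
    return all(abs(G[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

theta = math.pi / 6
c, s = math.cos(theta), math.sin(theta)

rotation = [[c, -s],
            [s,  c]]     # counterclockwise rotation by theta
reflection = [[c,  s],
              [s, -c]]   # reflection across the line at angle theta / 2
shear = [[1.0, 1.0],
         [0.0, 1.0]]     # invertible, but not orthogonal

for name, M in [("rotation", rotation), ("reflection", reflection), ("shear", shear)]:
    print(name, is_orthogonal(M), round(det2(M), 12))
```

The determinant is a convenient discriminator here: an orthogonal matrix always has determinant $\pm 1$, and in $\mathbb{R}^2$ the sign says whether it is a rotation or a reflection.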

Formal View

Theorem 6.16 — Best Approximation Theorem
Let $V$ be a subspace of $\mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^m$. The orthogonal projection $\mathbf{v} = \text{Proj}_V(\mathbf{b})$ is the unique vector in $V$ minimizing distance to $\mathbf{b}$:
$$\|\mathbf{b} - \mathbf{v}\| < \|\mathbf{b} - \mathbf{v}'\| \quad \text{for all } \mathbf{v}' \in V,\ \mathbf{v}' \neq \mathbf{v}.$$

Proof sketch: write $\mathbf{b} - \mathbf{v}' = (\mathbf{b} - \mathbf{v}) + (\mathbf{v} - \mathbf{v}')$. Since $\mathbf{b} - \mathbf{v} \in V^\perp$ and $\mathbf{v} - \mathbf{v}' \in V$, the two summands are orthogonal, and the Pythagorean theorem gives
$$\|\mathbf{b}-\mathbf{v}'\|^2 = \|\mathbf{b}-\mathbf{v}\|^2 + \|\mathbf{v}-\mathbf{v}'\|^2 > \|\mathbf{b}-\mathbf{v}\|^2.$$
The inequality is strict because $\mathbf{v}' \neq \mathbf{v}$ forces $\|\mathbf{v}-\mathbf{v}'\| > 0$.
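The proof sketch can be checked numerically. The minimal sketch below assumes a concrete plane $V \subset \mathbb{R}^3$ with orthonormal basis $\mathbf{u}_1 = (1,1,0)/\sqrt{2}$, $\mathbf{u}_2 = (0,0,1)$ and the vector $\mathbf{b} = (3,1,2)$; these values, the helper names, and the random sampling are our own illustration. It computes $\text{Proj}_V(\mathbf{b}) = \sum_i (\mathbf{b}\cdot\mathbf{u}_i)\,\mathbf{u}_i$, then tests many random competitors $\mathbf{v}' \in V$ against both the Pythagorean identity and the strict inequality.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def project(b, basis):
    """Proj_V(b) = sum of (b . u) u over an ORTHONORMAL basis of V."""
    v = [0.0] * len(b)
    for u in basis:
        coeff = dot(b, u)
        v = [vi + coeff * ui for vi, ui in zip(v, u)]
    return v

r = 1 / math.sqrt(2)
basis = [[r, r, 0.0], [0.0, 0.0, 1.0]]  # orthonormal basis of a plane V in R^3
b = [3.0, 1.0, 2.0]

v = project(b, basis)    # approximately [2, 2, 2]
best = norm(sub(b, v))   # approximately sqrt(2)

rng = random.Random(0)
for _ in range(1000):
    # a random competitor v' = a*u1 + c*u2 in V
    a, c = rng.uniform(-5, 5), rng.uniform(-5, 5)
    v_other = [a * x + c * y for x, y in zip(basis[0], basis[1])]
    d_other = norm(sub(b, v_other))
    # Pythagorean identity from the proof sketch
    assert abs(d_other ** 2 - (best ** 2 + norm(sub(v, v_other)) ** 2)) < 1e-9
    # the projection is strictly closer than any other point of V
    assert d_other > best

print(v, best)
```

The residual $\mathbf{b} - \mathbf{v} = (1,-1,0)$ is orthogonal to both basis vectors, which is exactly the $V^\perp$ condition the proof relies on.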

Definition 6.17 — Orthogonal Matrix
A square $m \times m$ matrix $U$ is orthogonal if its columns form an orthonormal basis for $\mathbb{R}^m$. Equivalently, $U^t U = U U^t = I_m$, so $U^{-1} = U^t$.
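To make $U^{-1} = U^t$ concrete, the sketch below uses one specific $3 \times 3$ orthogonal matrix, with columns $(2,1,2)/3$, $(-2,2,1)/3$, $(1,2,-2)/3$ (our own example, chosen non-symmetric so that $U^t \neq U$). It checks both $U^tU = I_3$ and $UU^t = I_3$, then solves $U\mathbf{x} = \mathbf{b}$ with a single transpose-times-vector product instead of elimination. The helpers are plain-Python stand-ins written for this illustration.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(M, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Columns (2,1,2)/3, (-2,2,1)/3, (1,2,-2)/3 are orthonormal.
U = [[2/3, -2/3,  1/3],
     [1/3,  2/3,  2/3],
     [2/3,  1/3, -2/3]]
Ut = transpose(U)

print(matmul(Ut, U))  # approximately I_3
print(matmul(U, Ut))  # approximately I_3 as well, because U is square

# Solve U x = b using U^{-1} = U^t: one matrix-vector product, no elimination.
b = [1.0, 2.0, 3.0]
x = matvec(Ut, b)     # approximately [10/3, 5/3, -1/3]
print(x, matvec(U, x))  # U x recovers b up to rounding
```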
Theorem 6.18
If $U$ is an orthogonal matrix, then for all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^m$:
$$\|U\mathbf{v}\| = \|\mathbf{v}\|, \quad \text{Dist}(U\mathbf{u}, U\mathbf{v}) = \text{Dist}(\mathbf{u}, \mathbf{v}), \quad (U\mathbf{u}) \cdot (U\mathbf{v}) = \mathbf{u} \cdot \mathbf{v}.$$
Orthogonal matrices are exactly the length-preserving (isometric) linear maps $\mathbb{R}^m \to \mathbb{R}^m$.
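A minimal numerical check of Theorem 6.18, assuming a particular orthogonal matrix of our choosing: a rotation about the $z$-axis combined with the reflection $z \mapsto -z$ (a composition of the kind mentioned for $\mathbb{R}^3$, with determinant $-1$). The random test vectors and helper names are illustrative only. A non-orthogonal scaling matrix is used at the end to show that lengths are not preserved in general.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def dist(x, y):
    return norm([a - b for a, b in zip(x, y)])

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

t = 0.7
# Rotation by t about the z-axis, combined with the reflection z -> -z.
U = [[math.cos(t), -math.sin(t),  0.0],
     [math.sin(t),  math.cos(t),  0.0],
     [0.0,          0.0,         -1.0]]

rng = random.Random(1)
u = [rng.uniform(-1, 1) for _ in range(3)]
v = [rng.uniform(-1, 1) for _ in range(3)]
Uu, Uv = matvec(U, u), matvec(U, v)

# Each pair is (value after applying U, value before).
checks = {
    "norm": (norm(Uv), norm(v)),
    "dist": (dist(Uu, Uv), dist(u, v)),
    "dot": (dot(Uu, Uv), dot(u, v)),
}
for name, (after, before) in checks.items():
    print(f"{name}: {after:.12f} vs {before:.12f}")

# A non-orthogonal matrix (scaling the x-axis by 2) changes lengths.
S = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(norm(matvec(S, [1.0, 0.0, 0.0])))
```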


Why This Matters

Together, the Best Approximation Theorem and orthogonal matrices underpin least squares, PCA, and much of modern data science.

  • Ordinary least squares regression finds the projection of the response vector onto the column space of the design matrix — the best linear approximation
  • Rotation matrices in robotics and 3D graphics are orthogonal — they guarantee no distortion of the object being transformed
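  • The QR decomposition writes any invertible matrix as $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular; it is numerically stable and is the core step of the QR algorithm, a standard method for computing eigenvalues
  • Singular value decomposition expresses every matrix as $A = U \Sigma V^t$, where $U$ and $V$ are orthogonal and $\Sigma$ is a (rectangular) diagonal matrix with nonnegative entries, the singular values; it is arguably the most important factorization in linear algebra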
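The least-squares and QR bullets can be tied together in one small sketch. Assuming a made-up data set of four points and a straight-line model $y \approx c_0 + c_1 t$, the code below factors the design matrix as $A = QR$ with modified Gram–Schmidt, computes $Q^t\mathbf{y}$ (the coordinates of the projection of $\mathbf{y}$ onto $\text{Col}(A)$), and back-substitutes in $R\mathbf{c} = Q^t\mathbf{y}$. The function names are hypothetical, and Gram–Schmidt is chosen for readability; library implementations typically use Householder reflections instead.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def qr_gram_schmidt(columns):
    """Thin QR of a full-column-rank matrix given as a list of columns.

    Returns (Q, R): Q is a list of orthonormal columns, R is upper triangular.
    Uses modified Gram-Schmidt (project out each q from the running vector).
    """
    n = len(columns)
    R = [[0.0] * n for _ in range(n)]
    Q = []
    for j, a in enumerate(columns):
        w = list(a)
        for i, q in enumerate(Q):
            R[i][j] = dot(q, w)
            w = [wk - R[i][j] * qk for wk, qk in zip(w, q)]
        R[j][j] = math.sqrt(dot(w, w))
        Q.append([wk / R[j][j] for wk in w])
    return Q, R

def least_squares(columns, y):
    """Minimize ||A c - y|| via A = QR: solve R c = Q^t y by back substitution."""
    Q, R = qr_gram_schmidt(columns)
    n = len(Q)
    rhs = [dot(q, y) for q in Q]  # Q^t y: coordinates of Proj_{Col A}(y)
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (rhs[i] - sum(R[i][k] * c[k] for k in range(i + 1, n))) / R[i][i]
    return c

# Illustrative data (made up for this example).
t = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 5.0]
columns = [[1.0] * len(t), t]  # design-matrix columns: intercept, slope
coef = least_squares(columns, y)
print(coef)  # close to [1.1, 1.1]
```

For this data the normal equations $A^tA\mathbf{c} = A^t\mathbf{y}$ give $c_0 = c_1 = 1.1$ exactly, so the QR route should agree up to rounding; the advantage of QR is that it never forms $A^tA$, whose condition number is the square of that of $A$.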

Quiz

Question 1

For an orthogonal matrix $U$, we have $U^{-1} = U^t$.

Question 2

If $P = UU^t$ is a projection matrix onto $V$ and $\mathbf{b} \notin V$, which statement is true?

Question 3

Every orthogonal matrix in $\mathbb{R}^{2 \times 2}$ represents either a rotation or a reflection.

Common Mistakes

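  • Thinking $P = UU^t$ gives the identity when $U$ is not square. If $U$ is $m \times n$ with orthonormal columns and $n < m$, then $U^tU = I_n$, but $UU^t$ is the projection onto $\text{Col}(U)$, not $I_m$; the equation $UU^t = I_m$ requires $U$ to be a square orthogonal matrix.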
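  • Using the projection formula $\text{Proj}_V(\mathbf{b}) = UU^t\mathbf{b}$ when $U$ does not have orthonormal columns. The formula is valid ONLY when the columns of $U$ form an orthonormal basis of $V$; for a matrix $A$ whose columns are an arbitrary basis of $V$, use $\text{Proj}_V(\mathbf{b}) = A(A^tA)^{-1}A^t\mathbf{b}$.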
  • Forgetting that orthogonal matrices preserve the dot product: $(U\mathbf{u}) \cdot (U\mathbf{v}) = \mathbf{u} \cdot \mathbf{v}$, so angles and lengths are unchanged.
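The first two mistakes are easy to see on a small example. In the sketch below (our own illustration; the function names are hypothetical), $V$ is the $xy$-plane in $\mathbb{R}^3$ and $\mathbf{b} = (1,2,3)$. With a non-orthonormal basis $(1,0,0), (1,1,0)$, the naive product $AA^t\mathbf{b}$ lands far from the true projection, while $A(A^tA)^{-1}A^t\mathbf{b}$ (computed with an explicit $2 \times 2$ inverse of the Gram matrix) recovers it. With the orthonormal basis $\mathbf{e}_1, \mathbf{e}_2$, $UU^t\mathbf{b}$ is the correct projection but differs from $\mathbf{b}$, so $UU^t \neq I_3$.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def combo(columns, coeffs):
    """Return sum_k coeffs[k] * columns[k]."""
    return [sum(c * col[i] for c, col in zip(coeffs, columns))
            for i in range(len(columns[0]))]

def naive_proj(columns, b):
    """U U^t b: correct only when the columns are orthonormal."""
    return combo(columns, [dot(col, b) for col in columns])

def proj_two_columns(columns, b):
    """A (A^t A)^{-1} A^t b for a matrix with two linearly independent columns."""
    a1, a2 = columns
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)  # Gram matrix A^t A
    det = g11 * g22 - g12 * g12
    r1, r2 = dot(a1, b), dot(a2, b)                          # A^t b
    # explicit 2x2 inverse of the Gram matrix applied to A^t b
    x1 = (g22 * r1 - g12 * r2) / det
    x2 = (-g12 * r1 + g11 * r2) / det
    return combo(columns, [x1, x2])

b = [1.0, 2.0, 3.0]
A = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # columns span the xy-plane, not orthonormal
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # orthonormal columns, but only 2 of them

print(naive_proj(A, b))        # [4.0, 3.0, 0.0]: not the projection
print(proj_two_columns(A, b))  # [1.0, 2.0, 0.0]: the projection onto the xy-plane
print(naive_proj(E, b))        # [1.0, 2.0, 0.0]: correct projection, but not b
```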