Linear Algebra
9.910 min read

The Affine Reduction Problem

The dimensionality reduction problem: given NN data points q1,,qNRm\mathbf{q}_1, \ldots, \mathbf{q}_N \in \mathbb{R}^m, find a kk-dimensional affine subspace closest to all points. Formally, minimize i=1NqimVkVk(qim)2\sum_{i=1}^N \|\mathbf{q}_i - \mathbf{m} - V_k V_k^\top(\mathbf{q}_i - \mathbf{m})\|^2 over the center m\mathbf{m} and orthonormal frame VkV_k.

The optimal center is always m=1Ni=1Nqi\mathbf{m}^* = \frac{1}{N}\sum_{i=1}^N \mathbf{q}_i (the data mean). After finding this, subtracting the mean turns the affine problem into a linear one: find the best linear subspace for the centered data.

Formal View

Definition 9.1 — Affine Dimensionality Reduction Problem
Given NN points qiRm\mathbf{q}_i \in \mathbb{R}^m and target dimension kmk \leq m, find mRm\mathbf{m} \in \mathbb{R}^m and VkRm×kV_k \in \mathbb{R}^{m \times k} with orthonormal columns minimizing i=1NqimVkVk(qim)2\sum_{i=1}^N \|\mathbf{q}_i - \mathbf{m} - V_k V_k^\top (\mathbf{q}_i - \mathbf{m})\|^2.

The optimal center is always m=1Niqi\mathbf{m}^* = \frac{1}{N}\sum_i \mathbf{q}_i (the mean).

Why This Matters

Dimensionality reduction is fundamental to data science — compressing data while preserving structure.

  • Face recognition: represent faces in a low-dimensional "face space".
  • Genome-wide association studies: reduce thousands of genetic markers to principal components.
  • Anomaly detection: points far from the low-dimensional subspace are outliers.

Quiz

Question 1

What is the optimal center m\mathbf{m}^* for the affine reduction problem?

Question 2

After subtracting the mean, the affine reduction problem becomes a linear reduction problem.

Common Mistakes

  • Skipping centering and fitting a linear subspace through the origin — gives suboptimal results unless data already has zero mean.
  • Confusing dimension reduction (finding a low-dim subspace) with feature selection (choosing original features).