Linear Algebra
9.198 min read

Visualizing Low-Dimensional Structure

The principal coordinates $C = U_k^\top A \in \mathbb{R}^{k \times N}$ give each data point's position in the low-dimensional space. For $k = 2$, plotting the two rows of $C$ against each other as a scatter plot can reveal natural groupings. The fraction of variance explained (FVE) measures how much of the data's total variance the $k$ retained components preserve.
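As a minimal sketch of the projection, the snippet below computes the principal coordinates with NumPy. It assumes the data points are stored as the columns of $A$ and that $A$ has been mean-centered; the synthetic data and the names `A_raw`, `U_k`, and `C` are illustrative choices, not part of the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: d = 5 features, N = 200 points stored as columns.
d, N, k = 5, 200, 2
A_raw = rng.normal(size=(d, N))
A_raw[0] *= 10.0  # give two features a much larger spread
A_raw[1] *= 4.0

# Center each feature (row) so the SVD describes variation around the mean.
A = A_raw - A_raw.mean(axis=1, keepdims=True)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the first k left singular vectors and project the data onto them.
U_k = U[:, :k]
C = U_k.T @ A  # principal coordinates, shape (k, N)

# Equivalent form: C equals diag(s[:k]) @ Vt[:k].
assert np.allclose(C, s[:k, None] * Vt[:k])
```

Plotting `C[0]` against `C[1]`, for example with matplotlib's `plt.scatter`, gives the 2D picture described above.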

A "scree plot" graphs $\sigma_i^2$ against $i$. A sharp "elbow" suggests an appropriate $k$. A common heuristic is to choose the smallest $k$ such that $\text{FVE}(k) \geq 0.90$ (90% of the variance explained).
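The 90% heuristic can be sketched in a few lines. The helper name `choose_k` and the example singular values are hypothetical, and the function simply returns the smallest $k$ whose cumulative fraction of variance reaches the threshold.

```python
import numpy as np

def choose_k(singular_values, threshold=0.90):
    """Return (k, fve): the smallest k with FVE(k) >= threshold, and all FVE values."""
    var = np.asarray(singular_values, dtype=float) ** 2
    fve = np.cumsum(var) / var.sum()  # fve[i] is FVE(i + 1)
    # First index whose cumulative fraction reaches the threshold.
    idx = int(np.searchsorted(fve, threshold, side="left"))
    # Clamp in case rounding leaves the last entry slightly below the threshold.
    k = min(idx + 1, len(fve))
    return k, fve

# Hypothetical singular values, already sorted in descending order.
k, fve = choose_k([8, 4, 2, 1])
print(k, np.round(fve, 3))
```

The values `var` are exactly what a scree plot would show on its vertical axis, so the same array can be passed to a plotting routine to look for the elbow by eye.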

Formal View

Definition 9.8 — Fraction of Variance Explained
$\text{FVE}(k) = \frac{\sum_{i=1}^k \sigma_i^2}{\sum_{i=1}^r \sigma_i^2}$, where $\sigma_1 \geq \dots \geq \sigma_r > 0$ are the nonzero singular values of $A$ and $r$ is its rank.
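A short worked example may help make the definition concrete. The singular values here are hypothetical, chosen only to keep the arithmetic simple.

```latex
% Hypothetical singular values: \sigma = (5, 3, 1), so r = 3.
\[
\sum_{i=1}^{3} \sigma_i^2 = 25 + 9 + 1 = 35,
\qquad
\text{FVE}(1) = \frac{25}{35} \approx 0.714,
\qquad
\text{FVE}(2) = \frac{25 + 9}{35} = \frac{34}{35} \approx 0.971.
\]
```

In this example one component falls short of 90% of the variance, while two components exceed it.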

A scree plot graphs $\sigma_i^2$ against $i$. An "elbow" suggests the appropriate $k$.

Why This Matters

PCA scatter plots compress high-dimensional data into a human-readable 2D picture.

  • Genomics: 2D PCA of genetic data separates populations by ancestry.
  • Quality control: PCA of sensor data identifies anomalous batches.
  • Finance: PCA reveals market regimes.

Quiz

Question 1

If the singular values are $\{10, 6, 2, 1\}$, what fraction of the variance is explained by the top two components?

Question 2

True or false: a 2D PCA scatter plot can be useful even for data in $\mathbb{R}^{1000}$.

Common Mistakes

  • Assuming 2D PCA always gives a useful plot — check FVE first.
  • Forgetting that the axes in a PCA plot are principal components, not original features.
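The first mistake can be illustrated with a small experiment. The sketch below assumes synthetic data with no low-dimensional structure (independent, equal-variance features); in that setting the top two components capture only a small share of the variance, so a 2D plot would say little about the data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 50 independent, equal-variance features, 500 points as columns.
d, N = 50, 500
A_raw = rng.normal(size=(d, N))
A = A_raw - A_raw.mean(axis=1, keepdims=True)  # center each feature

# Only the singular values are needed to compute FVE.
s = np.linalg.svd(A, compute_uv=False)
fve2 = (s[:2] ** 2).sum() / (s ** 2).sum()

print(f"FVE(2) = {fve2:.3f}")  # far below the 0.90 heuristic for this data
```

Checking this number before plotting is a cheap way to decide whether a 2D scatter is likely to be informative.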