Visualizing Low-Dimensional Structure

The principal coordinates $C = U_k^\top A \in \mathbb{R}^{k \times N}$ give each data point's position in the low-dimensional space. For $k=2$, plotting the columns of $C$ as a scatter plot can reveal natural groupings. The fraction of variance explained (FVE) measures how much of the data's total variance the top $k$ components retain.
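As a minimal sketch of how $C$ might be computed, the NumPy snippet below takes a thin SVD and projects onto the top $k$ left singular vectors. The function name `principal_coordinates`, the synthetic data, and the row-wise mean-centering step are assumptions made for illustration; the text above does not say whether $A$ is already centered.

```python
import numpy as np

def principal_coordinates(A, k):
    """Return (C, sigma), where C = U_k^T A_centered has shape (k, N).

    Assumes A is d x N with one data point per column.
    """
    # Assumption: center each feature (row) so squared singular values track variance.
    A_centered = A - A.mean(axis=1, keepdims=True)
    # Thin SVD: A_centered = U @ diag(sigma) @ Vt
    U, sigma, Vt = np.linalg.svd(A_centered, full_matrices=False)
    U_k = U[:, :k]                 # top-k left singular vectors
    C = U_k.T @ A_centered         # k x N principal coordinates
    return C, sigma

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 200))      # synthetic data: 5 features, 200 points
C, sigma = principal_coordinates(A, k=2)
print(C.shape)  # (2, 200)
```

Each column of `C` is one point's coordinates, so for $k=2$ the two rows of `C` can be passed directly to a scatter-plot routine as the horizontal and vertical values.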

A "scree plot" graphs $\sigma_i^2$ against the index $i$. A sharp "elbow", where the values level off, suggests an appropriate $k$. A common heuristic is to choose the smallest $k$ such that $\text{FVE}(k) \geq 0.90$ (at least 90% of the variance explained).
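The 90% heuristic can be sketched in a few lines of NumPy. The helper names `fve_curve` and `choose_k` and the example singular values are hypothetical, chosen only to illustrate the rule.

```python
import numpy as np

def fve_curve(sigma):
    """Cumulative fraction of variance explained for k = 1, ..., len(sigma)."""
    sq = np.asarray(sigma, dtype=float) ** 2
    return np.cumsum(sq) / sq.sum()

def choose_k(sigma, threshold=0.90):
    """Smallest k whose cumulative FVE reaches the threshold."""
    curve = fve_curve(sigma)
    # searchsorted finds the first index with curve[index] >= threshold;
    # the curve is nondecreasing, so this is valid. Add 1 for a 1-based k.
    k = int(np.searchsorted(curve, threshold)) + 1
    return min(k, len(curve))      # guard against floating-point shortfall

sigma = [5.0, 3.0, 1.0, 1.0]       # hypothetical singular values
print(choose_k(sigma))             # 2
```

Plotting `np.asarray(sigma) ** 2` against the component index gives the scree plot itself; the threshold rule is a complement to, not a replacement for, looking for the elbow.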

Formal View

Definition 9.8 — Fraction of Variance Explained

$$\text{FVE}(k) = \frac{\sum_{i=1}^k \sigma_i^2}{\sum_{i=1}^r \sigma_i^2}$$

Here $\sigma_i$ are the singular values of $A$ and $r$ is the number of nonzero singular values. A scree plot graphs $\sigma_i^2$ against $i$. An "elbow" suggests an appropriate $k$.
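As a worked illustration of the definition, take the hypothetical singular values $\{4, 2, 1\}$ (chosen for this example only), so that $\sigma_i^2 = 16, 4, 1$ and the total is $21$:

$$\text{FVE}(1) = \frac{16}{21} \approx 0.762, \qquad \text{FVE}(2) = \frac{16 + 4}{21} = \frac{20}{21} \approx 0.952, \qquad \text{FVE}(3) = \frac{21}{21} = 1.$$

Under the 90% heuristic, $k=2$ would be the smallest adequate choice here, since $\text{FVE}(1) < 0.90 \leq \text{FVE}(2)$.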

Why This Matters

PCA scatter plots compress high-dimensional data into a human-readable 2D picture.

Genomics: a 2D PCA of genetic data can separate populations by ancestry.
Quality control: PCA of sensor data can flag anomalous batches.
Finance: PCA of market data can reveal distinct market regimes.

Learning Resources

PCA visualization and scree plots

StatQuest

Interpreting PCA plots and choosing the number of components.

22 min

Dimensionality reduction visualization

Steve Brunton

Visualization with low-dimensional PCA coordinates.

16 min

Quiz

Question 1

If the singular values are $\{10, 6, 2, 1\}$, what fraction of variance is explained by the top two components?

Question 2

True or false: a 2D PCA scatter plot can be useful even for data in $\mathbb{R}^{1000}$.

Common Mistakes

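Assuming a 2D PCA plot is always informative. Check $\text{FVE}(2)$ first: if it is low, the plot omits most of the variance.
Forgetting that the axes in a PCA plot are principal components (linear combinations of the original features), not the original features themselves.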