Linear Algebra
9.16 · 10 min read

Principal Components

The principal components of a centered data matrix $A$ (one data point per column) are the left singular vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots$ of $A$, ordered by decreasing singular value $\sigma_i$. The first principal component $\mathbf{u}_1$ is the direction of greatest variance; $\mathbf{u}_2$ is orthogonal to $\mathbf{u}_1$ and captures the most variance among all directions orthogonal to $\mathbf{u}_1$.

The principal coordinates (scores) are the projections of the data onto the top $k$ components: $C = U_k^\top A \in \mathbb{R}^{k \times N}$, where $U_k$ holds the first $k$ columns of $U$. Each column of $C$ is the position of one data point in the reduced-dimension space. The fraction of variance explained by the top $k$ components is $\sum_{i=1}^k \sigma_i^2 / \sum_i \sigma_i^2$.
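The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `pca_svd`, the variable `X`, and the synthetic data are assumptions made for the example, and the data matrix is taken to have shape $d \times N$ with one data point per column.

```python
import numpy as np

def pca_svd(X, k):
    """Hypothetical helper. X has shape (d, N): one data point per column."""
    # Center each feature (row) so the columns of A have zero mean.
    A = X - X.mean(axis=1, keepdims=True)
    # Thin SVD; NumPy returns the singular values s in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                                    # top-k principal components
    C = U_k.T @ A                                     # scores, shape (k, N)
    explained = np.sum(s[:k] ** 2) / np.sum(s ** 2)   # fraction of variance
    return U_k, C, explained

# Synthetic 3-D data with very different spread along each axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200)) * np.array([[5.0], [1.0], [0.2]])

U_k, C, explained = pca_svd(X, k=2)
print(C.shape)      # one column of scores per data point
print(explained)    # close to 1 here, since the third axis has little variance
```

Here the columns of `U_k` are the directions and the columns of `C` are the coordinates of each point in those directions.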

Formal View

Definition 9.5 — Principal Components and Scores
For a centered data matrix $A = U\Sigma V^\top$:

  • Principal components: the columns $\mathbf{u}_1, \ldots, \mathbf{u}_k$ of $U$ (sorted by descending $\sigma_i$).
  • Principal coordinates (scores): $C = U_k^\top A \in \mathbb{R}^{k \times N}$.

Fraction of variance explained by the top $k$ components: $\sum_{i=1}^k \sigma_i^2 / \sum_i \sigma_i^2$.
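A short derivation sketch shows why the squared singular values measure variance. It assumes the sample covariance is normalized by $1/N$ (with $1/(N-1)$ every term is rescaled by the same factor, so the ratio is unchanged) and writes $\mathbf{v}_i$ for the $i$-th column of $V$ and $\operatorname{Var}_i$ for the variance of the data along $\mathbf{u}_i$. Because $A$ is centered, the projected data $\mathbf{u}_i^\top A$ has zero mean, so its variance is its mean squared value:

$$
\begin{aligned}
\mathbf{u}_i^\top A &= \mathbf{u}_i^\top U \Sigma V^\top = \sigma_i \mathbf{v}_i^\top, \\
\operatorname{Var}_i &= \frac{1}{N}\,\lVert \mathbf{u}_i^\top A \rVert^2 = \frac{\sigma_i^2}{N}, \\
\frac{\sum_{i=1}^k \operatorname{Var}_i}{\sum_i \operatorname{Var}_i} &= \frac{\sum_{i=1}^k \sigma_i^2}{\sum_i \sigma_i^2}.
\end{aligned}
$$

The denominator is the total variance, since $\frac{1}{N}\lVert A \rVert_F^2 = \frac{1}{N}\sum_i \sigma_i^2$.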

Interactive visualization: Orthogonal Projection

Why This Matters

Principal components reveal structure in high-dimensional data, enabling visualization and compression.

  • Genome data: the top two principal components often separate samples by ancestry in genome-wide association studies (GWAS).
  • Finance: PCA of stock returns identifies market, sector, and idiosyncratic risk factors.
  • Image recognition: eigenfaces are principal components of a face dataset.

Quiz

Question 1

Principal components are:

Question 2

The fraction of total variance explained by the top-$k$ components is $\sum_{i=1}^k \sigma_i^2 / \sum_i \sigma_i^2$.

Common Mistakes

  • Confusing principal components (directions, columns of $U$) with principal coordinates (scores, $U_k^\top A$).
  • Forgetting to center before computing principal components.
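The centering mistake can be illustrated with a small NumPy experiment. This is a sketch on made-up data: the helper `first_direction` and the synthetic matrix `X` are assumptions for the example, with one data point per column. The data varies mostly along the second axis, but its mean sits far from the origin along the first axis.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-D data: small spread along axis 0, large spread along axis 1,
# and a mean of roughly (10, 0), far from the origin.
X = rng.normal(size=(2, 500)) * np.array([[0.3], [2.0]]) + np.array([[10.0], [0.0]])

def first_direction(M):
    """Return the first left singular vector of M (sign is arbitrary)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, 0]

u_raw = first_direction(X)                                       # no centering
u_centered = first_direction(X - X.mean(axis=1, keepdims=True))  # centered

print(np.abs(u_raw))       # dominated by axis 0: points toward the mean
print(np.abs(u_centered))  # dominated by axis 1: the direction of greatest variance
```

Without centering, the leading singular vector mostly describes where the data sits rather than how it varies, so it should not be read as a principal component.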