9.16 Principal Components
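The principal components of a centered data matrix $X$ (one data point per column) are the left singular vectors $\mathbf{u}_1, \mathbf{u}_2, \dots$ of $X$, ordered by decreasing singular value $\sigma_1 \ge \sigma_2 \ge \cdots$. The first principal component $\mathbf{u}_1$ is the direction of greatest variance; $\mathbf{u}_2$ is perpendicular to $\mathbf{u}_1$ and captures the most remaining variance.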
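As a minimal NumPy sketch (assuming the data matrix stores one data point per column, and using synthetic data made up for illustration), the code below checks both claims. The variance of the projections onto $\mathbf{u}_1$ equals $\sigma_1^2/(n-1)$ and is at least as large as along any of 1000 random unit directions, and $\mathbf{u}_2$ is orthogonal to $\mathbf{u}_1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 3 features x 200 samples, one point per column,
# stretched so that the variance differs strongly across directions.
n = 200
X = np.diag([5.0, 2.0, 0.5]) @ rng.standard_normal((3, n))
X = X - X.mean(axis=1, keepdims=True)  # center each feature (row)

# Thin SVD: columns of U are the principal components; s is sorted in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
u1, u2 = U[:, 0], U[:, 1]

def projected_variance(direction):
    """Sample variance of the centered data projected onto a unit vector."""
    return np.sum((direction @ X) ** 2) / (n - 1)

var_u1 = projected_variance(u1)

# Compare against many random unit directions.
random_dirs = rng.standard_normal((1000, 3))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
max_random = max(projected_variance(d) for d in random_dirs)

print(f"variance along u1:     {var_u1:.3f}")
print(f"sigma_1^2 / (n - 1):   {s[0] ** 2 / (n - 1):.3f}")
print(f"best random direction: {max_random:.3f}")
print(f"u1 . u2:               {u1 @ u2:.2e}")
```

Because the data are centered, the projected values have mean zero, so their sample variance is simply the sum of squares divided by $n-1$.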
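The principal coordinates (scores) are the projections $Z_k = U_k^\top X$, where $U_k = [\mathbf{u}_1 \cdots \mathbf{u}_k]$. Each column of $Z_k$ is the position of one data point in the reduced $k$-dimensional space. Since the variance along $\mathbf{u}_i$ is $\sigma_i^2/(n-1)$, the fraction of variance explained by the top $k$ components is $\sum_{i=1}^{k} \sigma_i^2 \big/ \sum_{i=1}^{r} \sigma_i^2$, where $r$ is the rank of $X$.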
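A sketch of the scores and the explained-variance fraction in NumPy, under the same one-point-per-column assumption and on synthetic data. The helper `pca_scores` is a hypothetical name introduced here, not a library function; it takes $U_k$ to be the first $k$ columns of the $U$ returned by `np.linalg.svd`.

```python
import numpy as np

def pca_scores(X, k):
    """Return (U_k, Z_k, explained) for data X with one point per column.

    Assumes X is already centered (each row has mean zero).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]                 # top-k principal components (directions)
    Z_k = U_k.T @ X                # scores: column j holds the coordinates of point j
    explained = np.sum(s[:k] ** 2) / np.sum(s ** 2)
    return U_k, Z_k, explained

# Synthetic data (illustrative only): 5 features with very different scales, 100 points.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 100)) * np.array([[4.0], [3.0], [1.0], [0.5], [0.1]])
X = X - X.mean(axis=1, keepdims=True)

U_2, Z_2, frac = pca_scores(X, k=2)
print(Z_2.shape)  # (2, 100): one column per data point
print(f"fraction explained by top 2: {frac:.3f}")
```

Because the scores are coordinates in the orthonormal basis $\mathbf{u}_1, \dots, \mathbf{u}_k$, their total squared length equals $\sigma_1^2 + \cdots + \sigma_k^2$, so the same fraction can also be read off the scores directly as $\|Z_k\|_F^2 / \|X\|_F^2$.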
Formal View
Definition 9.5 — Principal Components and Scores
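For a centered data matrix $X \in \mathbb{R}^{m \times n}$ (one data point per column) with SVD $X = U \Sigma V^\top$ and rank $r$:
- Principal components: columns $\mathbf{u}_1, \dots, \mathbf{u}_r$ of $U$ (sorted by descending $\sigma_i$).
- Principal coordinates (scores): $Z_k = U_k^\top X$, where $U_k$ holds the first $k$ columns of $U$.
Fraction of variance explained by the top $k$: $\dfrac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}$.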
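One way to sanity-check Definition 9.5 numerically, as a sketch on synthetic data with one point per column: the scores $U_k^\top X$ (with $U_k$ the first $k$ columns of $U$) coincide with $\Sigma_k V_k^\top$, and the values $\sigma_i^2/(n-1)$ equal the eigenvalues of the sample covariance $XX^\top/(n-1)$, so the explained-variance fraction can be computed either way. The covariance route is added here only for comparison; the definition itself uses the SVD.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 4, 50, 2
X = rng.standard_normal((m, n)) * np.array([[3.0], [2.0], [1.0], [0.5]])
X = X - X.mean(axis=1, keepdims=True)   # centered: one data point per column

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Scores two ways: project onto U_k, or read them off the SVD factors.
Z_proj = U[:, :k].T @ X
Z_svd = np.diag(s[:k]) @ Vt[:k, :]

# Sample covariance eigenvalues (eigvalsh returns ascending order, so reverse it).
C = X @ X.T / (n - 1)
eigvals = np.linalg.eigvalsh(C)[::-1]

frac_svd = np.sum(s[:k] ** 2) / np.sum(s ** 2)
frac_cov = np.sum(eigvals[:k]) / np.sum(eigvals)

print(np.allclose(Z_proj, Z_svd))              # True
print(np.allclose(s ** 2 / (n - 1), eigvals))  # True
print(np.isclose(frac_svd, frac_cov))          # True
```

The identity $U_k^\top X = \Sigma_k V_k^\top$ follows from $U_k^\top U = [\,I_k \;\; 0\,]$, which picks out the first $k$ rows of $\Sigma V^\top$.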
Why This Matters
Principal components reveal structure in high-dimensional data, enabling visualization and compression.
- Genomics: in genome-wide association studies (GWAS), the top two principal components of genotype data separate individuals by ancestry.
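- Finance: PCA of stock returns identifies common risk factors (the first component typically tracks the overall market, later ones sector-level moves), leaving idiosyncratic risk in the residual.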
- Image recognition: eigenfaces are principal components of a face dataset.
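To make the compression claim concrete, here is a minimal sketch on synthetic data (no genomic, financial, or image data), assuming one data point per column. Keeping $k$ components means storing the $k$ directions and the $k \times n$ scores; the rank-$k$ reconstruction then has squared Frobenius error equal to the discarded $\sigma_{k+1}^2 + \cdots + \sigma_r^2$ (the Eckart–Young theorem).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 20, 300, 3

# Synthetic data that is roughly rank 3 plus a little noise (illustrative only).
X = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n)) + 0.05 * rng.standard_normal((m, n))
X = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k = U[:, :k]
Z_k = U_k.T @ X            # k x n scores: the compressed representation
X_hat = U_k @ Z_k          # rank-k reconstruction of the centered data

err = np.linalg.norm(X - X_hat, "fro") ** 2
print(np.isclose(err, np.sum(s[k:] ** 2)))  # True
# Recovering the uncentered data would also require storing the m feature means.
print(f"stored numbers: {U_k.size + Z_k.size} vs original {X.size}")
```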
Quiz
Question 1
Principal components are:
Question 2
The fraction of total variance explained by the top-$k$ components is $\sum_{i=1}^{k} \sigma_i^2 \big/ \sum_{i=1}^{r} \sigma_i^2$.
Common Mistakes
- Confusing principal components (directions, columns of $U$) with principal coordinates (scores, $U_k^\top X$).
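- Forgetting to center $X$ (subtract each feature's mean) before computing principal components; without centering, $\mathbf{u}_1$ tends to point toward the data mean instead of along the direction of greatest variance.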
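A small sketch of both mistakes, using synthetic 2-D data whose mean sits far from the origin (an assumption chosen to make the effect visible). Without centering, the first left singular vector lines up with the mean vector; after centering, it lines up with the true direction of greatest spread. The last lines show that components and scores are different objects with different shapes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
mean = np.array([[10.0], [0.0]])       # data offset far along the x-axis
spread_dir = np.array([0.0, 1.0])      # true direction of greatest variance: the y-axis
X_raw = mean + np.diag([0.3, 2.0]) @ rng.standard_normal((2, n))

def first_pc(X):
    """First left singular vector of X (one data point per column)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0]

u_raw = first_pc(X_raw)                          # mistake: no centering
X_c = X_raw - X_raw.mean(axis=1, keepdims=True)
u_c = first_pc(X_c)                              # correct: centered first

# Absolute cosine with the mean direction and with the true spread direction.
mean_dir = mean[:, 0] / np.linalg.norm(mean)
print(f"uncentered u1 vs mean:   {abs(u_raw @ mean_dir):.3f}")
print(f"centered   u1 vs spread: {abs(u_c @ spread_dir):.3f}")

# Components vs scores: columns of U are directions, U.T @ X_c holds coordinates.
U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
print(U.shape, (U.T @ X_c).shape)  # directions (2, 2) vs scores (2, 500)
```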