
Dual PCA

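Dual PCA gives the same reconstruction as classical PCA, but computes it from the $N \times N$ Gram matrix $G = A^\top A$ instead of the $m \times m$ (unnormalized) covariance matrix $K = AA^\top$. It uses the right singular vectors $V$ of the centered data matrix $A$ and the formula $\tilde{\mathbf{q}} = \mathbf{m} + A V_k V_k^\top A^+ (\mathbf{q} - \mathbf{m})$, where $\mathbf{m}$ is the data mean, $A^+$ is the pseudoinverse of $A$, and $V_k$ holds the first $k$ columns of $V$.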

Dual PCA is efficient when $N < m$ (fewer data points than dimensions), since the Gram matrix is smaller. Both formulations produce identical reconstructions.
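
The equivalence of the two reconstructions can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the random data, the sizes `m`, `N`, `k`, and the variable names are all assumptions made for illustration. It also assumes the data points are stored as the columns of the matrix, and it passes an explicit `rcond` to `np.linalg.pinv` because centering makes $A$ rank-deficient.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, k = 50, 8, 3                      # m dimensions, N data points (N < m), k components

X = rng.normal(size=(m, N))             # illustrative data, one point per column
mean = X.mean(axis=1, keepdims=True)    # data mean (the vector m in the formula)
A = X - mean                            # centered data matrix, m x N

# Classical route: leading left singular vectors U_k of A
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

# Dual route: leading eigenvectors V_k of the N x N Gram matrix G = A^T A
G = A.T @ A
evals, V = np.linalg.eigh(G)            # eigh returns eigenvalues in ascending order
top = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
V_k = V[:, top]

# Reconstruct a new point q both ways
q = rng.normal(size=(m, 1))
A_pinv = np.linalg.pinv(A, rcond=1e-10) # explicit cutoff: centered A has a zero singular value
classical = mean + U_k @ (U_k.T @ (q - mean))
dual = mean + A @ (V_k @ (V_k.T @ (A_pinv @ (q - mean))))

print(np.allclose(classical, dual))
```

Both expressions project $\mathbf{q} - \mathbf{m}$ onto the same $k$-dimensional subspace, so the sign ambiguity of individual eigenvectors does not affect the comparison.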

Formal View

Definition 9.7 — Dual PCA

For centered data $A = U\Sigma V^\top$, Dual PCA computes via the $N \times N$ Gram matrix $G = A^\top A = V\Lambda_G V^\top$. Principal coordinates: $C_k^\top = V_k \Lambda_{G,k}^{1/2}$. Same reconstructions as classical PCA.
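
The identities in the definition follow from the SVD in a few lines. The derivation below is a sketch under two assumptions that this section does not state explicitly: the classical principal coordinates are taken to be $C_k = U_k^\top A$, and $U_k$, $V_k$, $\Sigma_k$ denote the leading $k$ columns of $U$ and $V$ and the leading $k \times k$ block of $\Sigma$, with $\sigma_k > 0$.

$$
\begin{aligned}
G &= A^\top A = V \Sigma^\top U^\top U \Sigma V^\top = V (\Sigma^\top \Sigma) V^\top
  &&\Rightarrow\quad \Lambda_G = \Sigma^\top \Sigma, \\
C_k &= U_k^\top A = U_k^\top U \Sigma V^\top = \Sigma_k V_k^\top
  &&\Rightarrow\quad C_k^\top = V_k \Sigma_k = V_k \Lambda_{G,k}^{1/2}, \\
A V_k &= U \Sigma V^\top V_k = U_k \Sigma_k
  &&\Rightarrow\quad U_k = A V_k \Lambda_{G,k}^{-1/2}.
\end{aligned}
$$

The last line shows how the classical basis $U_k$ can be recovered from the Gram eigenvectors without ever forming the $m \times m$ matrix.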

Classical PCA uses $m \times m$ covariance; Dual PCA uses $N \times N$ Gram. Choose the smaller one.
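
The "choose the smaller one" rule can be sketched as a single function. This is an illustrative NumPy sketch under stated assumptions: the function name `principal_coordinates` is hypothetical, the input is assumed to be already centered with one data point per column, and the rows of the result are only determined up to sign.

```python
import numpy as np

def principal_coordinates(A, k):
    """Return the k x N principal coordinates of centered data A (m x N).

    Eigendecomposes whichever of the Gram (N x N) or covariance-type
    (m x m) matrices is smaller. Hypothetical helper for illustration.
    """
    m, N = A.shape
    if N < m:
        # Dual route: N x N Gram matrix G = A^T A
        evals, V = np.linalg.eigh(A.T @ A)
        top = np.argsort(evals)[::-1][:k]                 # k largest eigenvalues
        sigma_k = np.sqrt(np.clip(evals[top], 0.0, None)) # guard against tiny negatives
        return (V[:, top] * sigma_k).T                    # C_k = Sigma_k V_k^T
    # Classical route: m x m matrix K = A A^T
    evals, U = np.linalg.eigh(A @ A.T)
    top = np.argsort(evals)[::-1][:k]
    return U[:, top].T @ A                                # C_k = U_k^T A
```

Either branch returns the same coordinates up to the sign of each row, so downstream code that depends on sign should fix a convention explicitly.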

Why This Matters

Dual PCA handles the case $N < m$ (fewer samples than features), which is common in genomics, imaging, and text mining.

Genomics: with $N=1000$ patients and $m=100000$ genes, the Gram matrix is $1000 \times 1000$, far smaller than the $100000 \times 100000$ covariance matrix.
Kernel PCA replaces the Gram with a kernel matrix for non-linear PCA.
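When $N < m$, dual PCA is cheaper: forming and eigendecomposing $G$ costs roughly $O(N^2 m + N^3)$ operations, versus $O(m^2 N + m^3)$ for $K$.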
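
To make the size gap in the genomics example concrete, here is a short arithmetic sketch. It assumes both matrices are stored densely as 64-bit floats (8 bytes per entry); the helper name `dense_float64_bytes` is hypothetical.

```python
def dense_float64_bytes(n):
    """Bytes needed to store a dense n x n matrix of 64-bit floats."""
    return n * n * 8

N, m = 1_000, 100_000                    # patients, genes (figures from the example)

gram_bytes = dense_float64_bytes(N)      # N x N Gram matrix
cov_bytes = dense_float64_bytes(m)       # m x m covariance matrix
ratio = cov_bytes // gram_bytes

print(f"Gram:       {gram_bytes / 1e6:.0f} MB")
print(f"Covariance: {cov_bytes / 1e9:.0f} GB")
print(f"Ratio:      {ratio}x")
```

Under these assumptions the Gram matrix takes 8 MB while the covariance matrix would take 80 GB, a factor of $(m/N)^2 = 10^4$.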

Learning Resources

Dual PCA and kernel PCA

StatQuest

The dual formulation and its connection to kernel methods.

20 min

PCA: two viewpoints

MIT OpenCourseWare

Strang on both formulations of PCA.

45 min

Quiz

Question 1

When is Dual PCA more efficient?

Question 2

Classical and Dual PCA always give the same reconstructions.

Common Mistakes

Thinking dual PCA gives different results: the reconstructions are identical to those of classical PCA.
Forgetting to center data before either variant.