
Normal Equations Always Give a PSD Matrix

For any real $m \times n$ matrix $M$ (any $m$, any $n$, any entries), the matrix $A = M^\top M$ is always positive semidefinite. The proof is elegant: for any vector $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x}^\top (M^\top M) \mathbf{x} = (M\mathbf{x})^\top (M\mathbf{x}) = \|M\mathbf{x}\|^2 \geq 0$. A squared norm is never negative, so $\mathbf{x}^\top A \mathbf{x} \geq 0$ for all $\mathbf{x}$, which makes $A$ PSD.
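
The identity $\mathbf{x}^\top (M^\top M)\mathbf{x} = \|M\mathbf{x}\|^2$ is easy to spot-check numerically. The following is a minimal sketch in plain Python (lists of rows, no NumPy); the helper names `gram`, `quad_form`, and `sq_norm_Mx` are illustrative, not from any library. Integer entries keep the arithmetic exact, so the two sides must match exactly rather than approximately. A finite random test like this is a sanity check, not a proof.

```python
import random

def gram(M):
    """Return A = M^T M for M given as a list of rows (entry (i, j) is column i . column j)."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]

def quad_form(A, x):
    """Return the quadratic form x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def sq_norm_Mx(M, x):
    """Return ||Mx||^2, the sum of squared entries of Mx."""
    return sum(sum(a * b for a, b in zip(row, x)) ** 2 for row in M)

random.seed(0)
m, n = 4, 3
M = [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]
A = gram(M)

for _ in range(1000):
    x = [random.randint(-10, 10) for _ in range(n)]
    q = quad_form(A, x)
    assert q == sq_norm_Mx(M, x)  # x^T (M^T M) x equals ||Mx||^2 exactly
    assert q >= 0                 # so the quadratic form is never negative

print("x^T A x == ||Mx||^2 >= 0 for all 1000 random vectors")
```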

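When is $M^\top M$ actually positive definite (strictly)? Exactly when $M$ has full column rank, i.e., the null space of $M$ contains only the zero vector. If $M\mathbf{x} = \mathbf{0}$ only for $\mathbf{x} = \mathbf{0}$, then $\mathbf{x}^\top A \mathbf{x} = \|M\mathbf{x}\|^2 = 0$ forces $M\mathbf{x} = \mathbf{0}$ and hence $\mathbf{x} = \mathbf{0}$, so the PSD condition strengthens to PD. Conversely, if some $\mathbf{x} \neq \mathbf{0}$ satisfies $M\mathbf{x} = \mathbf{0}$, then $\mathbf{x}^\top A \mathbf{x} = 0$ and $A$ is PSD but not PD.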
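
To see both cases of the rank condition, here is a small sketch comparing a full-column-rank $3 \times 2$ matrix with one whose second column is twice its first. It relies on the $2 \times 2$ form of Sylvester's criterion (a symmetric $2 \times 2$ matrix is PD if and only if its top-left entry and its determinant are both positive); the helper names are hypothetical.

```python
def gram(M):
    """Return A = M^T M for M given as a list of rows."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]

def is_pd_2x2(A):
    """Sylvester's criterion for a symmetric 2x2 matrix: PD iff A[0][0] > 0 and det(A) > 0."""
    return A[0][0] > 0 and A[0][0] * A[1][1] - A[0][1] * A[1][0] > 0

full_rank = [[1, 0], [0, 1], [1, 1]]       # independent columns
rank_deficient = [[1, 2], [2, 4], [3, 6]]  # column 2 = 2 * column 1

print(gram(full_rank), is_pd_2x2(gram(full_rank)))            # [[2, 1], [1, 2]] True
print(gram(rank_deficient), is_pd_2x2(gram(rank_deficient)))  # [[14, 28], [28, 56]] False

# x = (2, -1) is a nonzero null vector of rank_deficient, so x^T A x = ||Mx||^2 = 0
x = [2, -1]
Mx = [sum(a * b for a, b in zip(row, x)) for row in rank_deficient]
print(Mx)  # [0, 0, 0]
```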

This is why the normal equations $M^\top M \mathbf{x} = M^\top \mathbf{b}$ have a unique solution when $M$ has full column rank — the coefficient matrix $M^\top M$ is PD (hence invertible).
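
When $M^\top M$ is PD it has a Cholesky factorization $M^\top M = LL^\top$, so the normal equations can be solved with one forward and one back substitution. The sketch below illustrates this under that assumption (hypothetical helper names, plain Python floats), fitting a line $c_0 + c_1 t$ to the made-up points $(0, 1), (1, 2), (2, 2)$. Its `cholesky` raises an error on a nonpositive pivot, which doubles as a practical PD test, although in floating point a rank-deficient input can yield a tiny pivot of either sign instead of an exact zero.

```python
import math

def gram(M):
    """Return A = M^T M for M given as a list of rows."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]

def mat_T_vec(M, b):
    """Return M^T b."""
    n = len(M[0])
    return [sum(row[j] * bi for row, bi in zip(M, b)) for j in range(n)]

def cholesky(A):
    """Return lower-triangular L with A = L L^T; raise ValueError on a nonpositive pivot."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def solve_normal_equations(M, b):
    """Solve (M^T M) x = M^T b via Cholesky; assumes M has full column rank."""
    L = cholesky(gram(M))
    rhs = mat_T_vec(M, b)
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):            # forward substitution: L y = M^T b
        y[i] = (rhs[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

M = [[1, 0], [1, 1], [1, 2]]  # columns: constant term, t
b = [1, 2, 2]
print(solve_normal_equations(M, b))  # approximately [1.1667, 0.5], i.e. 7/6 and 1/2
```

Forming $M^\top M$ explicitly squares the condition number of $M$, so numerical libraries usually prefer a QR factorization of $M$ for least squares; the Cholesky route is used here only because it mirrors the normal equations directly.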

Formal View

Theorem 8.7 — Gram Matrix is Always PSD

For any real $m \times n$ matrix $M$, the Gram matrix $A = M^\top M$ is symmetric and positive semidefinite (PSD). Furthermore, $M^\top M$ is positive definite (PD) if and only if $M$ has full column rank (i.e., $\ker(M) = \{\mathbf{0}\}$).

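Proof of PSD: $A$ is symmetric because $(M^\top M)^\top = M^\top (M^\top)^\top = M^\top M$, and $\mathbf{x}^\top M^\top M \mathbf{x} = \|M\mathbf{x}\|^2 \geq 0$ for every $\mathbf{x}$. In MATLAB, `A = M' * M` always gives a PSD matrix, although for a rank-deficient `M` the eigenvalues returned by `eig(A)` can come out as tiny negative numbers because of floating-point rounding.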

Why This Matters

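The fact that $M^\top M$ is always PSD is why least squares always works. The objective $\|M\mathbf{x} - \mathbf{b}\|^2$ has Hessian $2M^\top M$, so it is convex and every solution of the normal equations is a global minimizer. The normal equations always have at least one solution, because $M^\top \mathbf{b}$ lies in the column space of $M^\top$, which equals the column space of $M^\top M$; the solution is unique exactly when $M$ has full column rank.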
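
The solvability claim holds even when $M^\top M$ is singular. In this sketch (made-up data, exact `Fraction` arithmetic, hypothetical helper names), $M$ has rank 1, so the normal equations reduce to the single equation $x_0 + 2x_1 = 2/7$ and have infinitely many solutions. Two of them are checked, and both give the same fitted vector $M\mathbf{x}$ and the same minimal squared residual.

```python
from fractions import Fraction as F

def gram(M):
    """Return A = M^T M for M given as a list of rows."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]

def mat_vec(M, x):
    """Return M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def mat_T_vec(M, b):
    """Return M^T b."""
    n = len(M[0])
    return [sum(row[j] * bi for row, bi in zip(M, b)) for j in range(n)]

def residual_sq(M, b, x):
    """Return ||b - M x||^2."""
    return sum((bi - mi) ** 2 for bi, mi in zip(b, mat_vec(M, x)))

M = [[1, 2], [2, 4], [3, 6]]  # rank 1: column 2 = 2 * column 1
b = [1, 0, 1]
A, rhs = gram(M), mat_T_vec(M, b)  # singular A = [[14, 28], [28, 56]], rhs = [4, 8]

# Two different solutions of x0 + 2*x1 = 2/7, i.e. of A x = M^T b
candidates = [[F(2, 7), F(0)], [F(0), F(1, 7)]]
for x in candidates:
    assert mat_vec(A, x) == rhs  # x really solves the normal equations
    print(x, residual_sq(M, b, x))  # both lines report a squared residual of 6/7
```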

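Linear regression: for a design matrix $X$, $X^\top X$ is PSD, and PD when the columns of $X$ are linearly independent, which guarantees a unique least-squares solution.
Kernel methods: the Gram matrix $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ of any valid (Mercer) kernel is PSD. For a feature-map kernel $k(\mathbf{x}, \mathbf{y}) = \phi(\mathbf{x})^\top \phi(\mathbf{y})$ this is the same argument, since $K = \Phi \Phi^\top$ where row $i$ of $\Phi$ is $\phi(\mathbf{x}_i)^\top$. This is the mathematical foundation for support vector machines, since it keeps the SVM dual a convex quadratic program.
Covariance estimation: for a column-centered data matrix $X$ with $n$ rows (samples), the sample covariance $\frac{1}{n}X^\top X$ is always PSD.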
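
For the covariance item, $\mathbf{v}^\top C \mathbf{v} = \frac{1}{n}\|X\mathbf{v}\|^2$ (with $X$ centered) is the variance of the data projected onto $\mathbf{v}$, which cannot be negative. The sketch below checks this with exact fractions on a small made-up data set; the helper names are illustrative, and the $\frac{1}{n}$ normalization follows the list above. The unbiased estimator uses $\frac{1}{n-1}$ instead, a positive rescaling that does not change PSD-ness.

```python
from fractions import Fraction as F

def center(X):
    """Subtract each column's mean so every column of the result sums to zero."""
    n, d = len(X), len(X[0])
    means = [F(sum(row[j] for row in X), n) for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def sample_cov(X):
    """Return C = (1/n) Xc^T Xc, where Xc is the column-centered data."""
    Xc = center(X)
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]

def quad_form(A, v):
    """Return v^T A v."""
    return sum(v[i] * A[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

X = [[2, 1], [4, 3], [6, 2], [8, 6]]  # 4 made-up samples, 2 features
C = sample_cov(X)

# v^T C v should equal the (1/n) variance of the centered data projected onto v
v = [1, -1]
proj = [sum(a * b for a, b in zip(row, v)) for row in center(X)]
proj_var = sum(p * p for p in proj) / len(X)
print(quad_form(C, v), proj_var)  # 3/2 3/2
```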

Quiz

Question 1

For any matrix $M$, the matrix $M^\top M$ is positive semidefinite.

Question 2

When is $M^\top M$ positive definite (rather than just PSD)?

Common Mistakes

Thinking $M^\top M$ is PD in general: it is PD only when $M$ has full column rank. Any $M$ with linearly dependent columns, tall or wide (a wide $M$ with $m < n$ always has dependent columns), gives an $M^\top M$ that is PSD but not PD.
Forgetting that PSD does not imply invertible: $M^\top M$ is invertible exactly when $M$ has full column rank.
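
Both mistakes show up in the smallest wide example. The sketch below (hypothetical helper names, exact integer arithmetic) builds $M^\top M$ for a $2 \times 3$ matrix: the result is a $3 \times 3$ PSD matrix of rank at most 2, so its determinant is zero and it is not invertible. The vector $(1, -2, 1)$ is a nonzero null vector of $M$.

```python
def gram(M):
    """Return A = M^T M for M given as a list of rows."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

M = [[1, 2, 3], [4, 5, 6]]  # wide: m = 2 < n = 3, so the columns must be dependent
A = gram(M)
print(A)        # [[17, 22, 27], [22, 29, 36], [27, 36, 45]]
print(det3(A))  # 0, so A is singular: PSD but not PD

x = [1, -2, 1]  # nonzero null vector of M
print([sum(a * b for a, b in zip(row, x)) for row in M])  # [0, 0]
```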