
Hessians and Convexity

For twice-differentiable functions, there is a beautiful Hessian-based test for convexity: a global version of the second derivative test.

The theorem: a $D^2$ (twice-differentiable) function $f$ over a convex domain is convex if and only if its Hessian $Hf(\mathbf{x})$ is PSD at every (interior) point. If the Hessian is PD everywhere, then $f$ is strictly convex. The converse fails: a strictly convex function can have a Hessian that is only PSD, not PD, at some points.
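As an illustrative example, take $f(x, y) = x^2 + xy + y^2$ on $\mathbb{R}^2$. Its Hessian is constant, and its eigenvalues come from the characteristic polynomial:

$$Hf(x, y) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \det\begin{pmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{pmatrix} = (2 - \lambda)^2 - 1 = 0 \implies \lambda \in \{1, 3\}.$$

Both eigenvalues are positive at every point, so $Hf$ is PD everywhere and $f$ is strictly convex.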

Compare this to the second derivative test: there, having a PSD Hessian at one specific critical point tells us very little. Here, knowing the Hessian is PSD everywhere is much stronger — it gives global convexity.

Combine this with the convexity-implies-global-minimum theorem. If you find a critical point of a function whose Hessian is PSD everywhere on a convex domain, you have found a global minimum. For gradient descent, this means there are no spurious local minima to get trapped in.
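Here is a minimal sketch of this point. It assumes a hypothetical convex quadratic $f(x, y) = x^2 + xy + y^2 - 4x - 5y$ and an illustrative fixed step size of 0.1. The Hessian of $f$ is $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ at every point, so it is PD everywhere. Setting $\nabla f = \mathbf{0}$ gives the critical point $(1, 2)$, where $f(1, 2) = -7$. Plain gradient descent from an arbitrary start should converge to that point, and no random sample should do better.

```python
import random

def f(x, y):
    """Hypothetical convex quadratic; its Hessian is [[2, 1], [1, 2]] everywhere (PD)."""
    return x**2 + x * y + y**2 - 4 * x - 5 * y

def grad_f(x, y):
    """Analytic gradient of f."""
    return 2 * x + y - 4, x + 2 * y - 5

def gradient_descent(x, y, step=0.1, iters=500):
    """Plain gradient descent with a fixed step (illustrative choice, below 2/3 here)."""
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

x_star, y_star = gradient_descent(10.0, -7.0)
print(x_star, y_star, f(x_star, y_star))

# Spot-check with random samples: none should beat the critical point.
# This is consistent with a global minimum, though sampling alone proves nothing.
random.seed(0)
samples = [(random.uniform(-50, 50), random.uniform(-50, 50)) for _ in range(10_000)]
no_better = all(f(x, y) >= f(x_star, y_star) - 1e-9 for x, y in samples)
print(no_better)
```

For a quadratic, a fixed step converges exactly when it is below $2/\lambda_{\max}$. Here $\lambda_{\max} = 3$, so that bound is $2/3$, and 0.1 is a conservative choice.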

Formal View

Theorem 16.6 — Hessian Criterion for Convexity

Let $f$ be twice differentiable over a convex domain $C$. Then:

$$f \text{ is convex} \iff Hf(\mathbf{x}) \text{ is PSD for all } \mathbf{x} \in \text{int}(C)$$

If $Hf(\mathbf{x})$ is PD everywhere, then $f$ is strictly convex (but not conversely).
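The criterion can also be probed numerically. The sketch below is for functions of two variables. It estimates the Hessian by central finite differences at the points of a grid and tests each estimate with the 2×2 PSD conditions $a \ge 0$, $c \ge 0$, $ac - b^2 \ge 0$. The helper names, the step `h`, the tolerance, and the grid range are all hypothetical choices made for illustration. A single non-PSD sample refutes convexity, but passing every sample is only evidence: the theorem requires PSD at every interior point, not just on a grid.

```python
import math

def numerical_hessian_2d(f, x, y, h=1e-4):
    """Central finite-difference estimate of the Hessian of f: R^2 -> R at (x, y).

    Returns (f_xx, f_xy, f_yy); the Hessian is [[f_xx, f_xy], [f_xy, f_yy]].
    """
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def is_psd_2x2(a, b, c, tol=1e-6):
    """Symmetric [[a, b], [b, c]] is PSD iff a >= 0, c >= 0 and ac - b^2 >= 0 (up to tol)."""
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def hessian_psd_on_grid(f, lo=-2.0, hi=2.0, n=21):
    """Check the estimated Hessian at every point of an n-by-n grid on [lo, hi]^2.

    Returns (True, None) if every sample is PSD, else (False, first_failing_point).
    A True result is only evidence of convexity, not a proof.
    """
    step = (hi - lo) / (n - 1)
    for i in range(n):
        for j in range(n):
            x, y = lo + i * step, lo + j * step
            if not is_psd_2x2(*numerical_hessian_2d(f, x, y)):
                return False, (x, y)
    return True, None

# Hessian [[2 + e^x, 1], [1, 2]] is PD everywhere, so every grid sample should pass.
convex_ok, _ = hessian_psd_on_grid(lambda x, y: x**2 + x * y + y**2 + math.exp(x))
# Saddle x^2 - y^2 has Hessian [[2, 0], [0, -2]], which is indefinite, so the check fails.
saddle_ok, where = hessian_psd_on_grid(lambda x, y: x**2 - y**2)
print(convex_ok, saddle_ok, where)
```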

Remark 16.1 — Why PSD Everywhere Differs from the Second Derivative Test

In the second derivative test, a PSD (but not PD) Hessian at one critical point is inconclusive, because higher-order terms can tip either way (e.g., $f(x) = x^3$ has $f''(0) = 0$, yet $0$ is not a minimum). A Hessian that is PSD everywhere on a convex domain, however, locks in convexity globally. A "flat direction" in a local quadratic approximation then cannot hide a local minimum that is not global, because $f$ itself is convex, and every local minimum of a convex function is global.
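A short derivation shows where convexity of the domain enters; this is a sketch of the standard argument. For $\mathbf{x}, \mathbf{y} \in \text{int}(C)$, Taylor's theorem with Lagrange remainder gives

$$f(\mathbf{y}) = f(\mathbf{x}) + \nabla f(\mathbf{x})^T(\mathbf{y} - \mathbf{x}) + \tfrac{1}{2}(\mathbf{y} - \mathbf{x})^T\, Hf(\mathbf{z})\,(\mathbf{y} - \mathbf{x})$$

for some $\mathbf{z}$ on the segment between $\mathbf{x}$ and $\mathbf{y}$. Because $C$ is convex, that segment (and hence $\mathbf{z}$) stays in $\text{int}(C)$, where $Hf(\mathbf{z})$ is PSD. The last term is therefore nonnegative, so

$$f(\mathbf{y}) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})^T(\mathbf{y} - \mathbf{x}).$$

At a critical point $\nabla f(\mathbf{x}) = \mathbf{0}$, so $f(\mathbf{y}) \ge f(\mathbf{x})$ for every $\mathbf{y} \in \text{int}(C)$, and the critical point is a global minimum.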

Why This Matters

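This criterion is a standard way to verify that an optimization problem is convex. Once convexity is verified, every critical point is a global minimizer. Gradient descent (with a suitable step size, and provided a minimizer exists) therefore cannot be misled by spurious local minima.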

Verifying ML loss functions are convex: check that the Hessian is PSD everywhere
Least squares: Hessian is $2A^T A$ (PSD everywhere) → every critical point is a global min, and the minimizer is unique when $A$ has full column rank
Neural networks: non-convex loss landscape; the Hessian is indefinite at many points
Convex relaxations: replace a non-convex problem with a convex approximation whose objective has an everywhere-PSD Hessian
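The contrast between the least squares and neural network cases above can be checked by hand on tiny examples. The sketch assumes a hypothetical 3×2 data matrix `A`. It also uses a toy loss $(uv - 1)^2$, in which two weights are multiplied together; the toy loss is a deliberately minimal stand-in for a network's layered parametrization, not a real network. The least squares Hessian $2A^T A$ is the same at every $\mathbf{x}$. The toy loss has Hessian $\begin{pmatrix} 0 & -2 \\ -2 & 0 \end{pmatrix}$ at the origin, with eigenvalues $\pm 2$, so it is indefinite there.

```python
def gram(A):
    """Return A^T A for a matrix A given as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def is_psd_2x2(a, b, c, tol=1e-12):
    """Symmetric [[a, b], [b, c]] is PSD iff a >= 0, c >= 0 and ac - b^2 >= 0 (up to tol)."""
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

# Least squares f(x) = ||Ax - b||^2: the Hessian 2 A^T A is the same at every x
# and does not depend on b.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical data with full column rank
H_ls = [[2 * g for g in row] for row in gram(A)]
ls_psd = is_psd_2x2(H_ls[0][0], H_ls[0][1], H_ls[1][1])
print("least squares Hessian:", H_ls, "PSD:", ls_psd)

# Toy non-convex loss f(u, v) = (u*v - 1)^2 (hypothetical stand-in for a network).
def toy_hessian(u, v):
    """Analytic Hessian entries (f_uu, f_uv, f_vv) of (u*v - 1)**2."""
    return 2 * v * v, 4 * u * v - 2, 2 * u * u

toy_psd = is_psd_2x2(*toy_hessian(0.0, 0.0))  # [[0, -2], [-2, 0]]: eigenvalues +2 and -2
print("toy Hessian at (0, 0) PSD:", toy_psd)
```

Here $\det(2A^T A) = 70 \cdot 112 - 88^2 = 96 > 0$, so this particular $A$ gives a PD Hessian and a unique least squares minimizer.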


Quiz

Question 1

A $D^2$ function $f$ on a convex domain is convex if and only if:

Question 2

If $Hf$ is PD everywhere, then $f$ is strictly convex.

Question 3

For $f(x) = x^4$, is $f$ convex on $\mathbb{R}$?

Question 4

You find a critical point of $f$ and verify $Hf$ is everywhere PSD. The critical point is:

Question 5

The least squares objective $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$ is convex.

Common Mistakes

Confusing "PSD at a critical point (inconclusive 2nd derivative test)" with "PSD everywhere (convex function)" — these are very different claims.
Thinking PD everywhere is required for convexity — PSD everywhere suffices.
Assuming a convex function must have a global minimum — it might not (e.g., $f(x) = e^x$ on $\mathbb{R}$ has no minimum).
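Two small examples make the last two points concrete.

PSD without PD: $f(x, y) = x^2$ has

$$Hf(x, y) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$$

at every point. This Hessian is PSD but not PD, and $f$ is convex but not strictly convex, since it is constant along the $y$-direction.

No minimum: $f(x) = e^x$ has $f''(x) = e^x > 0$, so it is strictly convex. But $f'(x) = e^x > 0$ never vanishes, so there is no critical point. The infimum $\inf_x f(x) = 0$ is approached as $x \to -\infty$ but never attained.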