11.12 · 10 min read

Quadratic Functions and the Jacobian

For a quadratic function $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ (where $A$ is symmetric), the gradient (Jacobian transposed) can be computed using the matrix calculus identity: $\nabla f(\mathbf{x}) = 2A\mathbf{x}$ .

More generally, for $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c$ with $A$ still symmetric, the gradient is $\nabla f(\mathbf{x}) = 2A\mathbf{x} + \mathbf{b}$: the linear term contributes $\mathbf{b}$ and the constant $c$ contributes nothing. The Jacobian is the transpose of the gradient: $Df(\mathbf{x}) = (2A\mathbf{x} + \mathbf{b})^T$.
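As a quick sanity check, the formula $\nabla f(\mathbf{x}) = 2A\mathbf{x} + \mathbf{b}$ can be compared against central finite differences. The following is a minimal, dependency-free Python sketch. The helper names (`quad`, `grad_formula`, `grad_numeric`) and the numeric values of `A`, `b`, `c`, and `x` are assumptions chosen for illustration, not part of the text.

```python
def quad(A, b, c, x):
    """Evaluate f(x) = x^T A x + b^T x + c (A is a list of rows, x and b are lists)."""
    n = len(x)
    xAx = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return xAx + sum(b[i] * x[i] for i in range(n)) + c


def grad_formula(A, b, x):
    """Closed-form gradient 2Ax + b; valid only when A is symmetric."""
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]


def grad_numeric(A, b, c, x, h=1e-6):
    """Approximate the gradient with central differences, one coordinate at a time."""
    g = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        g.append((quad(A, b, c, x_plus) - quad(A, b, c, x_minus)) / (2 * h))
    return g


# Illustrative values (assumed for this example); A is symmetric.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
c = 0.5
x = [0.5, -2.0]

analytic = grad_formula(A, b, x)
numeric = grad_numeric(A, b, c, x)
max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, numeric, max_err)
```

For a quadratic, central differences are exact up to rounding, so `max_err` should be very small. In practice one would likely use NumPy for the matrix products; plain lists are used here only to keep the sketch self-contained.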

This result is fundamental to optimization. In the least squares objective $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2 = \mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|^2$, where $A$ now denotes a general (possibly rectangular) matrix, the symmetric matrix $A^T A$ plays the role of the quadratic term, so $\nabla f(\mathbf{x}) = 2A^T A \mathbf{x} - 2A^T \mathbf{b}$. Setting this to zero yields the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$.
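To make the normal equations concrete, here is a small sketch that fits a line to three points and confirms that the gradient $2A^T(A\mathbf{x} - \mathbf{b})$ vanishes at the solution. The data, the helper names, and the use of Cramer's rule (which limits the solver to two unknowns) are all assumptions made for this illustration. A real implementation would typically call a linear algebra library instead.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def lsq_gradient(A, b, x):
    """Gradient 2 A^T (A x - b) of the objective ||A x - b||^2."""
    residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [2 * g for g in matvec(transpose(A), residual)]


def solve_normal_equations_2(A, b):
    """Solve A^T A x = A^T b by Cramer's rule (two unknowns only)."""
    At = transpose(A)
    # Gram matrix G = A^T A, built from dot products of the columns of A.
    G = [[sum(p * q for p, q in zip(ri, rj)) for rj in At] for ri in At]
    rhs = matvec(At, b)
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    x0 = (rhs[0] * G[1][1] - G[0][1] * rhs[1]) / det
    x1 = (G[0][0] * rhs[1] - G[1][0] * rhs[0]) / det
    return [x0, x1]


# Fit y ≈ x0 + x1 * t to the points (0, 1), (1, 2), (2, 4) (assumed data).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 4.0]

x_hat = solve_normal_equations_2(A, b)
grad_at_solution = lsq_gradient(A, b, x_hat)
print(x_hat, grad_at_solution)
```

Here $A^T A$ has rows $(3, 3)$ and $(3, 5)$ and $A^T \mathbf{b} = (7, 10)$, so the solution is intercept $5/6$ and slope $3/2$, and the gradient at that point is zero up to rounding.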

Formal View

Theorem 11.5 — Gradient of a Quadratic Form

For symmetric $A \in \mathbb{R}^{n\times n}$: $\nabla_\mathbf{x}(\mathbf{x}^T A \mathbf{x}) = 2A\mathbf{x}$. For general $A$: $\nabla_\mathbf{x}(\mathbf{x}^T A \mathbf{x}) = (A + A^T)\mathbf{x}$, which equals $2A\mathbf{x}$ when $A = A^T$.

The Hessian (matrix of second derivatives) of $\mathbf{x}^T A \mathbf{x}$ is $A + A^T$, which reduces to $2A$ when $A$ is symmetric.
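One way to see where these formulas come from is to differentiate componentwise. The entry notation $a_{ij}$ for the elements of $A$ is introduced here for the derivation only:

$$
\mathbf{x}^T A \mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\, x_i x_j,
\qquad
\frac{\partial}{\partial x_k}\left(\mathbf{x}^T A \mathbf{x}\right)
= \sum_{j=1}^{n} a_{kj}\, x_j + \sum_{i=1}^{n} a_{ik}\, x_i
= (A\mathbf{x})_k + (A^T\mathbf{x})_k .
$$

Stacking the components gives $\nabla_\mathbf{x}(\mathbf{x}^T A \mathbf{x}) = (A + A^T)\mathbf{x}$. Differentiating once more,

$$
\frac{\partial^2}{\partial x_k\, \partial x_l}\left(\mathbf{x}^T A \mathbf{x}\right) = a_{kl} + a_{lk},
$$

so the Hessian is $A + A^T$, which is $2A$ exactly when $A = A^T$.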

Example 11.5 — Gradient of Least Squares

For $f(\mathbf{x}) = \|A\mathbf{x}-\mathbf{b}\|^2$: expanding gives $\mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|^2$. Applying the quadratic gradient formula: $\nabla f(\mathbf{x}) = 2A^T A \mathbf{x} - 2A^T \mathbf{b} = 2A^T(A\mathbf{x}-\mathbf{b})$.
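The same gradient can be reached without expanding the norm, by applying the chain rule to the residual. The symbol $\mathbf{r}$ is introduced here only as shorthand:

$$
\mathbf{r}(\mathbf{x}) = A\mathbf{x} - \mathbf{b},
\qquad
f(\mathbf{x}) = \mathbf{r}^T \mathbf{r},
\qquad
Df(\mathbf{x}) = 2\,\mathbf{r}^T\, D\mathbf{r}(\mathbf{x}) = 2\,(A\mathbf{x} - \mathbf{b})^T A .
$$

Transposing the Jacobian gives $\nabla f(\mathbf{x}) = 2A^T(A\mathbf{x} - \mathbf{b})$, which matches the expanded form term by term. The factor $A^T$ comes from $D\mathbf{r}(\mathbf{x}) = A$.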

Why This Matters

Matrix calculus for quadratic forms underlies the derivation of linear regression and least squares methods:

Deriving normal equations: $\nabla f = 0$ gives $A^T A \hat{\mathbf{x}} = A^T \mathbf{b}$
Ridge regression: adding $\lambda\|\mathbf{x}\|^2$ gives gradient $2A^T A \mathbf{x} - 2A^T \mathbf{b} + 2\lambda \mathbf{x}$
Quadratic optimization problems, such as those in portfolio optimization and control theory, rely on this gradient formula
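The ridge gradient listed above can be used directly in gradient descent. The following minimal sketch makes several assumptions for illustration: the data, the regularization weight `lam`, the step size, and the iteration count are all chosen so that the iteration converges on this small problem. It is not a general-purpose solver.

```python
def ridge_gradient(A, b, x, lam):
    """Gradient 2 A^T (A x - b) + 2 lam x of ||A x - b||^2 + lam ||x||^2."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [
        2 * sum(A[k][j] * residual[k] for k in range(len(A))) + 2 * lam * x[j]
        for j in range(len(x))
    ]


def ridge_gradient_descent(A, b, lam, step=0.05, iters=500):
    """Plain gradient descent from the origin with a fixed step size."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        g = ridge_gradient(A, b, x, lam)
        x = [xj - step * gj for xj, gj in zip(x, g)]
    return x


# Assumed example data: three observations, two unknowns.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 4.0]

x_ridge = ridge_gradient_descent(A, b, lam=1.0)
final_grad = ridge_gradient(A, b, x_ridge, 1.0)
print(x_ridge, final_grad)
```

Setting the ridge gradient to zero gives $(A^T A + \lambda I)\mathbf{x} = A^T \mathbf{b}$. For this data and $\lambda = 1$ the solution is $(4/5,\ 19/15)$, which the iteration should approach.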

Learning Resources

Matrix Calculus: Gradient of Quadratic Forms

Steve Brunton

Deriving gradients of quadratic forms in matrix notation.

18 min

Calculus with Matrices

MIT OpenCourseWare

MIT lecture on matrix calculus and its use in least squares.

50 min

Quiz

Question 1

Let $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ for symmetric $A$ . Then $\nabla f(\mathbf{x})$ equals:

Question 2

The gradient of $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$ with respect to $\mathbf{x}$ is:

Common Mistakes

Forgetting the factor of 2 in $\nabla(\mathbf{x}^T A \mathbf{x}) = 2A\mathbf{x}$ .
Applying the symmetric formula $2A\mathbf{x}$ when $A$ is not symmetric — use $(A+A^T)\mathbf{x}$ for general $A$ .
In the least squares gradient, writing $2A$ instead of $2A^T A$ — the chain rule introduces an extra $A^T$ .
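A small worked example shows how the symmetric formula fails for a non-symmetric matrix. The matrix below is chosen purely for illustration:

$$
A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix},
\qquad
\mathbf{x}^T A \mathbf{x} = x_1^2 + 2x_1 x_2 + x_2^2,
\qquad
\nabla(\mathbf{x}^T A \mathbf{x}) = \begin{pmatrix} 2x_1 + 2x_2 \\ 2x_1 + 2x_2 \end{pmatrix} = (A + A^T)\mathbf{x} .
$$

By contrast, $2A\mathbf{x} = (2x_1 + 4x_2,\ 2x_2)^T$, which disagrees with the true gradient whenever $\mathbf{x} \neq \mathbf{0}$.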