Linear Algebra
11.12

Quadratic Functions and the Jacobian

For a quadratic function $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ (where $A$ is symmetric), the gradient (the transpose of the Jacobian) follows from the matrix calculus identity $\nabla f(\mathbf{x}) = 2A\mathbf{x}$.

More generally, for $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c$, the gradient is $\nabla f(\mathbf{x}) = 2A\mathbf{x} + \mathbf{b}$ and the Jacobian is its transpose: $Df(\mathbf{x}) = (2A\mathbf{x} + \mathbf{b})^T$.
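The formula $\nabla f(\mathbf{x}) = 2A\mathbf{x} + \mathbf{b}$ can be sanity-checked numerically. The sketch below assumes NumPy and compares the closed form against a central-difference approximation; the helper names `quad`, `quad_grad`, and `numerical_grad` are illustrative, not from the text.

```python
import numpy as np

def quad(x, A, b, c):
    """Evaluate f(x) = x^T A x + b^T x + c."""
    return x @ A @ x + b @ x + c

def quad_grad(x, A, b):
    """Closed-form gradient 2Ax + b (assumes A is symmetric)."""
    return 2 * A @ x + b

def numerical_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2              # symmetrize a random matrix
b = rng.standard_normal(4)
c = 1.5
x = rng.standard_normal(4)

analytic = quad_grad(x, A, b)
numeric = numerical_grad(lambda v: quad(v, A, b, c), x)

# The two gradients should agree up to finite-difference rounding error.
print(np.allclose(analytic, numeric, atol=1e-5))
```

The constant $c$ has no effect on the gradient, which is why `quad_grad` does not take it as an argument.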

This result is fundamental to optimization: the gradient of the least squares objective $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2 = \mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|^2$ is $\nabla f(\mathbf{x}) = 2A^T A \mathbf{x} - 2A^T \mathbf{b}$. Setting this to zero yields the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$.
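As a minimal sketch of the normal equations in practice, the following assumes NumPy and a random tall matrix with full column rank. It solves $A^T A \mathbf{x} = A^T \mathbf{b}$ directly and compares the result with `np.linalg.lstsq`. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))   # tall matrix; full column rank is assumed
b = rng.standard_normal(20)

# Solve the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gradient 2 A^T (A x - b) evaluated at the solution; it should be near zero.
grad = 2 * A.T @ (A @ x_normal - b)

print(np.allclose(x_normal, x_lstsq))
print(np.linalg.norm(grad))
```

Forming $A^T A$ explicitly squares the condition number of the problem, so for ill-conditioned $A$ a QR- or SVD-based solver such as `np.linalg.lstsq` is usually the safer choice; the direct solve here is only meant to mirror the derivation.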

Formal View

Theorem 11.5 — Gradient of a Quadratic Form
For symmetric $A \in \mathbb{R}^{n\times n}$:
$\nabla_\mathbf{x}(\mathbf{x}^T A \mathbf{x}) = 2A\mathbf{x}$
For general $A$: $\nabla_\mathbf{x}(\mathbf{x}^T A \mathbf{x}) = (A + A^T)\mathbf{x}$, which equals $2A\mathbf{x}$ when $A = A^T$.

For symmetric $A$, the Hessian (matrix of second derivatives) of $\mathbf{x}^T A \mathbf{x}$ is $2A$; for general $A$ it is $A + A^T$.
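The theorem is stated without proof. One way to see it is a componentwise derivation, sketched here in LaTeX; the entries $a_{ij}$ of $A$ and the indices $k, \ell$ are notation introduced for this sketch.

```latex
\begin{align*}
\mathbf{x}^T A \mathbf{x} &= \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\, x_i x_j, \\
\frac{\partial}{\partial x_k}\bigl(\mathbf{x}^T A \mathbf{x}\bigr)
  &= \sum_{j=1}^{n} a_{kj}\, x_j + \sum_{i=1}^{n} a_{ik}\, x_i
   = (A\mathbf{x})_k + (A^T\mathbf{x})_k, \\
\nabla_\mathbf{x}\bigl(\mathbf{x}^T A \mathbf{x}\bigr) &= (A + A^T)\,\mathbf{x}, \\
\frac{\partial^2}{\partial x_k\,\partial x_\ell}\bigl(\mathbf{x}^T A \mathbf{x}\bigr)
  &= a_{k\ell} + a_{\ell k}
  \quad\Longrightarrow\quad \text{Hessian} = A + A^T .
\end{align*}
```

When $A = A^T$, both expressions reduce to the symmetric case: gradient $2A\mathbf{x}$ and Hessian $2A$.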

Example 11.5 — Gradient of Least Squares
For $f(\mathbf{x}) = \|A\mathbf{x}-\mathbf{b}\|^2$: expanding gives $\mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|^2$. This is a quadratic with the symmetric matrix $A^T A$ in the quadratic term and $-2A^T \mathbf{b}$ as the linear coefficient. Applying the quadratic gradient formula: $\nabla f(\mathbf{x}) = 2A^T A \mathbf{x} - 2A^T \mathbf{b} = 2A^T(A\mathbf{x}-\mathbf{b})$.

Why This Matters

Matrix calculus for quadratic forms underlies the derivation of linear regression and least squares methods.

  • Deriving normal equations: $\nabla f = 0$ gives $A^T A \hat{\mathbf{x}} = A^T \mathbf{b}$
  • Ridge regression: adding $\lambda\|\mathbf{x}\|^2$ gives gradient $2A^T A \mathbf{x} - 2A^T \mathbf{b} + 2\lambda \mathbf{x}$
  • Quadratic optimization problems (for example in portfolio optimization and control theory) use this gradient formula to find stationary points
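The ridge regression gradient in the list above can be set to zero in the same way as the least squares gradient, giving $(A^T A + \lambda I)\mathbf{x} = A^T \mathbf{b}$. That stationarity condition is not stated in the text but follows directly from the gradient. The sketch below assumes NumPy; `ridge_grad` and `ridge_solve` are hypothetical helper names.

```python
import numpy as np

def ridge_grad(x, A, b, lam):
    """Gradient of ||Ax - b||^2 + lam * ||x||^2."""
    return 2 * A.T @ (A @ x - b) + 2 * lam * x

def ridge_solve(A, b, lam):
    """Solve (A^T A + lam * I) x = A^T b, the zero-gradient condition."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
lam = 0.1

x_ridge = ridge_solve(A, b, lam)

# The gradient at the ridge solution should vanish up to rounding error.
print(np.linalg.norm(ridge_grad(x_ridge, A, b, lam)))
```

For $\lambda > 0$ the matrix $A^T A + \lambda I$ is positive definite, so the linear system has a unique solution even when $A$ does not have full column rank.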

Quiz

Question 1

Let $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ for symmetric $A$. Then $\nabla f(\mathbf{x})$ equals:

Question 2

The gradient of $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$ with respect to $\mathbf{x}$ is:

Common Mistakes

  • Forgetting the factor of 2 in $\nabla(\mathbf{x}^T A \mathbf{x}) = 2A\mathbf{x}$.
  • Applying the symmetric formula $2A\mathbf{x}$ when $A$ is not symmetric; use $(A+A^T)\mathbf{x}$ for general $A$.
  • In the least squares gradient, writing $2A$ instead of $2A^T A$; the chain rule introduces an extra $A^T$.
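The second mistake is easy to demonstrate on a small example. The sketch below assumes NumPy and uses a hand-picked non-symmetric $2 \times 2$ matrix (chosen for illustration only), for which $f(\mathbf{x}) = x_1^2 + 4x_1x_2 + 3x_2^2$.

```python
import numpy as np

def numerical_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

A = np.array([[1.0, 4.0],
              [0.0, 3.0]])        # not symmetric
x = np.array([1.0, 2.0])

f = lambda v: v @ A @ v

general = (A + A.T) @ x           # correct for any square A: [10, 16]
symmetric_only = 2 * A @ x        # valid only if A = A^T:    [18, 12]
numeric = numerical_grad(f, x)

print(general, symmetric_only, numeric)
```

The finite-difference gradient matches $(A + A^T)\mathbf{x}$ and not $2A\mathbf{x}$, which confirms that the symmetric shortcut gives a wrong answer here.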