Linear Algebra

Jacobian: General Definition

The Jacobian matrix can be defined for functions between arbitrary finite-dimensional vector spaces. For $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$ differentiable at $\mathbf{a}$, the Jacobian $J\mathbf{f}(\mathbf{a})$ is the unique linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ that best approximates the increment $\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a})$ for small $\mathbf{h}$.

The defining property is: $\lim_{\mathbf{h}\to\mathbf{0}} \frac{\|\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - J\mathbf{f}(\mathbf{a})\mathbf{h}\|}{\|\mathbf{h}\|} = 0$. In coordinates, the matrix of this linear map has $(i,j)$ entry $\partial f_i / \partial x_j$.
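The limit above can be checked numerically. The sketch below is only an illustration: the map `f`, the point `a`, and the direction of `h` are arbitrary choices that do not come from the text. It evaluates the ratio $\|\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - J\mathbf{f}(\mathbf{a})\mathbf{h}\| / \|\mathbf{h}\|$ for shrinking $\mathbf{h}$, using a hand-derived Jacobian.

```python
import math

def f(x, y):
    """Example map f: R^2 -> R^2 (an arbitrary choice for illustration)."""
    return (x * x * y, math.sin(x) + y)

def jacobian_f(x, y):
    """Hand-derived Jacobian of f: entry (i, j) is d f_i / d x_j."""
    return ((2 * x * y, x * x),
            (math.cos(x), 1.0))

def remainder_ratio(a, h):
    """Return ||f(a+h) - f(a) - J h|| / ||h|| in the Euclidean norm."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    J = jacobian_f(*a)
    Jh = (J[0][0] * h[0] + J[0][1] * h[1],
          J[1][0] * h[0] + J[1][1] * h[1])
    r = (fah[0] - fa[0] - Jh[0], fah[1] - fa[1] - Jh[1])
    return math.hypot(*r) / math.hypot(*h)

a = (1.0, 2.0)
# Shrink h along the fixed direction (1, -1).
ratios = [remainder_ratio(a, (t, -t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # the ratios shrink as h shrinks
```

For a smooth map like this one, the ratio shrinks roughly in proportion to $\|\mathbf{h}\|$, which is stronger than the definition requires. The definition only asks that the ratio tend to zero.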

For scalar functions ($m=1$), the Jacobian is a $1 \times n$ row vector, the transpose of the gradient. The gradient itself is the column vector $\nabla f = (\partial f/\partial x_1, \ldots, \partial f/\partial x_n)^T$. Many authors write $Df$ for the row vector and $\nabla f$ for the column vector; they contain the same information.

Formal View

Definition 11.11 — Jacobian (Coordinate-Free)
The Jacobian (or total derivative) of $\mathbf{f}: D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ at $\mathbf{a}$ is the unique linear map $D\mathbf{f}(\mathbf{a}): \mathbb{R}^n \to \mathbb{R}^m$ satisfying
$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{\|\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - D\mathbf{f}(\mathbf{a})\mathbf{h}\|}{\|\mathbf{h}\|} = 0$$

The matrix of $D\mathbf{f}(\mathbf{a})$ in the standard basis is the Jacobian matrix with $(i,j)$ entry $\partial f_i/\partial x_j$.
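To make the entry formula concrete, here is a minimal sketch that approximates each $\partial f_i/\partial x_j$ by a central difference and compares the result with a hand-derived matrix. The helper `numerical_jacobian`, the example map `f`, and the step size `eps` are illustrative assumptions, not part of the definition.

```python
import math

def numerical_jacobian(f, a, eps=1e-6):
    """Approximate the m x n Jacobian of f at a by central differences.

    f takes a list of n floats and returns a list of m floats.
    Row i corresponds to output f_i, column j to input x_j.
    """
    n = len(a)
    m = len(f(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(a)
        backward = list(a)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = f(forward), f(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def f(x):
    """Example map f: R^3 -> R^2 (an arbitrary choice for illustration)."""
    return [x[0] * x[1], math.sin(x[2]) + x[0] ** 2]

a = [1.0, 2.0, 0.5]
J = numerical_jacobian(f, a)

# Hand-derived Jacobian at a: 2 rows (outputs) by 3 columns (inputs).
J_exact = [[a[1], a[0], 0.0],
           [2 * a[0], 0.0, math.cos(a[2])]]

max_error = max(abs(J[i][j] - J_exact[i][j])
                for i in range(2) for j in range(3))
print(max_error)  # small, limited by the finite-difference step
```

Finite differences are used here only because they follow the entry formula directly; they are an approximation, not how the Jacobian is defined.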

Remark 11.3 — Special Cases
  • $m=1$: $D\mathbf{f}(\mathbf{a})$ is a $1\times n$ matrix (row vector), equal to $\nabla f(\mathbf{a})^T$.
  • $n=1$: $D\mathbf{f}(\mathbf{a})$ is an $m\times 1$ matrix (column vector), equal to $\mathbf{f}'(a)$.
  • $n=m=1$: $D\mathbf{f}(\mathbf{a})$ is a $1\times 1$ matrix, equal to the scalar $f'(a)$.
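The three special cases can be seen in the shape of the matrix. The following sketch uses a hypothetical central-difference helper `jacobian` and three made-up example functions; it only reports the dimensions and values for each case.

```python
import math

def jacobian(f, a, eps=1e-6):
    """Central-difference Jacobian of f at a, as m rows of length n."""
    m = len(f(a))
    cols = []
    for j in range(len(a)):
        up = a[:j] + [a[j] + eps] + a[j + 1:]
        dn = a[:j] + [a[j] - eps] + a[j + 1:]
        fu, fd = f(up), f(dn)
        cols.append([(fu[i] - fd[i]) / (2 * eps) for i in range(m)])
    # cols[j][i] approximates d f_i / d x_j; transpose so rows index outputs.
    return [[cols[j][i] for j in range(len(a))] for i in range(m)]

def shape(J):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return (len(J), len(J[0]))

def scalar_field(x):
    """m = 1, n = 3."""
    return [x[0] ** 2 + 3 * x[1] * x[2]]

def curve(t):
    """m = 2, n = 1."""
    return [math.cos(t[0]), math.sin(t[0])]

def single(x):
    """m = n = 1."""
    return [x[0] ** 3]

row = jacobian(scalar_field, [1.0, 2.0, 3.0])  # 1 x 3: the gradient as a row
col = jacobian(curve, [0.0])                   # 2 x 1: the velocity vector
one = jacobian(single, [2.0])                  # 1 x 1: the ordinary derivative
print(shape(row), shape(col), shape(one))      # (1, 3) (2, 1) (1, 1)
```

The single row is approximately $(2, 9, 6)$, the gradient of the scalar field at $(1, 2, 3)$ written as a row, and the $1\times 1$ entry is approximately $12$, the ordinary derivative of $x^3$ at $x = 2$.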

Why This Matters

Understanding the Jacobian as a linear map rather than just a matrix clarifies why the chain rule takes the form it does.

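  • The chain rule $D(\mathbf{g}\circ\mathbf{f})(\mathbf{a}) = D\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot D\mathbf{f}(\mathbf{a})$ says that the derivative of a composition is the composition of the derivatives, and composing linear maps corresponds to multiplying their matrices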
  • Automatic differentiation systems (JAX, PyTorch) treat the derivative as a linear map: they compute Jacobian-vector and vector-Jacobian products, and assemble the full Jacobian matrix only when asked to
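  • The inverse function theorem gives a sufficient condition for local invertibility: a continuously differentiable $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$ is invertible near $\mathbf{a}$ whenever $\det D\mathbf{f}(\mathbf{a}) \neq 0$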
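The chain rule in the first bullet can be verified numerically. In this sketch, the maps `f` and `g`, their hand-derived Jacobians `Df` and `Dg`, and the helpers `matmul` and `numerical_jacobian` are all illustrative assumptions. The product of the two Jacobian matrices is compared with a finite-difference Jacobian of the composite map.

```python
import math

def f(x):
    """f: R^2 -> R^3 (illustrative)."""
    return [x[0] * x[1], x[0] + x[1], math.sin(x[0])]

def Df(x):
    """Hand-derived 3 x 2 Jacobian of f."""
    return [[x[1], x[0]],
            [1.0, 1.0],
            [math.cos(x[0]), 0.0]]

def g(y):
    """g: R^3 -> R^2 (illustrative)."""
    return [y[0] * y[2], y[1] ** 2]

def Dg(y):
    """Hand-derived 2 x 3 Jacobian of g."""
    return [[y[2], 0.0, y[0]],
            [0.0, 2 * y[1], 0.0]]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, a, eps=1e-6):
    """Central-difference Jacobian of func at a (rows = outputs)."""
    m = len(func(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for j in range(len(a)):
        up, dn = list(a), list(a)
        up[j] += eps
        dn[j] -= eps
        fu, fd = func(up), func(dn)
        for i in range(m):
            J[i][j] = (fu[i] - fd[i]) / (2 * eps)
    return J

a = [0.7, -1.2]
# Dg is evaluated at f(a), not at a: (2 x 3) times (3 x 2) gives 2 x 2.
chain = matmul(Dg(f(a)), Df(a))
direct = numerical_jacobian(lambda x: g(f(x)), a)

max_error = max(abs(chain[i][j] - direct[i][j])
                for i in range(2) for j in range(2))
print(max_error)  # small, limited by the finite-difference step
```

The inner dimensions must agree for the product to exist, which is the matrix-level reflection of the fact that the output space of $\mathbf{f}$ is the input space of $\mathbf{g}$.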

Quiz

Question 1

For a scalar function $f: \mathbb{R}^n \to \mathbb{R}$, the Jacobian $Df(\mathbf{a})$ is:

Question 2

The chain rule for Jacobians states that $D(\mathbf{g}\circ\mathbf{f})(\mathbf{a}) = D\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot D\mathbf{f}(\mathbf{a})$, which corresponds to matrix multiplication.

Common Mistakes

  • Treating the Jacobian and the gradient as identical: for a scalar function, the gradient is a column vector and the Jacobian is its transpose, a row vector.
  • Forgetting that the Jacobian matrix dimensions depend on both input and output dimensions.
  • Not recognizing that the one-variable derivative is a special case of the Jacobian (a $1\times 1$ matrix).