Jacobian: General Definition

The Jacobian can be defined for any differentiable map between finite-dimensional vector spaces. For $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$ differentiable at $\mathbf{a}$, the Jacobian $J\mathbf{f}(\mathbf{a})$ is the unique linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ that best approximates the change $\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a})$ for small $\mathbf{h}$.

The defining property is $\lim_{\mathbf{h}\to\mathbf{0}} \frac{\|\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - J\mathbf{f}(\mathbf{a})\mathbf{h}\|}{\|\mathbf{h}\|} = 0$: the error of the linear approximation shrinks faster than $\|\mathbf{h}\|$. In coordinates, the matrix of this linear map has $(i,j)$ entry $\partial f_i / \partial x_j$, evaluated at $\mathbf{a}$.
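The limit can be checked numerically. The sketch below is only an illustration: the map `f`, its hand-computed Jacobian `jacobian_f`, the point `a`, and the step direction are all assumptions chosen for this example. It evaluates the ratio from the defining property for shrinking steps and shows it tending toward zero.

```python
import math

def f(x, y):
    # Example map f: R^2 -> R^2 (chosen only for illustration)
    return (x * x * y, math.sin(x) + y)

def jacobian_f(x, y):
    # Hand-computed Jacobian: entry (i, j) is the partial of f_i w.r.t. x_j
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def remainder_ratio(a, h):
    """Return ||f(a+h) - f(a) - J h|| / ||h|| for the step h."""
    J = jacobian_f(*a)
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    linear = [J[i][0] * h[0] + J[i][1] * h[1] for i in range(2)]
    error = [fah[i] - fa[i] - linear[i] for i in range(2)]
    return math.hypot(*error) / math.hypot(*h)

a = (1.0, 2.0)
scales = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(a, (t, -t)) for t in scales]

for t, r in zip(scales, ratios):
    print(f"step scale {t:g}: ratio {r:.3e}")
```

Each tenfold reduction of the step reduces the ratio by roughly a factor of ten, which is the behaviour expected when the remainder is second order in $\mathbf{h}$. A single direction is sampled here, so this is a sanity check rather than a proof of the limit.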

For scalar functions ($m=1$), the Jacobian is a $1 \times n$ row vector, the transpose of the gradient. The gradient itself is the column vector $\nabla f = (\partial f/\partial x_1, \ldots, \partial f/\partial x_n)^T$. Many authors write $Df$ for the row vector and $\nabla f$ for the column vector; they contain the same information.
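To make "the same information" concrete, the following sketch stores the partial derivatives of a scalar function once and reads them both ways. The function `f`, the point `a`, and the direction `h` are assumptions made up for this example.

```python
def f(x, y, z):
    # Hypothetical scalar function f: R^3 -> R
    return x * y + z * z

def gradient_f(x, y, z):
    # Partial derivatives of f, stored as a flat list (the column vector)
    return [y, x, 2 * z]

a = (1.0, 2.0, 3.0)
h = (0.5, -1.0, 0.25)

grad = gradient_f(*a)   # n entries: the gradient
jac = [grad]            # 1 x n matrix: the same numbers laid out as one row

# Jacobian (row vector) applied to h
jac_times_h = sum(jac[0][j] * h[j] for j in range(3))
# Dot product of the gradient with h
grad_dot_h = sum(g * hj for g, hj in zip(grad, h))

print(jac_times_h, grad_dot_h)
```

Both expressions give the directional change $Df(\mathbf{a})\mathbf{h} = \nabla f(\mathbf{a}) \cdot \mathbf{h}$; only the layout of the numbers differs.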

Formal View

Definition 11.11 — Jacobian (Coordinate-Free)

The Jacobian (or total derivative) of $\mathbf{f}: D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ at $\mathbf{a}$ is the unique linear map $D\mathbf{f}(\mathbf{a}): \mathbb{R}^n \to \mathbb{R}^m$ satisfying $\lim_{\mathbf{h}\to\mathbf{0}} \frac{\|\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - D\mathbf{f}(\mathbf{a})\mathbf{h}\|}{\|\mathbf{h}\|} = 0$.

The matrix of $D\mathbf{f}(\mathbf{a})$ in the standard basis is the Jacobian matrix with $(i,j)$ entry $\partial f_i/\partial x_j$.

Remark 11.3 — Special Cases

$m=1$: $D\mathbf{f}(\mathbf{a})$ is a $1\times n$ matrix (row vector), equal to $\nabla f(\mathbf{a})^T$.
$n=1$: $D\mathbf{f}(\mathbf{a})$ is an $m\times 1$ matrix (column vector), equal to $\mathbf{f}'(a)$.
$n=m=1$: $D\mathbf{f}(\mathbf{a})$ is a $1\times 1$ matrix, equal to the scalar $f'(a)$.
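The three special cases can be seen by estimating Jacobians numerically and inspecting their shapes. This is a minimal sketch, not a production routine: `numerical_jacobian` is a hypothetical helper based on central differences, and the three test functions are assumptions chosen for this example.

```python
import math

def numerical_jacobian(f, a, eps=1e-6):
    """Central-difference estimate of the m x n Jacobian of f at a.

    f takes a list of n floats and returns a list of m floats.
    """
    n = len(a)
    m = len(f(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(a)
        minus = list(a)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(m):
            # Entry (i, j) approximates the partial of f_i w.r.t. x_j
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def shape(J):
    return (len(J), len(J[0]))

def scalar_field(x):   # n = 3, m = 1
    return [x[0] ** 2 + 3 * x[1] + x[2]]

def curve(t):          # n = 1, m = 2
    return [math.cos(t[0]), math.sin(t[0])]

def single_var(x):     # n = m = 1
    return [x[0] ** 3]

J_row = numerical_jacobian(scalar_field, [1.0, 2.0, 3.0])
J_col = numerical_jacobian(curve, [0.0])
J_one = numerical_jacobian(single_var, [2.0])

print(shape(J_row), shape(J_col), shape(J_one))
```

The shapes come out as one row of $n$ entries, one column of $m$ entries, and a single entry, matching the three cases above. The entries themselves are approximations whose accuracy depends on the step `eps`.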

Why This Matters

Understanding the Jacobian as a linear map rather than just a matrix clarifies why the chain rule takes the form it does.

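The chain rule for Jacobians, $D(\mathbf{g}\circ\mathbf{f})(\mathbf{a}) = D\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot D\mathbf{f}(\mathbf{a})$, is a composition of linear maps, and composing linear maps corresponds to multiplying their matrices.
Automatic differentiation systems (JAX, PyTorch) treat the derivative as a linear map: they compute Jacobian-vector and vector-Jacobian products and do not need to build the full matrix.
The inverse function theorem gives a sufficient condition for local invertibility: a continuously differentiable $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$ is invertible near $\mathbf{a}$ whenever $\det D\mathbf{f}(\mathbf{a}) \neq 0$.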
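The linear-map view can be illustrated with a toy forward-mode differentiator built on dual numbers. This is a sketch under stated assumptions, not how JAX or PyTorch are implemented: the names `Dual`, `dsin`, and `jvp` and the maps `f` and `g` are all hypothetical and exist only for this example. The code applies $D\mathbf{f}(\mathbf{a})$ to a vector without forming the matrix, then checks the chain rule by comparing the derivative of the composition with the two linear maps applied one after the other.

```python
import math

class Dual:
    """Number of the form value + tangent * eps, where eps**2 == 0."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried in the tangent component
        return Dual(self.value * other.value,
                    self.value * other.tangent + self.tangent * other.value)

    __rmul__ = __mul__

def dsin(x):
    # Sine lifted to dual numbers
    return Dual(math.sin(x.value), math.cos(x.value) * x.tangent)

def f(x, y):   # f: R^2 -> R^2
    return (x * y, dsin(x) + y)

def g(u, v):   # g: R^2 -> R^2
    return (u * u + v, u * v)

def jvp(func, point, direction):
    """Return func(point) and D func(point) applied to direction."""
    duals = [Dual(p, d) for p, d in zip(point, direction)]
    out = func(*duals)
    return [o.value for o in out], [o.tangent for o in out]

a = (1.0, 2.0)
v = (1.0, 0.5)

# D(g o f)(a) v, differentiating the composition directly
_, lhs = jvp(lambda x, y: g(*f(x, y)), a, v)

# Dg(f(a)) applied to Df(a) v: one linear map after the other
fa, df_v = jvp(f, a, v)
_, rhs = jvp(g, fa, df_v)

print(lhs)
print(rhs)
```

The two printed vectors agree, which is the chain rule stated for one input direction. Running `jvp` once per standard basis vector would recover the columns of the Jacobian matrix.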

Learning Resources

Total Derivative as Linear Map

MIT OpenCourseWare

MIT lecture on the total derivative as a linear approximation.

48 min

Jacobian Matrix Overview

Steve Brunton

Intuitive explanation of the Jacobian from a data-science perspective.

15 min

Quiz

Question 1

For a scalar function $f: \mathbb{R}^n \to \mathbb{R}$, the Jacobian $Df(\mathbf{a})$ is:

Question 2

The chain rule for Jacobians states that $D(\mathbf{g}\circ\mathbf{f})(\mathbf{a}) = D\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot D\mathbf{f}(\mathbf{a})$, which corresponds to matrix multiplication.

Common Mistakes

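Treating the Jacobian and gradient as identical: for a scalar function, the gradient is a column vector and the Jacobian is the corresponding row vector.
Forgetting that the Jacobian matrix dimensions depend on both input and output dimensions: for $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$ the matrix is $m \times n$, with rows indexing outputs and columns indexing inputs.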
Not recognizing that the one-variable derivative is a special case of the Jacobian (a $1\times 1$ matrix).