14.8 · 10 min read

The Multivariate Chain Rule

The multivariate chain rule in its full generality states: if $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^k$ is differentiable at $\mathbf{a}$ and $\mathbf{g}: \mathbb{R}^k \to \mathbb{R}^m$ is differentiable at $\mathbf{f}(\mathbf{a})$, then $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ is differentiable at $\mathbf{a}$ with Jacobian $J\mathbf{h}(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$.

This is matrix multiplication: the $m\times k$ Jacobian of $\mathbf{g}$ (at the intermediate point $\mathbf{f}(\mathbf{a})$) times the $k\times n$ Jacobian of $\mathbf{f}$ (at $\mathbf{a}$) gives the $m\times n$ Jacobian of $\mathbf{h}$ (at $\mathbf{a}$).
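
The dimension bookkeeping above can be checked numerically. The following is a minimal sketch using only the Python standard library; the maps `f` ($\mathbb{R}^2 \to \mathbb{R}^3$) and `g` ($\mathbb{R}^3 \to \mathbb{R}^2$), the point `a`, and all helper names are illustrative assumptions, not part of the lesson. It compares the product of the two analytic Jacobians with a finite-difference Jacobian of the composition.

```python
import math

def f(x):
    """Hypothetical example map f: R^2 -> R^3."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g(y):
    """Hypothetical example map g: R^3 -> R^2."""
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def jacobian_f(x):
    """Analytic 3 x 2 Jacobian of f at x."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2.0 * x2]]

def jacobian_g(y):
    """Analytic 2 x 3 Jacobian of g at y."""
    y1, y2, y3 = y
    return [[1.0, y3, y2],
            [math.exp(y1), 0.0, -1.0]]

def matmul(A, B):
    """Product of an (m x k) and a (k x n) matrix stored as nested lists."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of func at x."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

a = [0.5, -1.2]

# Chain rule: Jg is evaluated at f(a), Jf at a; (2 x 3) times (3 x 2) gives 2 x 2.
J_chain = matmul(jacobian_g(f(a)), jacobian_f(a))

# Independent check: differentiate the composition h = g o f numerically.
J_numeric = numerical_jacobian(lambda x: g(f(x)), a)

max_err = max(abs(J_chain[i][j] - J_numeric[i][j])
              for i in range(2) for j in range(2))
print("max abs difference:", max_err)
```

The two matrices should agree up to finite-difference error, which is small for smooth maps like these.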

The appeal of this result is its simplicity: differentiating a composition amounts to multiplying Jacobians. To first order, everything about how $\mathbf{h}$ depends on its input through the intermediate variables is captured in these two matrices.

Formal View

Theorem 14.5 (Multivariate Chain Rule)

Let $\mathbf{f}: D \subseteq \mathbb{R}^n \to \mathbb{R}^k$ be differentiable at $\mathbf{a} \in D$, and $\mathbf{g}: E \subseteq \mathbb{R}^k \to \mathbb{R}^m$ be differentiable at $\mathbf{f}(\mathbf{a}) \in E$. Then $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ is differentiable at $\mathbf{a}$ and $J\mathbf{h}(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a}) \in \mathbb{R}^{m\times n}$.

This is sometimes called the "abstract chain rule" or "total derivative chain rule". The univariate chain rule is the special case $m=k=n=1$.
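
Written out entry by entry, the matrix identity becomes a sum over the $k$ intermediate variables. The names $y_l$ for the coordinates of $\mathbb{R}^k$ and $x_j$ for the coordinates of $\mathbb{R}^n$ are notation assumed here for illustration:

$$\frac{\partial h_i}{\partial x_j}(\mathbf{a}) = \sum_{l=1}^{k} \frac{\partial g_i}{\partial y_l}(\mathbf{f}(\mathbf{a}))\,\frac{\partial f_l}{\partial x_j}(\mathbf{a}), \qquad 1 \le i \le m,\ 1 \le j \le n.$$

For $m=k=n=1$ the sum has a single term and reduces to $h'(a) = g'(f(a))\,f'(a)$.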

Why This Matters

The multivariate chain rule is a foundational theorem of multivariate calculus and the mathematical basis of gradient-based training in deep learning.

Neural network backpropagation: the gradient is computed by multiplying Jacobians from output to input
Automatic differentiation systems implement this product, usually as Jacobian-vector or vector-Jacobian products rather than by forming full Jacobian matrices
PDE discretizations: chain rule relates continuous derivatives to discrete approximations
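
The "output to input" ordering in backpropagation can be sketched as repeated vector-Jacobian products. The three Jacobians below are made-up numbers standing in for layers $\mathbb{R}^4 \to \mathbb{R}^3 \to \mathbb{R}^2 \to \mathbb{R}^1$, assumed to be already evaluated at the appropriate intermediate points; `vjp` is a hypothetical helper name, not any library's API.

```python
def vjp(v, J):
    """Row vector v (length m) times an (m x n) Jacobian J, giving a length-n row vector."""
    return [sum(v[i] * J[i][j] for i in range(len(J)))
            for j in range(len(J[0]))]

# Hypothetical layer Jacobians (illustrative values only).
J1 = [[1.0, 0.0, 2.0, -1.0],
      [0.5, 1.0, 0.0, 0.0],
      [0.0, -2.0, 1.0, 3.0]]   # 3 x 4: first layer, R^4 -> R^3
J2 = [[2.0, 1.0, 0.0],
      [0.0, -1.0, 4.0]]        # 2 x 3: second layer, R^3 -> R^2
J3 = [[1.0, -0.5]]             # 1 x 2: scalar loss, R^2 -> R^1

# Reverse sweep: start from d(loss)/d(loss) = 1 and multiply by each Jacobian,
# last layer first. Only row vectors are stored; the full product
# J3 * J2 * J1 is never formed as intermediate matrices.
grad = [1.0]
for J in (J3, J2, J1):
    grad = vjp(grad, J)

print(grad)  # gradient of the loss with respect to the 4 inputs
```

Because the loss is scalar, each step costs one vector-matrix product, which is the usual argument for sweeping from the output side when there are many inputs and few outputs.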

Learning Resources

The Multivariable Chain Rule

3Blue1Brown

Visualizing the chain rule as Jacobian matrix multiplication.

17 min

Chain Rule — Full Generality

MIT OpenCourseWare

MIT treatment of the chain rule for general vector-valued maps.

48 min

Quiz

Question 1

For $\mathbf{f}: \mathbb{R}^4 \to \mathbb{R}^3$ and $\mathbf{g}: \mathbb{R}^3 \to \mathbb{R}^2$, the Jacobian $J(\mathbf{g}\circ\mathbf{f})$ is:

Question 2

The chain rule for Jacobians $J(\mathbf{g}\circ\mathbf{f}) = J\mathbf{g}\cdot J\mathbf{f}$ corresponds to matrix multiplication.

Common Mistakes

Reversing the order: $J\mathbf{f} \cdot J\mathbf{g}$ instead of $J\mathbf{g} \cdot J\mathbf{f}$ — matrix multiplication is not commutative.
Evaluating $J\mathbf{g}$ at $\mathbf{a}$ instead of at $\mathbf{f}(\mathbf{a})$.
Forgetting to check matrix size compatibility before claiming the product is well-defined.
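
A shape check of the kind described in the last item might look like the sketch below. The helper `product_shape` and the sizes $n=5$, $k=2$, $m=3$ are assumptions chosen for illustration.

```python
def product_shape(left, right):
    """Shape of left @ right for (rows, cols) tuples; raises if incompatible."""
    if left[1] != right[0]:
        raise ValueError(f"cannot multiply {left} by {right}")
    return (left[0], right[1])

n, k, m = 5, 2, 3        # f: R^5 -> R^2, g: R^2 -> R^3 (illustrative sizes)
shape_Jf = (k, n)        # Jf is k x n
shape_Jg = (m, k)        # Jg is m x k

# Correct order: (m x k)(k x n) gives m x n.
correct = product_shape(shape_Jg, shape_Jf)

# Reversed order: (k x n)(m x k) requires n == m, which fails here.
try:
    product_shape(shape_Jf, shape_Jg)
    reversed_ok = True
except ValueError:
    reversed_ok = False

print(correct, reversed_ok)
```

When $n = m$ the reversed product is defined (with shape $k\times k$), so a shape check alone does not catch the ordering mistake in that case.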