Linear Algebra
14.8 · 10 min read

The Multivariate Chain Rule

The multivariate chain rule in its full generality states: if $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^k$ is differentiable at $\mathbf{a}$ and $\mathbf{g}: \mathbb{R}^k \to \mathbb{R}^m$ is differentiable at $\mathbf{f}(\mathbf{a})$, then $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ is differentiable at $\mathbf{a}$ with Jacobian

$$J\mathbf{h}(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$$

This is matrix multiplication: the $m\times k$ Jacobian of $\mathbf{g}$ (at the intermediate point) times the $k\times n$ Jacobian of $\mathbf{f}$ (at $\mathbf{a}$) gives the $m\times n$ Jacobian of $\mathbf{h}$ (at $\mathbf{a}$).

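The appeal of this result is its simplicity: differentiating a composition reduces to multiplying two Jacobians. Everything about the intermediate computation that matters for the derivative is captured in these two matrices, provided the outer one is evaluated at the intermediate point $\mathbf{f}(\mathbf{a})$.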
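
The product formula can be checked numerically. The sketch below uses two made-up maps, `f` from $\mathbb{R}^2$ to $\mathbb{R}^3$ and `g` from $\mathbb{R}^3$ to $\mathbb{R}^2$, with hand-derived Jacobians; the maps, the point `a`, and the helper names are assumptions chosen for illustration only. It compares the chain-rule product against a finite-difference Jacobian of the composition.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^3
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g(y):
    # Hypothetical example map g: R^3 -> R^2
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def jacobian_f(x):
    # Hand-derived 3x2 Jacobian of f
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2.0 * x2]]

def jacobian_g(y):
    # Hand-derived 2x3 Jacobian of g
    y1, y2, y3 = y
    return [[1.0, y3, y2],
            [math.exp(y1), 0.0, -1.0]]

def matmul(A, B):
    # Plain-Python matrix product; assumes len(A[0]) == len(B)
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, x, eps=1e-6):
    # Central finite differences, one column per input coordinate
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

a = [0.7, -1.3]

# Chain rule: Jacobian of g evaluated at f(a), times Jacobian of f evaluated at a
J_chain = matmul(jacobian_g(f(a)), jacobian_f(a))

# Independent check: differentiate h = g o f directly by finite differences
J_direct = numerical_jacobian(lambda x: g(f(x)), a)

max_err = max(abs(J_chain[i][j] - J_direct[i][j])
              for i in range(2) for j in range(2))
print("shape:", len(J_chain), "x", len(J_chain[0]))
print("max abs difference:", max_err)
```

The result is a 2×2 matrix, as expected from a (2×3)(3×2) product. The two Jacobians should agree up to finite-difference error.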

Formal View

Theorem 14.5 (Multivariate Chain Rule)
Let $\mathbf{f}: D \subseteq \mathbb{R}^n \to \mathbb{R}^k$ be differentiable at $\mathbf{a} \in D$, and $\mathbf{g}: E \subseteq \mathbb{R}^k \to \mathbb{R}^m$ be differentiable at $\mathbf{f}(\mathbf{a}) \in E$. Then $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ is differentiable at $\mathbf{a}$ and

$$J\mathbf{h}(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a}) \in \mathbb{R}^{m\times n}$$

This is sometimes called the "abstract chain rule" or "total derivative chain rule". The univariate chain rule is the special case $m=k=n=1$.
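
To see the special case concretely, here is a short worked example with illustrative functions (the choice $f(x) = x^2$, $g(y) = \sin y$ is an assumption made for this example only). With $m = k = n = 1$ every Jacobian is a $1\times 1$ matrix, so the matrix product is an ordinary product of numbers:

$$
\begin{aligned}
Jf(a) &= \begin{pmatrix} 2a \end{pmatrix}, \qquad Jg(y) = \begin{pmatrix} \cos y \end{pmatrix}, \\
Jh(a) &= Jg(f(a)) \cdot Jf(a) = \begin{pmatrix} \cos(a^2) \end{pmatrix} \begin{pmatrix} 2a \end{pmatrix} = \begin{pmatrix} 2a\cos(a^2) \end{pmatrix},
\end{aligned}
$$

which matches the familiar rule $h'(a) = g'(f(a))\, f'(a)$ for $h(x) = \sin(x^2)$.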

Why This Matters

The multivariate chain rule is a foundational theorem of multivariable calculus and the mathematical basis of gradient-based training in deep learning.

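  • Neural network backpropagation: the gradient of the loss is computed by multiplying the Jacobians of successive layers, working from output to input
  • Automatic differentiation systems implement this product, typically as Jacobian-vector or vector-Jacobian products rather than by forming full Jacobian matrices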
  • PDE discretizations: chain rule relates continuous derivatives to discrete approximations
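
The output-to-input multiplication described for backpropagation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular framework implements it: the two stages, their hand-derived Jacobians, and the helper names `vjp` and `backprop` are hypothetical. The forward pass records the point at which each Jacobian must be evaluated; the backward pass multiplies a row vector by each Jacobian in reverse order.

```python
import math

# Hypothetical stage 1: R^2 -> R^2
def stage1(x):
    return [math.tanh(x[0] - x[1]), x[0] * x[1]]

def stage1_jac(x):
    s = 1.0 - math.tanh(x[0] - x[1]) ** 2  # derivative of tanh
    return [[s, -s],
            [x[1], x[0]]]

# Hypothetical stage 2: R^2 -> R^1 (a scalar "loss")
def stage2(y):
    return [y[0] ** 2 + 3.0 * y[1]]

def stage2_jac(y):
    return [[2.0 * y[0], 3.0]]

stages = [(stage1, stage1_jac), (stage2, stage2_jac)]

def vjp(v, J):
    # Row vector times matrix: (v^T J)_j = sum_i v_i * J[i][j]
    return [sum(v[i] * J[i][j] for i in range(len(v)))
            for j in range(len(J[0]))]

def backprop(stages, x):
    # Forward pass: remember the input to each stage, since each
    # Jacobian must be evaluated at that intermediate point.
    inputs = []
    for fn, _ in stages:
        inputs.append(x)
        x = fn(x)
    # Backward pass: start from d(output)/d(output) = 1 and multiply
    # by the stage Jacobians from output to input.
    v = [1.0]
    for (_, jac), inp in zip(reversed(stages), reversed(inputs)):
        v = vjp(v, jac(inp))
    return x[0], v

value, grad = backprop(stages, [0.4, -0.2])
print("loss:", value)
print("gradient:", grad)
```

Because the output is a scalar, the running product `v` is always a single row vector, so no full Jacobian product is ever formed. This is the reason reverse-mode accumulation suits functions with many inputs and one output.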

Quiz

Question 1

For $\mathbf{f}: \mathbb{R}^4 \to \mathbb{R}^3$ and $\mathbf{g}: \mathbb{R}^3 \to \mathbb{R}^2$, the Jacobian $J(\mathbf{g}\circ\mathbf{f})$ is:

Question 2

The chain rule for Jacobians $J(\mathbf{g}\circ\mathbf{f}) = J\mathbf{g}\cdot J\mathbf{f}$ corresponds to matrix multiplication.

Common Mistakes

  • Reversing the order: $J\mathbf{f} \cdot J\mathbf{g}$ instead of $J\mathbf{g} \cdot J\mathbf{f}$. Matrix multiplication is not commutative.
  • Evaluating $J\mathbf{g}$ at $\mathbf{a}$ instead of at $\mathbf{f}(\mathbf{a})$.
  • Forgetting to check matrix size compatibility before claiming the product is well-defined.
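
The size check in the last item is mechanical enough to automate. The helper below is a hypothetical illustration (the function name and the example dimensions are assumptions): given the shapes of the two Jacobians, it returns the shape of the product or raises an error when the inner dimensions disagree.

```python
def chain_rule_shape(shape_Jg, shape_Jf):
    """Shape of Jg @ Jf, or ValueError if the product is undefined."""
    m, k_g = shape_Jg
    k_f, n = shape_Jf
    if k_g != k_f:
        raise ValueError(
            f"inner dimensions differ: ({m}x{k_g}) times ({k_f}x{n})"
        )
    return (m, n)

# Example: f maps R^5 -> R^2 and g maps R^2 -> R^3
shape_Jf = (2, 5)  # k x n
shape_Jg = (3, 2)  # m x k

# Correct order: (3x2)(2x5) gives a 3x5 Jacobian
ok = chain_rule_shape(shape_Jg, shape_Jf)

# Reversed order: (2x5)(3x2) is not defined because 5 != 3
try:
    chain_rule_shape(shape_Jf, shape_Jg)
    reversed_ok = True
except ValueError:
    reversed_ok = False

print(ok, reversed_ok)
```

A shape check does not catch every ordering mistake: when $n = m$ the reversed product is well-defined but is a $k\times k$ matrix that generally differs from the correct $m\times n$ Jacobian.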