Motivation for the Chain Rule

Many practical functions are naturally expressed as compositions: $\mathbf{h} = \mathbf{g} \circ \mathbf{f}$ means $\mathbf{h}(\mathbf{x}) = \mathbf{g}(\mathbf{f}(\mathbf{x}))$. To optimize or analyze $\mathbf{h}$, we need to differentiate it. The chain rule expresses the derivative of a composition in terms of the derivatives of its parts.

Example: the loss function of a neural network is a composition of many layer functions. Differentiating the loss to train the network requires applying the chain rule repeatedly, which is exactly what backpropagation does.

The chain rule is also essential for change of variables: when solving differential equations in a different coordinate system, or converting integrals, the Jacobian of the coordinate transformation arises via the chain rule.

Formal View

Remark 14.1 — Why We Need the Chain Rule

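Given $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^k$ and $\mathbf{g}: \mathbb{R}^k \to \mathbb{R}^m$, the composition $\mathbf{h} = \mathbf{g} \circ \mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$ is defined by $\mathbf{h}(\mathbf{x}) = \mathbf{g}(\mathbf{f}(\mathbf{x}))$. If $\mathbf{f}$ is differentiable at $\mathbf{a}$ and $\mathbf{g}$ is differentiable at $\mathbf{f}(\mathbf{a})$, the chain rule states

$J\mathbf{h}(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$,

the matrix product of the $m \times k$ Jacobian of $\mathbf{g}$ and the $k \times n$ Jacobian of $\mathbf{f}$, which yields the $m \times n$ Jacobian of $\mathbf{h}$.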
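The identity can be checked numerically. The sketch below is only an illustration: the maps `f` and `g`, the point `a`, and the helpers `jacobian` and `matmul` are hypothetical choices made for this example, and the Jacobians are approximated by central differences rather than computed exactly.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^3
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g(y):
    # Hypothetical example map g: R^3 -> R^2
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def h(x):
    # Composition h = g o f: R^2 -> R^2
    return g(f(x))

def jacobian(func, a, eps=1e-6):
    """Central-difference approximation of the Jacobian of func at a."""
    m, n = len(func(a)), len(a)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

a = [0.5, -1.2]
Jh = jacobian(h, a)        # 2 x 2
Jg = jacobian(g, f(a))     # 2 x 3, evaluated at f(a), not at a
Jf = jacobian(f, a)        # 3 x 2
product = matmul(Jg, Jf)   # (2 x 3)(3 x 2) = 2 x 2

max_err = max(abs(Jh[i][j] - product[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_err)
```

The printed difference should be small, limited only by the finite-difference approximation. Note that the shapes force the order of the product: with $k = 3$ here, $J\mathbf{f} \cdot J\mathbf{g}$ would be a $3 \times 3$ matrix and could not equal the $2 \times 2$ Jacobian of $\mathbf{h}$.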

Why This Matters

The chain rule is a central theorem of multivariable differentiation. It underlies backpropagation, implicit differentiation, and change of variables:

Backpropagation: chain rule applied layer by layer through a neural network
Implicit differentiation: differentiating equations like $F(x, y) = 0$ to find $dy/dx$
Change of coordinates in integrals: the substitution formula uses the chain rule via the Jacobian determinant
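To make the backpropagation item concrete, here is a minimal sketch of the chain rule applied layer by layer. The two-weight scalar network, the `tanh` activation, the squared-error loss, and the names `forward` and `backward` are all assumptions made for this illustration; real networks use vectors and matrices, but the pattern of multiplying local derivatives from the output backwards is the same.

```python
import math

def forward(w1, w2, x, y):
    """Tiny hypothetical network: loss = (w2 * tanh(w1 * x) - y)^2."""
    z1 = w1 * x
    a1 = math.tanh(z1)
    z2 = w2 * a1
    loss = (z2 - y) ** 2
    return loss, (a1, z2)

def backward(w1, w2, x, y):
    """Gradient of the loss w.r.t. (w1, w2), one chain-rule factor per layer."""
    _, (a1, z2) = forward(w1, w2, x, y)
    dL_dz2 = 2.0 * (z2 - y)             # derivative of the squared error
    dL_dw2 = dL_dz2 * a1                # z2 = w2 * a1
    dL_da1 = dL_dz2 * w2                # pass the gradient back through layer 2
    dL_dz1 = dL_da1 * (1.0 - a1 ** 2)   # tanh'(z1) = 1 - tanh(z1)^2
    dL_dw1 = dL_dz1 * x                 # z1 = w1 * x
    return dL_dw1, dL_dw2

w1, w2, x, y = 0.7, -1.3, 0.5, 0.2
grad_w1, grad_w2 = backward(w1, w2, x, y)

# Compare against central finite differences of the loss.
eps = 1e-6
fd_w1 = (forward(w1 + eps, w2, x, y)[0] - forward(w1 - eps, w2, x, y)[0]) / (2 * eps)
fd_w2 = (forward(w1, w2 + eps, x, y)[0] - forward(w1, w2 - eps, x, y)[0]) / (2 * eps)
print(abs(grad_w1 - fd_w1), abs(grad_w2 - fd_w2))
```

Each line of `backward` multiplies the gradient received from the layer above by one local derivative, which is the scalar form of multiplying Jacobians in the chain rule.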

Learning Resources

Chain Rule Introduction

3Blue1Brown

Visual and intuitive introduction to the chain rule for composition of functions.

17 min

Multivariable Chain Rule

Khan Academy

Overview of the chain rule in multiple variables.

9 min

Quiz

Question 1

The chain rule for Jacobians states $J(\mathbf{g}\circ\mathbf{f})(\mathbf{a})$ equals:

Question 2

True or false: backpropagation in neural networks is an application of the chain rule.

Common Mistakes

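Evaluating $J\mathbf{g}$ at $\mathbf{a}$ instead of at $\mathbf{f}(\mathbf{a})$.
Reversing the order of matrix multiplication in the chain rule: $J\mathbf{f}(\mathbf{a}) \cdot J\mathbf{g}(\mathbf{f}(\mathbf{a}))$ is in general a different matrix, and often the shapes do not even match.
Applying the chain rule to sums instead of compositions: the Jacobian of $\mathbf{f} + \mathbf{g}$ is simply $J\mathbf{f} + J\mathbf{g}$.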
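
A scalar case makes the first mistake concrete. The functions $f(x) = x^2$ and $g(y) = \sin y$ are chosen here purely for illustration, so that $h(x) = \sin(x^2)$:

```latex
h'(a) = g'(f(a)) \cdot f'(a) = \cos(a^2) \cdot 2a
\qquad \text{(correct)}

g'(a) \cdot f'(a) = \cos(a) \cdot 2a
\qquad \text{(incorrect: } g' \text{ evaluated at } a \text{ instead of } f(a)\text{)}
```

At $a = \sqrt{\pi}$ the correct value is $\cos(\pi) \cdot 2\sqrt{\pi} = -2\sqrt{\pi}$, while the incorrect expression gives $\cos(\sqrt{\pi}) \cdot 2\sqrt{\pi}$, which is a different number because $\cos(\sqrt{\pi}) \neq -1$.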