The Multivariate Chain Rule
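The multivariate chain rule in its full generality states: if $g: \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x$ and $f: \mathbb{R}^m \to \mathbb{R}^p$ is differentiable at $g(x)$, then $f \circ g$ is differentiable at $x$ with Jacobian:
$$J_{f \circ g}(x) = J_f(g(x))\, J_g(x)$$
This is matrix multiplication. The Jacobian of $f$ at the intermediate point $g(x)$ is a $p \times m$ matrix. The Jacobian of $g$ at $x$ is an $m \times n$ matrix. Their product is the $p \times n$ Jacobian of $f \circ g$ at $x$.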
The beauty of this result is its simplicity: differentiating a composition is just multiplying the Jacobians. All the complexity of the intermediate computation is captured in these two matrices.
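To see the formula in action, here is a minimal plain-Python sketch. Everything in it is hypothetical and chosen only for illustration: the maps `g` (from $\mathbb{R}^2$ to $\mathbb{R}^3$) and `f` (from $\mathbb{R}^3$ to $\mathbb{R}^2$), their hand-derived Jacobians `jac_g` and `jac_f`, and the helpers `matmul` and `jac_fd`. The product $J_f(g(x))\,J_g(x)$ should agree with a central finite-difference Jacobian of $f \circ g$, up to discretization error.

```python
import math

# Hypothetical example maps (illustration only): g: R^2 -> R^3, f: R^3 -> R^2.
def g(x):
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def f(u):
    u1, u2, u3 = u
    return [u1 + u2 * u3, math.exp(u1) * u3]

def jac_g(x):
    # 3x2 matrix of partial derivatives d g_i / d x_j
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2 * x2]]

def jac_f(u):
    # 2x3 matrix of partial derivatives d f_i / d u_j
    u1, u2, u3 = u
    return [[1.0, u3, u2],
            [math.exp(u1) * u3, 0.0, math.exp(u1)]]

def matmul(A, B):
    # Plain-Python matrix product; assumes len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def jac_fd(h, x, eps=1e-6):
    # Central finite-difference Jacobian of h at x, used only as a cross-check
    hx = h(x)
    J = [[0.0] * len(x) for _ in hx]
    for j in range(len(x)):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        hp, hm = h(xp), h(xm)
        for i in range(len(hx)):
            J[i][j] = (hp[i] - hm[i]) / (2 * eps)
    return J

x = [0.7, -1.3]
J_chain = matmul(jac_f(g(x)), jac_g(x))   # J_f evaluated at g(x), times J_g at x
J_num = jac_fd(lambda z: f(g(z)), x)      # numerical Jacobian of f(g(x))
max_err = max(abs(a - b) for ra, rb in zip(J_chain, J_num) for a, b in zip(ra, rb))
print(J_chain)
print("max abs difference:", max_err)
```

Note that `jac_f` is evaluated at `g(x)`, not at `x`. Passing `x` would fail here, because `jac_f` unpacks three coordinates.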
Formal View
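Stated without coordinates, the total derivative of a composition is the composition of the total derivatives, viewed as linear maps: $D(f \circ g)(x) = Df(g(x)) \circ Dg(x)$. This is sometimes called the "abstract chain rule" or "total derivative chain rule". Choosing bases turns the composition of linear maps into the Jacobian product. The univariate chain rule is the special case $n = m = p = 1$. There every Jacobian is a $1 \times 1$ matrix and the formula reduces to $(f \circ g)'(x) = f'(g(x))\, g'(x)$.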
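Here is a short worked example with maps chosen purely for illustration. Take $g(t) = (\cos t, \sin t)$ from $\mathbb{R}$ to $\mathbb{R}^2$, and $f(u, v) = u^2 v$ from $\mathbb{R}^2$ to $\mathbb{R}$. The $1 \times 2$ Jacobian of $f$ at $g(t)$ times the $2 \times 1$ Jacobian of $g$ gives a $1 \times 1$ result. This matches differentiating $f(g(t)) = \cos^2 t \,\sin t$ directly.

```latex
\begin{aligned}
J_g(t) &= \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}, \qquad
J_f(u, v) = \begin{pmatrix} 2uv & u^2 \end{pmatrix}, \\
J_{f \circ g}(t) &= J_f(\cos t, \sin t)\, J_g(t)
  = \begin{pmatrix} 2\cos t \sin t & \cos^2 t \end{pmatrix}
    \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}
  = -2\cos t \sin^2 t + \cos^3 t, \\
\frac{d}{dt}\bigl(\cos^2 t \,\sin t\bigr) &= (-2\cos t \sin t)\,\sin t + \cos^2 t \cos t
  = -2\cos t \sin^2 t + \cos^3 t.
\end{aligned}
```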
Why This Matters
The multivariate chain rule is one of the foundational theorems of multivariate calculus. It is also the mathematical basis of gradient-based training in deep learning.
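- Neural network backpropagation: the gradient of the loss is computed by multiplying the layers' Jacobians from the output back to the input
- Automatic differentiation systems apply exactly this rule. They usually work with Jacobian-vector products (forward mode) or vector-Jacobian products (reverse mode), so full Jacobian matrices never need to be formed
- PDE discretizations: the chain rule converts derivatives between coordinate systems, for example when mapping mesh elements onto a reference element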
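Here is a minimal sketch of the backpropagation idea on a hypothetical three-stage toy model. The weights `W` and all functions below are illustrative and are not taken from any framework. `grad_full_jacobians` forms every Jacobian explicitly and multiplies them. `grad_reverse_mode` starts from the scalar output and applies one vector-Jacobian product per stage, working from output to input. Both should return the same gradient.

```python
import math

# Hypothetical toy model (illustrative values only):
#   a = W x          (R^2 -> R^3)
#   b = tanh(a)      (R^3 -> R^3, elementwise)
#   L = sum(b_i^2)   (R^3 -> R, the scalar "loss")
W = [[0.5, -1.0],
     [2.0, 0.3],
     [-0.7, 1.2]]

def forward(x):
    a = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]
    b = [math.tanh(ai) for ai in a]
    L = sum(bi * bi for bi in b)
    return a, b, L

def matmul(A, B):
    # Plain-Python matrix product; assumes len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def grad_full_jacobians(x):
    # Build every Jacobian explicitly, then multiply J_L * J_tanh * J_W.
    _, b, _ = forward(x)
    J_W = W                                        # 3x2
    J_tanh = [[(1 - b[i] ** 2) if i == j else 0.0  # 3x3 diagonal
               for j in range(3)] for i in range(3)]
    J_L = [[2 * bi for bi in b]]                   # 1x3
    return matmul(matmul(J_L, J_tanh), J_W)[0]     # 1x2 -> gradient as a list

def grad_reverse_mode(x):
    # Push a row vector from the output back to the input,
    # one vector-Jacobian product per stage; J_tanh is never formed.
    _, b, _ = forward(x)
    v = [2 * bi for bi in b]                          # dL/db
    v = [v[i] * (1 - b[i] ** 2) for i in range(3)]    # dL/da = dL/db * diag(1 - b^2)
    return [sum(v[i] * W[i][j] for i in range(3)) for j in range(2)]  # dL/dx = dL/da * W

x = [0.4, -0.9]
g_full = grad_full_jacobians(x)
g_rev = grad_reverse_mode(x)
print(g_full)
print(g_rev)
```

The reverse-mode version never builds the $3 \times 3$ diagonal Jacobian of `tanh`. It only scales the incoming vector elementwise. Avoiding full Jacobians in this way is the main reason reverse-mode systems work with vector-Jacobian products, though real frameworks are far more general than this sketch.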
Quiz
For and , the Jacobian is:
The chain rule for Jacobians corresponds to matrix multiplication.
Common Mistakes
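- Reversing the order: writing $J_g(x)\, J_f(g(x))$ instead of $J_f(g(x))\, J_g(x)$. Matrix multiplication is not commutative, and unless $n = p$ the reversed product is not even defined.
- Evaluating $J_f$ at $x$ instead of at $g(x)$.
- Forgetting to check matrix sizes before claiming the product is well-defined. $J_f(g(x))$ must be $p \times m$ and $J_g(x)$ must be $m \times n$, giving a $p \times n$ result.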
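The first and third mistakes can be caught mechanically. The helper `chain_jacobian` below is hypothetical and uses plain Python lists for illustration. It checks that a $p \times m$ outer Jacobian and an $m \times n$ inner Jacobian are compatible before multiplying them, so the reversed order is rejected here. The last lines show that even square Jacobians generally do not commute.

```python
def shape(M):
    # (rows, columns) of a non-empty list-of-lists matrix
    return (len(M), len(M[0]))

def chain_jacobian(J_outer, J_inner):
    """Return J_outer @ J_inner, i.e. J_f(g(x)) J_g(x), after checking sizes.

    J_outer is expected to be p x m and J_inner m x n; the result is p x n.
    """
    p, m1 = shape(J_outer)
    m2, n = shape(J_inner)
    if m1 != m2:
        raise ValueError(f"incompatible shapes {p}x{m1} and {m2}x{n}")
    return [[sum(J_outer[i][k] * J_inner[k][j] for k in range(m1))
             for j in range(n)] for i in range(p)]

# Hypothetical Jacobians: f: R^3 -> R^2 (J_f is 2x3), g: R^4 -> R^3 (J_g is 3x4)
J_f = [[1.0, 0.0, 2.0],
       [0.0, 3.0, -1.0]]
J_g = [[1.0, 2.0, 0.0, 1.0],
       [0.0, 1.0, 1.0, 0.0],
       [2.0, 0.0, 1.0, 1.0]]

J = chain_jacobian(J_f, J_g)
print(shape(J))   # (2, 4)

try:
    chain_jacobian(J_g, J_f)   # reversed order: 3x4 times 2x3 is undefined
except ValueError as e:
    print("reversed order rejected:", e)

# Even when both orders are defined (square case), the products generally differ:
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
print(chain_jacobian(A, B), chain_jacobian(B, A))
```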