Linear Algebra

Multivariate Setup

The full multivariate chain rule handles the general case: $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^K$ and $\mathbf{g}: \mathbb{R}^K \to \mathbb{R}^m$, with composition $\mathbf{h} = \mathbf{g} \circ \mathbf{f}$. The intermediate variables are $u_1, \ldots, u_K$, where $u_k = f_k(\mathbf{x})$, and the underlying variables are $x_1, \ldots, x_n$.

Each output $h_i = g_i(f_1(\mathbf{x}), \ldots, f_K(\mathbf{x}))$ depends on all intermediate variables, which in turn depend on all underlying variables. The total effect of $x_j$ on $h_i$ is the sum of the contributions through each intermediate variable $u_k$, $k = 1, \ldots, K$.

This "sum of paths" interpretation is the essence of the Leibniz form: $\frac{\partial h_i}{\partial x_j} = \sum_{k=1}^K \frac{\partial g_i}{\partial u_k}\frac{\partial f_k}{\partial x_j}$. Each term in the sum is the contribution through one intermediate variable.
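The sum over intermediate variables can be checked numerically. The sketch below is only an illustration: the functions `f` and `g`, their hand-written Jacobians, and the evaluation point are assumptions chosen for the example (two inputs, three intermediate variables, two outputs), not part of the lesson. It computes one entry of the Jacobian of the composition by summing the per-path products and compares it with a central finite difference.

```python
# Hypothetical example functions: f maps R^2 -> R^3, g maps R^3 -> R^2.
def f(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]

def jac_f(x):
    # Row k holds the partials of f_k with respect to x_1, x_2.
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0], [2.0 * x1, 0.0]]

def g(u):
    u1, u2, u3 = u
    return [u1 * u2 + u3, u1 * u3]

def jac_g(u):
    # Row i holds the partials of g_i with respect to u_1, u_2, u_3.
    u1, u2, u3 = u
    return [[u2, u1, 1.0], [u3, 0.0, u1]]

def h(x):
    return g(f(x))

def chain_rule_entry(i, j, x):
    """Sum over paths x_j -> u_k -> h_i (0-based indices)."""
    u = f(x)          # partials of g are evaluated at u = f(x)
    dg = jac_g(u)
    df = jac_f(x)
    return sum(dg[i][k] * df[k][j] for k in range(len(u)))

def finite_difference_entry(i, j, x, eps=1e-6):
    """Central-difference estimate of the same partial derivative."""
    xp, xm = list(x), list(x)
    xp[j] += eps
    xm[j] -= eps
    return (h(xp)[i] - h(xm)[i]) / (2 * eps)

x0 = [1.5, -0.5]
for i in range(2):
    for j in range(2):
        print(i, j, chain_rule_entry(i, j, x0), finite_difference_entry(i, j, x0))
```

Each printed row should show two nearly identical numbers. The sum in `chain_rule_entry` has one term per intermediate variable, three in this example.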

Formal View

Remark 14.2 — Multivariate Chain Rule: Paths Interpretation
The $(i,j)$ entry of $J\mathbf{h} = J\mathbf{g} \cdot J\mathbf{f}$ is:
$\frac{\partial h_i}{\partial x_j} = \sum_{k=1}^K \frac{\partial g_i}{\partial u_k}\frac{\partial f_k}{\partial x_j}$
Each term is a "path" from $x_j$ to $h_i$ through intermediate variable $u_k$. This is exactly the entrywise formula for the matrix product.
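As a small worked instance of the matrix product, take the illustrative choice (an assumption for this example, not from the lesson) $\mathbf{f}(x_1, x_2) = (x_1 x_2,\; x_1 + x_2)$ and $g(u_1, u_2) = u_1 u_2$, so there are two inputs, two intermediate variables, and one output:

```latex
\[
J\mathbf{g} = \begin{pmatrix} u_2 & u_1 \end{pmatrix},
\qquad
J\mathbf{f} = \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}
\]
\[
J\mathbf{h} = J\mathbf{g} \cdot J\mathbf{f}
= \begin{pmatrix} u_2 x_2 + u_1 & u_2 x_1 + u_1 \end{pmatrix}
\]
% Substitute u_1 = x_1 x_2 and u_2 = x_1 + x_2:
\[
J\mathbf{h}
= \begin{pmatrix} 2 x_1 x_2 + x_2^2 & x_1^2 + 2 x_1 x_2 \end{pmatrix}
\]
```

Each entry is a sum of two path terms, one through $u_1$ and one through $u_2$. Differentiating $h(x_1, x_2) = x_1^2 x_2 + x_1 x_2^2$ directly gives the same row.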

Interactive Visualization

Figure: Chain Rule Circuit Diagram

Why This Matters

The "sum over paths" interpretation of the chain rule generalizes naturally to computational graphs and backpropagation.

  • Computational graphs in deep learning: each path from input to output contributes to the gradient
  • Dynamic programming: the Bellman equation sums contributions through all next states
  • Chemical kinetics: rate equations sum contributions through all reaction pathways

Quiz

Question 1

In the Leibniz chain rule $\frac{\partial h_i}{\partial x_j} = \sum_k \frac{\partial g_i}{\partial u_k}\frac{\partial f_k}{\partial x_j}$, how many terms are in the sum for $\mathbf{f}: \mathbb{R}^3 \to \mathbb{R}^5$ and $\mathbf{g}: \mathbb{R}^5 \to \mathbb{R}^2$?

Common Mistakes

  • Summing over the wrong index: the sum runs over the intermediate-variable index $k$, not over the input index $j$ or the output index $i$.
  • Evaluating the partial derivatives of $\mathbf{g}$ at $\mathbf{x}$ instead of at $\mathbf{u} = \mathbf{f}(\mathbf{x})$.
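The evaluation-point mistake is easiest to see in a scalar case. The sketch below assumes the illustrative functions $f(x) = x^2$ and $g(u) = \sin u$ (chosen for this example, not taken from the lesson) and compares the correct derivative, the mistaken one, and a central finite-difference estimate.

```python
import math

# Hypothetical scalar example: h(x) = sin(x^2).
def f(x):
    return x ** 2

def g(u):
    return math.sin(u)

def h(x):
    return g(f(x))

def dh_correct(x):
    # g'(u) = cos(u), evaluated at u = f(x), times f'(x) = 2x.
    return math.cos(f(x)) * 2 * x

def dh_wrong(x):
    # Mistake: g' evaluated at x instead of at u = f(x).
    return math.cos(x) * 2 * x

def dh_numeric(x, eps=1e-6):
    # Central-difference estimate used as a reference value.
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x0 = 1.2
print("correct:", dh_correct(x0))
print("wrong:  ", dh_wrong(x0))
print("numeric:", dh_numeric(x0))
```

Only the version that evaluates the derivative of `g` at `f(x0)` should agree with the numerical estimate. In the multivariate case the mistake may not even type-check, since $\mathbf{x}$ and $\mathbf{u}$ generally live in spaces of different dimension.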