Linear Algebra

Multiple Outputs

The general case $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ with vector-valued $\mathbf{g}$ (multiple outputs) follows the same Jacobian multiplication formula. Each output component $h_i = g_i(\mathbf{f}(\mathbf{x}))$ satisfies the chain rule, and stacking all outputs gives the Jacobian form.

Concretely, for $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^k$ and $\mathbf{g}: \mathbb{R}^k \to \mathbb{R}^m$, write $\mathbf{u} = \mathbf{f}(\mathbf{x})$. The $(i,j)$ entry of $J\mathbf{h} \in \mathbb{R}^{m\times n}$ is $[J\mathbf{h}]_{ij} = \frac{\partial h_i}{\partial x_j} = \sum_{l=1}^{k} \frac{\partial g_i}{\partial u_l}\frac{\partial f_l}{\partial x_j} = \sum_{l=1}^{k} [J\mathbf{g}]_{il}[J\mathbf{f}]_{lj}$, where $l$ runs over the $k$ intermediate coordinates. This is exactly the $(i,j)$ entry of the matrix product $J\mathbf{g}\,J\mathbf{f}$: matrix multiplication, entry by entry.
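The entry-by-entry formula can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the functions `f` and `g`, the point `a`, and the helper names `matmul` and `numerical_jacobian` are all illustrative choices made for this example. It multiplies hand-derived Jacobians and compares the product with a finite-difference Jacobian of the composition.

```python
import math

def f(x):
    """Illustrative f: R^2 -> R^3 (n = 2, k = 3)."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g(u):
    """Illustrative g: R^3 -> R^2 (k = 3, m = 2)."""
    u1, u2, u3 = u
    return [u1 + u2 * u3, u1 * u3]

def h(x):
    """Composition h = g o f: R^2 -> R^2."""
    return g(f(x))

def jac_f(x):
    """Hand-derived Jacobian of f, shape k x n = 3 x 2."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2.0 * x2]]

def jac_g(u):
    """Hand-derived Jacobian of g, shape m x k = 2 x 3."""
    u1, u2, u3 = u
    return [[1.0, u3, u2],
            [u3, 0.0, u1]]

def matmul(A, B):
    """Entry (i, j) is the sum over l of A[i][l] * B[l][j]."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, x, eps=1e-6):
    """Approximate the Jacobian of func at x by central differences."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

a = [0.7, -1.3]

# Chain rule: Jh(a) = Jg(f(a)) * Jf(a), with shapes (2 x 3)(3 x 2) = 2 x 2.
J_chain = matmul(jac_g(f(a)), jac_f(a))

# Independent check that never uses the chain rule.
J_numeric = numerical_jacobian(h, a)

max_err = max(abs(J_chain[i][j] - J_numeric[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_err)
```

Note that `jac_g` is evaluated at `f(a)`, not at `a`; evaluating it at the wrong point is an easy slip when translating the formula into code.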

This is the most general form of the chain rule. It subsumes all special cases: scalar input ($n=1$, where $J\mathbf{h}$ is a tangent vector), scalar output ($m=1$, where $J\mathbf{h}$ is the gradient written as a row), and everything in between.
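As a sketch of how the special cases fall out, the display below substitutes $n=1$, $m=1$, and both into the product formula. It assumes the common convention that a gradient $\nabla g$ is a column vector, so it appears transposed when it plays the role of a $1\times k$ Jacobian; the parameter name $t$ for a scalar input is also a choice made here.

```latex
\begin{aligned}
n = 1:&\quad \mathbf{h}'(t) = J\mathbf{g}(\mathbf{f}(t))\,\mathbf{f}'(t),
  && (m\times 1) = (m\times k)(k\times 1) \\
m = 1:&\quad \nabla h(\mathbf{a})^{\top} = \nabla g(\mathbf{f}(\mathbf{a}))^{\top}\,J\mathbf{f}(\mathbf{a}),
  && (1\times n) = (1\times k)(k\times n) \\
m = n = 1:&\quad h'(t) = \nabla g(\mathbf{f}(t))\cdot\mathbf{f}'(t),
  && (1\times 1) = (1\times k)(k\times 1)
\end{aligned}
```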

Formal View

Corollary 14.1 — Chain Rule with Multiple Inputs and Outputs
For $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ with $\mathbf{f}$ differentiable at $\mathbf{a}$ and $\mathbf{g}$ differentiable at $\mathbf{f}(\mathbf{a})$: $J\mathbf{h}(\mathbf{a}) = J\mathbf{g}(\mathbf{f}(\mathbf{a})) \cdot J\mathbf{f}(\mathbf{a})$. Dimensions: $(m\times n) = (m\times k) \cdot (k\times n)$. All special cases are obtained by substituting $m=1$, $n=1$, or both.
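A small worked example may help make the corollary concrete. The functions and the point $\mathbf{a} = (1, 2)$ below are chosen purely for illustration, with $n = 2$, $k = 2$, $m = 3$.

```latex
\begin{aligned}
\mathbf{f}(x_1, x_2) &= (x_1 x_2,\; x_1 + x_2), &
\mathbf{g}(u_1, u_2) &= (u_1^2,\; u_1 u_2,\; u_2), \\[4pt]
J\mathbf{f}(\mathbf{x}) &= \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}, &
J\mathbf{g}(\mathbf{u}) &= \begin{pmatrix} 2u_1 & 0 \\ u_2 & u_1 \\ 0 & 1 \end{pmatrix}, \\[4pt]
\mathbf{a} &= (1, 2), &
\mathbf{f}(\mathbf{a}) &= (2, 3), \\[4pt]
J\mathbf{h}(\mathbf{a}) &= J\mathbf{g}(\mathbf{f}(\mathbf{a}))\,J\mathbf{f}(\mathbf{a})
 = \begin{pmatrix} 4 & 0 \\ 3 & 2 \\ 0 & 1 \end{pmatrix}
   \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
 &&= \begin{pmatrix} 8 & 4 \\ 8 & 5 \\ 1 & 1 \end{pmatrix}.
\end{aligned}
```

Differentiating $\mathbf{h}(x_1, x_2) = (x_1^2 x_2^2,\; x_1^2 x_2 + x_1 x_2^2,\; x_1 + x_2)$ directly and evaluating at $(1, 2)$ gives the same $3\times 2$ matrix.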

Why This Matters

The general chain rule applies to every differentiable layer of a neural network and, more broadly, to any composition of differentiable functions in applied mathematics.

  • Multi-output regression: chain rule for vector-valued loss functions
  • Sensor fusion: combining multiple measurements through a composed model
  • Robotics forward kinematics: composing transformations for each joint

Quiz

Question 1

For $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$ with $\mathbf{f}: \mathbb{R}^5 \to \mathbb{R}^3$ and $\mathbf{g}: \mathbb{R}^3 \to \mathbb{R}^4$, what is the size of $J\mathbf{h}$?

Common Mistakes

  • Getting confused by all the dimensions. Always remember: a Jacobian is (output dim) × (input dim), so the inner dimensions of the two factors must match.
  • Forgetting that the chain rule formula applies row-by-row for each output component.
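
One way to guard against dimension mix-ups is to track Jacobian shapes explicitly before multiplying anything. The sketch below is a hypothetical helper, not part of any library: the name `composed_jacobian_shape` and the example dimensions ($n=2$, $k=3$, $m=4$) are assumptions made for illustration.

```python
def composed_jacobian_shape(shape_Jg, shape_Jf):
    """Return the shape of Jg * Jf, or raise if the inner dimensions differ.

    Each shape is (output dim, input dim), so Jg is (m, k) and Jf is (k, n).
    """
    m, k_from_g = shape_Jg
    k_from_f, n = shape_Jf
    if k_from_g != k_from_f:
        raise ValueError(
            f"inner dimensions differ: Jg has {k_from_g} columns, "
            f"Jf has {k_from_f} rows"
        )
    return (m, n)

# Illustrative dimensions: f maps R^2 -> R^3, g maps R^3 -> R^4.
shape_Jf = (3, 2)  # (output dim of f, input dim of f)
shape_Jg = (4, 3)  # (output dim of g, input dim of g)

# Correct order: Jg on the left, Jf on the right.
shape_Jh = composed_jacobian_shape(shape_Jg, shape_Jf)
print("Jh shape:", shape_Jh)

# Swapping the factors is rejected because 2 columns do not match 4 rows.
try:
    composed_jacobian_shape(shape_Jf, shape_Jg)
    wrong_order_rejected = False
except ValueError:
    wrong_order_rejected = True
print("wrong order rejected:", wrong_order_rejected)
```

A shape check like this only catches a wrong multiplication order when the dimensions differ; with $m = k = n$ both products are defined, and the order still has to be read off from $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$.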