Linear Algebra

Non-Canonical Case: Input is a Scalar

A "non-canonical" case occurs when the outer function $\mathbf{g}: \mathbb{R} \to \mathbb{R}^m$ takes a scalar input, whereas the outer function in the general chain rule maps $\mathbb{R}^k \to \mathbb{R}^m$ with $k > 1$.

Here $f: \mathbb{R}^n \to \mathbb{R}$ is scalar-valued, and $\mathbf{g}: \mathbb{R} \to \mathbb{R}^m$ acts on its scalar output. So $\mathbf{h}(\mathbf{x}) = \mathbf{g}(f(\mathbf{x}))$: a vector-valued function of a scalar-valued function of $\mathbf{x}$.

The Jacobian: $J\mathbf{h}(\mathbf{a}) = \mathbf{g}'(f(\mathbf{a})) \cdot Df(\mathbf{a}) = \mathbf{g}'(f(\mathbf{a})) \cdot \nabla^T f(\mathbf{a})$. Here $\mathbf{g}'$ is an $m \times 1$ column vector (the derivative of a curve) and $\nabla^T f$ is a $1 \times n$ row vector (the Jacobian of a scalar function). Their product is $m \times n$, which is the right size for the Jacobian of $\mathbf{h}: \mathbb{R}^n \to \mathbb{R}^m$.
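The column-times-row structure can be checked numerically. The sketch below is only an illustration: the particular `f` and `g`, the point `a`, and the finite-difference helper are assumptions chosen for the example, not part of the lesson.

```python
import numpy as np

def f(x):
    # Scalar-valued inner function f: R^2 -> R (illustrative choice).
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    # Gradient of f, stored as a length-2 array.
    return np.array([2.0 * x[0], 3.0])

def g(s):
    # Curve g: R -> R^3 (illustrative choice).
    return np.array([np.cos(s), np.sin(s), s ** 2])

def g_prime(s):
    # Tangent vector dg/ds of the curve.
    return np.array([-np.sin(s), np.cos(s), 2.0 * s])

def jacobian_h(a):
    # (m x 1) column times (1 x n) row, i.e. an outer product of shape (m, n).
    return np.outer(g_prime(f(a)), grad_f(a))

def numerical_jacobian(func, a, eps=1e-6):
    # Central finite differences, one input coordinate per column.
    cols = []
    for j in range(a.size):
        step = np.zeros_like(a)
        step[j] = eps
        cols.append((func(a + step) - func(a - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

a = np.array([0.5, -1.0])
J_exact = jacobian_h(a)
J_numeric = numerical_jacobian(lambda x: g(f(x)), a)

print(J_exact.shape)                                # (3, 2)
print(np.allclose(J_exact, J_numeric, atol=1e-6))   # True
```

Because the Jacobian is an outer product of two vectors, it has rank at most 1: every column is a multiple of the tangent vector $\mathbf{g}'(f(\mathbf{a}))$.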

Formal View

Theorem 14.9 — Chain Rule: Scalar Intermediate Variable
For $f: \mathbb{R}^n \to \mathbb{R}$ differentiable at $\mathbf{a}$ and $\mathbf{g}: \mathbb{R} \to \mathbb{R}^m$ differentiable at $f(\mathbf{a})$, let $\mathbf{h}(\mathbf{x}) = \mathbf{g}(f(\mathbf{x}))$. Then
$$J\mathbf{h}(\mathbf{a}) = \mathbf{g}'(f(\mathbf{a})) \cdot \nabla^T f(\mathbf{a}) \in \mathbb{R}^{m \times n}$$
where $\mathbf{g}'(s) = d\mathbf{g}/ds$ is the tangent vector to the curve $\mathbf{g}$.
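One way to see why the formula holds is to read it entry by entry. The following is a sketch using the single-variable chain rule on each component; the component names $h_i$ and $g_i$ and the indices $i$, $j$ are notation introduced here for illustration.

```latex
% Component i of h is a scalar function of x:
h_i(\mathbf{x}) = g_i\bigl(f(\mathbf{x})\bigr), \qquad i = 1, \dots, m.

% The single-variable chain rule gives each partial derivative:
\frac{\partial h_i}{\partial x_j}(\mathbf{a})
  = g_i'\bigl(f(\mathbf{a})\bigr)\,\frac{\partial f}{\partial x_j}(\mathbf{a}),
  \qquad j = 1, \dots, n.

% This is exactly the (i, j) entry of the column-times-row product:
\bigl[J\mathbf{h}(\mathbf{a})\bigr]_{ij}
  = \bigl[\mathbf{g}'(f(\mathbf{a}))\bigr]_i \,\bigl[\nabla^T f(\mathbf{a})\bigr]_j .
```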

Why This Matters

This case arises when composing a scalar-valued function with a vector-valued activation or embedding.

  • Softmax-type layer: a vector-valued $\mathbf{g}(z)$, such as a two-class softmax over the logits $(z, 0)$, applied to a linear score $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$; the gradient flows back via this chain rule
  • Parametric curves defined by $\boldsymbol{\gamma}(f(\mathbf{x}))$, where $f$ maps inputs to a curve parameter
  • Signal processing: applying a nonlinear transform to a linear combination of inputs
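To make the "gradient flows back" remark concrete, here is a minimal sketch of the backward pass for a linear score followed by a vector-valued activation. Everything specific is an assumption for illustration: the two-class softmax over the logits `(z, 0)` standing in for $\mathbf{g}$, the values of `w` and `x`, and the hypothetical upstream gradient `upstream`.

```python
import numpy as np

def two_class_softmax(z):
    # g: R -> R^2, softmax over the logits (z, 0) (assumed activation).
    e = np.exp(np.array([z, 0.0]))
    return e / e.sum()

def two_class_softmax_prime(z):
    # dg/dz = (p(1-p), -p(1-p)), where p is the first softmax output.
    p = two_class_softmax(z)[0]
    return np.array([p * (1.0 - p), -p * (1.0 - p)])

w = np.array([0.2, -0.4, 0.1])    # weights of the linear score (illustrative)
x = np.array([1.0, 2.0, 3.0])     # input point (illustrative)
z = w @ x                         # f(x) = w^T x, so grad f = w

upstream = np.array([1.0, -2.0])  # hypothetical dL/dh from later layers

# Route 1: build the full m x n Jacobian, then multiply.
J = np.outer(two_class_softmax_prime(z), w)
grad_full = upstream @ J

# Route 2: contract with g' first (a scalar), then scale grad f.
scalar = upstream @ two_class_softmax_prime(z)
grad_factored = scalar * w

print(np.allclose(grad_full, grad_factored))  # True
```

The second route never forms the $m \times n$ matrix: because the intermediate variable is a scalar, the backward pass reduces to one dot product followed by a scaling of $\nabla f$.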

Quiz

Question 1

For $\mathbf{h}(x,y) = \mathbf{g}(f(x,y))$ where $f: \mathbb{R}^2 \to \mathbb{R}$ and $\mathbf{g}: \mathbb{R} \to \mathbb{R}^3$, the Jacobian $J\mathbf{h}$ is a matrix of size:

Common Mistakes

  • Confusing the order: the outer function is $\mathbf{g}$ (vector-valued), not $f$ (scalar).
  • Writing the product as $\nabla^T f \cdot \mathbf{g}'$ (wrong order): it must be $\mathbf{g}'$ (column) times $\nabla^T f$ (row).
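The ordering mistake shows up immediately as a shape error when the vectors are stored as explicit 2-D arrays. A small sketch, with made-up numbers and $m = 3$, $n = 2$ assumed for illustration:

```python
import numpy as np

g_prime = np.array([[1.0], [2.0], [3.0]])  # m x 1 column, m = 3
grad_f_row = np.array([[4.0, 5.0]])        # 1 x n row, n = 2

# Correct order: (m x 1) @ (1 x n) gives the m x n Jacobian.
J = g_prime @ grad_f_row
print(J.shape)  # (3, 2)

# Wrong order: (1 x n) @ (m x 1) has mismatched inner dimensions.
try:
    grad_f_row @ g_prime
    wrong_order_ok = True
except ValueError:
    wrong_order_ok = False

print(wrong_order_ok)  # False
```

Note that when $m = n$ the reversed product does not fail: it silently returns a $1 \times 1$ matrix instead of the $m \times n$ Jacobian, so a shape check on the result is a safer guard than relying on an exception.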