Cutting a Wire

To make the backward pass precise, we need a formal way to say "how does the output depend on this specific internal wire?" The tool is the wire-cut circuit.

Pick any wire and call it $x$. The $x$-wire-cut circuit disconnects that wire from the gate that computed it and treats its value as a free external input to the rest of the circuit. The output now depends on $x$ (the value you feed in) as well as the original inputs $\mathbf{t}$. We write its functional behavior as $f(x, \mathbf{t})$.

Now $\frac{\partial f(x, \mathbf{t})}{\partial x}(x_0, \mathbf{t}_0)$ is a well-defined partial derivative: how does the circuit output change when we wiggle just the $x$ wire around its forward-pass value $x_0$, holding the inputs fixed at $\mathbf{t}_0$?
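To make the cut concrete, here is a minimal Python sketch. The circuit ($x = t_1 t_2$, $y = x + t_1$, $f = y^2$) and all function names are illustrative assumptions, not taken from the lesson. The cut circuit takes $x$ as an extra argument, and a central finite difference estimates the partial at the forward-pass values.

```python
def forward(t1, t2):
    """Original circuit: x = t1 * t2, y = x + t1, f = y * y."""
    x = t1 * t2
    y = x + t1
    return y * y


def forward_cut_x(x, t1, t2):
    """x-wire-cut circuit: x is a free input, the gate that produced it is bypassed.

    t2 is kept as an argument even though the cut circuit no longer uses it,
    so the signature matches f(x, t).
    """
    y = x + t1
    return y * y


def partial_wrt_x(x0, t1, t2, h=1e-6):
    """Central finite-difference estimate of the partial of the cut circuit w.r.t. x."""
    return (forward_cut_x(x0 + h, t1, t2) - forward_cut_x(x0 - h, t1, t2)) / (2 * h)


# Forward-pass values at the (hypothetical) input point t0 = (2, 3).
t1_0, t2_0 = 2.0, 3.0
x0 = t1_0 * t2_0

# Feeding the forward-pass value of x into the cut circuit reproduces the original output.
same_output = forward_cut_x(x0, t1_0, t2_0) == forward(t1_0, t2_0)

# Analytically the partial is 2 * (x0 + t1_0); the estimate should be close to that.
sensitivity = partial_wrt_x(x0, t1_0, t2_0)
```

Note that `t1` still feeds the cut circuit directly: wiggling `x` with `t1` held fixed is a different experiment from wiggling `t1` in the original circuit.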

To seed the backward pass, note that for the output wire $f$ itself the wire-cut circuit is just the identity: $f(f, \mathbf{t}) = f$, so $\frac{\partial f(f, \mathbf{t})}{\partial f} = 1$. This is the number we inject at the start.
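The sketch below shows where the seed goes in a hand-written backward pass. It assumes a small hypothetical circuit ($x = t_1 t_2$, $y = x + t_1$, $f = y^2$) and uses the standard chain rule to move from the output toward the inputs. The variable names are illustrative, not an API from any library.

```python
def forward_and_backward(t1, t2):
    # Forward pass: record the value on every wire.
    x = t1 * t2
    y = x + t1
    f = y * y

    # Backward pass: grad_w is the partial of the w-wire-cut circuit w.r.t. w,
    # evaluated at the forward-pass values.
    grad_f = 1.0                           # seed: the f-wire-cut circuit is the identity
    grad_y = grad_f * 2.0 * y              # f = y * y
    grad_x = grad_y * 1.0                  # y = x + t1, contribution through x
    grad_t1 = grad_y * 1.0 + grad_x * t2   # t1 feeds both y (directly) and x
    grad_t2 = grad_x * t1                  # t2 feeds only x
    grads = {"f": grad_f, "y": grad_y, "x": grad_x, "t1": grad_t1, "t2": grad_t2}
    return f, grads


f0, grads = forward_and_backward(2.0, 3.0)
```

Every other gradient is a multiple of `grad_f`, so seeding with 1 is what makes the results come out as derivatives of the output itself rather than scaled versions of them.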

Formal View

Definition 15.4 — Wire-Cut Circuit

For a wire $x$ in the circuit, the $x$-wire-cut circuit replaces that internal wire with a free input. Its functional behavior is $f(x, \mathbf{t})$. The partial $\frac{\partial f(x, \mathbf{t})}{\partial x}(x_0, \mathbf{t}_0)$ measures the circuit output's sensitivity to the $x$-wire at the forward-pass values.
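
As a small worked instance of the definition (the circuit here is an illustrative assumption, not one from the text), take inputs $\mathbf{t} = (t_1, t_2)$, an internal wire $x = t_1 + t_2$, and output $f = x \cdot t_1$. Cutting the $x$ wire gives

$$f(x, \mathbf{t}) = x \, t_1, \qquad \frac{\partial f(x, \mathbf{t})}{\partial x} = t_1.$$

At $\mathbf{t}_0 = (3, 4)$ the forward pass gives $x_0 = 7$, so the partial evaluates to $3$. Only the wire $x$ is freed, so $t_1$ still appears in the cut circuit. The uncut behavior $f(\mathbf{t}) = (t_1 + t_2)\, t_1$ is a different function, with a different argument list.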

Remark 15.3 — Seeding the Backward Pass

For the output wire $f$, $f(f, \mathbf{t}) = f$ (identity), so $\frac{\partial f(f, \mathbf{t})}{\partial f}(f_0, \mathbf{t}_0) = 1$. This seed value of 1 is placed on the output wire to start the backward pass.

Why This Matters

Wire cutting gives the backward pass a rigorous foundation: every gradient that gets propagated is the partial derivative of a precisely defined function.

Foundation for formal proofs of backpropagation correctness
Basis for higher-order autodiff (computing Hessians via backprop through backprop)
Forward-mode vs. reverse-mode autodiff differ in which wires they "cut" and when
Sensitivity analysis in scientific computing and engineering design

Learning Resources

Backpropagation calculus | Deep Learning chapter 4

3Blue1Brown

Shows the chain rule structure underlying backprop in a simple chain circuit.

11 min

The spelled-out intro to neural networks and backpropagation: building micrograd

Andrej Karpathy

Implements the wire-cut idea by attaching a .grad attribute to each value object.

144 min

Quiz

Question 1

What is the value of $\frac{\partial f(f, \mathbf{t})}{\partial f}(f_0, \mathbf{t}_0)$?

Question 2

In $f(x, \mathbf{t})$, what is $x$?

Question 3

True or false: $f(x, \mathbf{t})$ and $f(\mathbf{t})$ describe the same function.

Question 4

The goal of the backward pass is to compute, for each $t_i$:

Common Mistakes

Thinking wire cutting physically breaks the circuit. It is a conceptual formalism for defining partial derivatives, not an actual modification.
Forgetting that the seed value 1 comes from $\frac{d}{df}[f] = 1$.