Automatic Differentiation


Why Differentiation Matters

Training neural networks requires computing gradients of a loss function with respect to model parameters. Automatic differentiation (autodiff) does this efficiently and exactly.

Forward Mode vs Reverse Mode

  • Forward mode: propagates derivatives alongside the function evaluation, one pass per input direction. Efficient when there are few inputs.
  • Reverse mode: propagates derivatives backwards from the output, one pass per output. Efficient when there are few outputs (like a scalar loss), even with millions of inputs.

Deep learning uses reverse mode (backpropagation).
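Forward mode can be sketched with dual numbers: each value carries its derivative along with it, and every operation updates both via the usual calculus rules. This is a minimal illustration (the `Dual` class name is hypothetical, not a PyTorch API), differentiating the same polynomial used in the example below:

```python
# Forward-mode autodiff via dual numbers: each value carries its
# derivative alongside it. Minimal sketch, not a framework implementation.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val   # function value
        self.dot = dot   # derivative w.r.t. the chosen input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

    def __pow__(self, n):
        # power rule for a constant integer exponent
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.dot)

# Seed dot=1.0 on x to get dy/dx in a single forward pass.
x = Dual(2.0, 1.0)
y = x ** 2 + 3 * x + 1
print(y.val, y.dot)  # 11.0 7.0
```

One forward pass yields the derivative with respect to one input; with a million parameters, forward mode would need a million passes, which is why training uses reverse mode instead.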

Computational Graphs

Autodiff works by building a computational graph of operations:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x + 1
y.backward()
print(x.grad)  # dy/dx = 2x + 3 = 7.0
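A quick way to sanity-check an autodiff result is a central finite difference. Unlike autodiff, this is only an approximation and costs extra function evaluations, but it should agree closely:

```python
# Central-difference approximation of dy/dx at x = 2 for y = x**2 + 3x + 1.
# Autodiff gives the exact value 7.0; this numerical estimate should match
# it to high precision.
def f(x):
    return x ** 2 + 3 * x + 1

h = 1e-6
fd = (f(2.0 + h) - f(2.0 - h)) / (2 * h)
print(fd)  # approximately 7.0
```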

Chain Rule

The chain rule is the mathematical foundation:

\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial w}

Reverse mode autodiff applies the chain rule systematically through the computational graph.
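The two-factor chain rule above can be traced by hand on a tiny graph. A sketch, assuming the illustrative functions y = w·x and L = y²:

```python
# Tiny graph: y = w * x, then L = y ** 2.
# Reverse mode walks from the output back toward the input, multiplying
# local derivatives: dL/dy = 2y, then dy/dw = x, so dL/dw = 2y * x.
w, x = 3.0, 2.0

# forward pass
y = w * x          # 6.0
L = y ** 2         # 36.0

# backward pass (chain rule applied output-to-input)
dL_dy = 2 * y      # local derivative of y ** 2
dy_dw = x          # local derivative of w * x with respect to w
dL_dw = dL_dy * dy_dw
print(dL_dw)  # 24.0
```

This output-to-input sweep reuses dL/dy for every parameter feeding into y, which is exactly what makes reverse mode cheap for a scalar loss.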
