Norm AD Rules (norm)
1. Vector p-norm
\|x\|_p = \Bigl(\sum_i |x_i|^p\Bigr)^{1/p}, \quad x \in \mathbb{C}^N
Forward rule (JVP)
\dot{n} = \frac{\sum_i |x_i|^{p-2}\,\mathrm{Re}(\bar{x}_i\,\dot{x}_i)}{\|x\|_p^{p-1}}
Reverse rule (VJP)
\bar{x}_i = \bar{n} \cdot \frac{x_i\,|x_i|^{p-2}}{\|x\|_p^{p-1}}
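The two rules above can be checked numerically. A minimal NumPy sketch (function names `pnorm_jvp`/`pnorm_vjp` are mine, not from any library; assumes p ≥ 1 and no zero entries when p < 2, where |x_i|^{p-2} blows up):

```python
import numpy as np

def pnorm(x, p):
    """Vector p-norm ||x||_p for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def pnorm_jvp(x, xdot, p):
    """Forward rule: ndot = sum_i |x_i|^{p-2} Re(conj(x_i) xdot_i) / ||x||_p^{p-1}."""
    n = pnorm(x, p)
    return np.sum(np.abs(x) ** (p - 2) * np.real(np.conj(x) * xdot)) / n ** (p - 1)

def pnorm_vjp(x, nbar, p):
    """Reverse rule: xbar_i = nbar * x_i |x_i|^{p-2} / ||x||_p^{p-1}."""
    n = pnorm(x, p)
    return nbar * x * np.abs(x) ** (p - 2) / n ** (p - 1)
```

For real inputs the JVP should match a central finite difference of `pnorm`, and `⟨vjp(x, 1, p), ẋ⟩` should equal `jvp(x, ẋ, p)`.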
Special cases
| p | \bar{x} (VJP) | Notes |
|---|---|---|
| 0 | 0 | \ell^0 “norm” counts nonzeros; piecewise constant, so the gradient is 0 a.e. |
| 1 | \bar{n}\,\mathrm{sgn}(x) | Subgradient; convention \mathrm{sgn}(0) = 0 at x_i = 0 |
| 2 | \bar{n}\,x / \|x\|_2 | Gradient masked to 0 at \|x\| = 0 |
| \infty | \bar{n}\,\mathrm{sgn}(x) \cdot \mathbb{1}_{|x| = \|x\|_\infty} / k | k = multiplicity of max |
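The p = ∞ row, with its uniform averaging over tied maximizers, can be sketched as follows (helper name `inf_norm_vjp` is mine; the exact-equality tie test mirrors the table, though a tolerance would be more robust in floating point):

```python
import numpy as np

def inf_norm_vjp(x, nbar):
    """VJP of ||x||_inf: nbar * sgn(x) on the active set, divided by its size k."""
    a = np.abs(x)
    mask = (a == a.max())   # indicator of entries attaining the max
    k = mask.sum()          # multiplicity of the max
    return nbar * np.sign(x) * mask / k
```

With a two-way tie the cotangent is split evenly, e.g. `x = [3, -3, 1]` gives `[0.5, -0.5, 0.0]`.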
2. Frobenius norm
\|A\|_F = \sqrt{\mathrm{tr}(A^{\mathsf{H}}A)}
Equivalent to the vector 2-norm of the flattened matrix.
Forward rule (JVP)
\dot{n} = \frac{\mathrm{Re}\!\mathrm{tr}(A^{\mathsf{H}}\dot{A})}{\|A\|_F}
Reverse rule (VJP)
\bar{A} = \bar{n} \cdot \frac{A}{\|A\|_F}
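Both Frobenius rules are one-liners; a NumPy sketch (names `fro_jvp`/`fro_vjp` are mine, and the `A = 0` case is left unmasked here):

```python
import numpy as np

def fro_jvp(A, Adot):
    """Forward rule: ndot = Re tr(A^H Adot) / ||A||_F."""
    return np.real(np.trace(A.conj().T @ Adot)) / np.linalg.norm(A)

def fro_vjp(A, nbar):
    """Reverse rule: Abar = nbar * A / ||A||_F (undefined at A = 0)."""
    return nbar * A / np.linalg.norm(A)
```

For `A = diag(3, 4)` (so `||A||_F = 5`) and `Adot = I`, the JVP is `(3 + 4)/5 = 1.4`.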
3. Nuclear norm (trace norm)
\|A\|_* = \sum_i \sigma_i(A) = \mathrm{tr}(S)
where A = U S V^{\mathsf{H}} is the SVD.
Forward rule (JVP)
\dot{n} = \mathrm{Re}\!\mathrm{tr}(U^{\mathsf{H}}\,\dot{A}\,V)
Reverse rule (VJP)
\bar{A} = \bar{n} \cdot U V^{\mathsf{H}}
Derivation. Since \|A\|_* = \sum_i \sigma_i and \dot{\sigma}_i = \mathrm{Re}(u_i^{\mathsf{H}}\,\dot{A}\,v_i), summing gives \dot{n} = \mathrm{Re}\!\mathrm{tr}(U^{\mathsf{H}}\dot{A}\,V). The adjoint is \bar{A} = \bar{n}\,U V^{\mathsf{H}}.
Non-smooth case (A rank-deficient): the subdifferential is \{UV^{\mathsf{H}} + W : P_U^{\perp} W P_V^{\perp} = W,\, \|W\|_2 \leq 1\} (Watson 1992).
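For full-rank A, the derivation above translates directly into code. A NumPy sketch (helper names are mine; the rank-deficient case, where only the subdifferential exists, is not handled):

```python
import numpy as np

def nuclear_norm_jvp(A, Adot):
    """Forward rule: ndot = Re tr(U^H Adot V), from a thin SVD of A."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return np.real(np.trace(U.conj().T @ Adot @ Vh.conj().T))

def nuclear_norm_vjp(A, nbar):
    """Reverse rule: Abar = nbar * U V^H (valid when A has full rank)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return nbar * U @ Vh

```

Note that `U @ Vh` is the orthogonal factor of the polar decomposition of A, so for symmetric positive-definite A (e.g. `diag(2, 3)`) the VJP with `nbar = 1` is the identity, and perturbing by `Adot = I` grows every singular value at unit rate, giving a JVP equal to the matrix dimension.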
4. Spectral norm (operator 2-norm)
\|A\|_2 = \sigma_{\max}(A)
Forward rule (JVP)
For simple \sigma_{\max} (multiplicity 1):
\dot{n} = \mathrm{Re}(u_1^{\mathsf{H}}\,\dot{A}\,v_1)
where u_1, v_1 are the leading singular vectors.
Reverse rule (VJP)
\bar{A} = \bar{n} \cdot u_1\,v_1^{\mathsf{H}}
For multiplicity k:
\bar{A} = \bar{n} \cdot \frac{1}{k}\sum_{i:\,\sigma_i = \sigma_{\max}} u_i\,v_i^{\mathsf{H}}
Non-differentiable when \sigma_{\max} has multiplicity > 1.
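The multiplicity-k VJP can be sketched in NumPy. The function name and the tolerance-based tie detection are my choices (the formulas above assume exact multiplicity; in floating point a tolerance is needed to recognize ties):

```python
import numpy as np

def spectral_norm_vjp(A, nbar, tol=1e-12):
    """VJP of sigma_max: average u_i v_i^H over singular triples attaining the max."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    active = s >= s[0] - tol   # svd returns singular values in descending order
    k = active.sum()           # multiplicity of sigma_max (within tol)
    return (nbar / k) * (U[:, active] @ Vh[active, :])
```

For `diag(3, 1)` this reduces to the simple case `u_1 v_1^H`; for `3I` both singular values are active and the cotangent is split over k = 2.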
Implementation notes
- Frobenius: PyTorch decomposes to `linalg_vector_norm(A, 2, dims)`.
- Nuclear: PyTorch decomposes to `svdvals(A).sum()`; no dedicated backward.
- Spectral: PyTorch decomposes to `amax(svdvals(A))`; no dedicated backward.
- Matrix L1 / Inf: implemented directly as max absolute column/row sums. For ties (multiple active maximizers), `frule`/`rrule` use uniform averaging over the active set.
- Nuclear and spectral norms inherit AD rules from SVD backward.
References
- Giles, M. B. (2008). “An extended collection of matrix derivative results for forward and reverse mode AD.”
- Watson, G. A. (1992). “Characterization of the subdifferential of some matrix norms.” Linear Algebra Appl., 170, 33-45.
- Petersen, K. B. and Pedersen, M. S. (2012). The Matrix Cookbook. Section 10.6.
- PyTorch `FunctionsManual.cpp`: `norm_backward` (L250), `norm_jvp` (L304), `linalg_vector_norm_backward` (L459).
- PyTorch `LinearAlgebra.cpp`: `linalg_matrix_norm` decomposition (Frobenius → `vector_norm`, nuclear → `svdvals.sum`, spectral → `amax(svdvals)`).