Norm AD Notes
Conventions
Unless noted otherwise, Linearization and Transpose are written for the raw-output-space norm map before any DB observable projection. For complex tensors, Transpose means the adjoint under the real Frobenius inner product
\langle X, Y \rangle_{\mathbb{R}} = \operatorname{Re}\operatorname{tr}(X^\dagger Y).
Forward
This note covers the raw scalar outputs of the vector p-norm, the Frobenius norm, the nuclear norm, and the spectral norm.
Linearization
Representative raw-output-space linearizations are:
- vector p-norm:
\dot{n} = \frac{\sum_i |x_i|^{p-2}\operatorname{Re}(x_i^* \dot{x}_i)}{\|x\|_p^{p-1}}
- Frobenius norm:
\dot{n} = \frac{\operatorname{Re}\operatorname{tr}(A^\dagger \dot{A})}{\|A\|_F}
- nuclear norm:
\dot{n} = \operatorname{Re}\operatorname{tr}(U^\dagger \dot{A} V)
- spectral norm:
\dot{n} = \operatorname{Re}(u_1^\dagger \dot{A} v_1)
JVP
The JVP is the same scalar linearization evaluated at the chosen tangent.
Transpose
Representative transpose rules are:
- vector p-norm:
\bar{x}_i = \bar{n} \cdot \frac{x_i |x_i|^{p-2}}{\|x\|_p^{p-1}}
- Frobenius norm:
\bar{A} = \bar{n} \cdot \frac{A}{\|A\|_F}
- nuclear norm:
\bar{A} = \bar{n} \cdot U V^\dagger
- spectral norm:
\bar{A} = \bar{n} \cdot u_1 v_1^\dagger
VJP (JAX convention)
JAX's VJP applies the same scalar-output transpose maps directly; the subgradient caveats at nonsmooth points carry over unchanged.
VJP (PyTorch convention)
PyTorch uses the same raw adjoints, together with the same masking and subgradient conventions at zeros, ties, and repeated singular values.
1. Vector p-norm
\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}, \qquad x \in \mathbb{C}^N
Forward Rule
\dot{n} = \frac{\sum_i |x_i|^{p-2}\operatorname{Re}(x_i^* \dot{x}_i)}{\|x\|_p^{p-1}}.
Reverse Rule
\bar{x}_i = \bar{n} \cdot \frac{x_i |x_i|^{p-2}}{\|x\|_p^{p-1}}.
Special cases
| p | Reverse rule | Notes |
|---|---|---|
| 0 | 0 | Piecewise constant |
| 1 | \bar{n}\,\operatorname{sgn}(x) | Subgradient at x_i = 0 |
| 2 | \bar{n}\,x / \|x\|_2 | Masked at \|x\| = 0 |
| \infty | \bar{n}\,\operatorname{sgn}(x_i)/k on the k maximizing entries, 0 elsewhere | Tie-sensitive |
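The generic-p forward and reverse rules above can be sketched in NumPy. This is a minimal sketch, not a production implementation: the function names `pnorm_jvp`/`pnorm_vjp` are illustrative, and it assumes finite p > 1 with no zero entries when p < 2 (the masking conventions from the table are not implemented).

```python
import numpy as np

def pnorm_jvp(x, dx, p):
    # dn = sum_i |x_i|^(p-2) * Re(conj(x_i) * dx_i) / ||x||_p^(p-1)
    # zero entries would need masking when p < 2; assumed nonzero here
    n = np.linalg.norm(x, p)
    w = np.abs(x) ** (p - 2)
    return np.sum(w * np.real(np.conj(x) * dx)) / n ** (p - 1)

def pnorm_vjp(x, nbar, p):
    # xbar_i = nbar * x_i * |x_i|^(p-2) / ||x||_p^(p-1)
    n = np.linalg.norm(x, p)
    return nbar * x * np.abs(x) ** (p - 2) / n ** (p - 1)
```

For real inputs, `pnorm_vjp(x, 1.0, p) @ dx` reproduces `pnorm_jvp(x, dx, p)` exactly, which is the scalar-output transpose relation stated in the Conventions.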
2. Frobenius norm
\|A\|_F = \sqrt{\operatorname{tr}(A^\dagger A)}.
Forward Rule
\dot{n} = \frac{\operatorname{Re}\operatorname{tr}(A^\dagger \dot{A})}{\|A\|_F}.
Reverse Rule
\bar{A} = \bar{n} \cdot \frac{A}{\|A\|_F}.
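Both Frobenius rules can be sketched in NumPy (function names illustrative; the VJP applies a zero mask at A = 0, matching the masking convention used for the vector 2-norm):

```python
import numpy as np

def fro_jvp(A, dA):
    # dn = Re tr(A^† dA) / ||A||_F ; np.vdot conjugates its first argument
    return np.real(np.vdot(A, dA)) / np.linalg.norm(A)

def fro_vjp(A, nbar):
    # Abar = nbar * A / ||A||_F, masked to zero at A = 0 to avoid 0/0
    nF = np.linalg.norm(A)
    return nbar * A / nF if nF > 0 else np.zeros_like(A)
```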
3. Nuclear norm
\|A\|_* = \sum_i \sigma_i(A).
Let A = U S V^\dagger be a thin SVD with all singular values nonzero.
Forward Rule
\dot{n} = \operatorname{Re}\operatorname{tr}(U^\dagger \dot{A} V).
Reverse Rule
\bar{A} = \bar{n} \cdot U V^\dagger.
This norm inherits the same smoothness caveats as the SVD.
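A minimal NumPy sketch of both nuclear-norm rules, under the same full-rank thin-SVD assumption (function names illustrative; recomputing the SVD in each call is wasteful but keeps the sketch self-contained):

```python
import numpy as np

def nuclear_jvp(A, dA):
    # dn = Re tr(U^† dA V) = Re <U V^†, dA> under the real Frobenius inner product
    U, _, Vh = np.linalg.svd(A, full_matrices=False)
    return np.real(np.vdot(U @ Vh, dA))

def nuclear_vjp(A, nbar):
    # Abar = nbar * U V^†
    U, _, Vh = np.linalg.svd(A, full_matrices=False)
    return nbar * U @ Vh
```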
4. Spectral norm
\|A\|_2 = \sigma_{\max}(A).
For a simple top singular value:
Forward Rule
\dot{n} = \operatorname{Re}(u_1^\dagger \dot{A} v_1).
Reverse Rule
\bar{A} = \bar{n} \cdot u_1 v_1^\dagger.
For multiplicity k > 1, a standard subgradient choice is the uniform average \frac{1}{k}\sum_{i=1}^{k} u_i v_i^\dagger over the active singular-vector dyads.
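The simple-top-singular-value case can be sketched in NumPy (names illustrative; `np.linalg.svd` sorts singular values in descending order, and the rows of `Vh` are already v^\dagger, so no extra conjugation is needed in the dyad):

```python
import numpy as np

def spectral_jvp(A, dA):
    # dn = Re(u1^† dA v1), assuming sigma_max is simple
    U, _, Vh = np.linalg.svd(A, full_matrices=False)
    return np.real(np.conj(U[:, 0]) @ dA @ np.conj(Vh[0]))

def spectral_vjp(A, nbar):
    # Abar = nbar * u1 v1^† ; row 0 of Vh is v1^†
    U, _, Vh = np.linalg.svd(A, full_matrices=False)
    return nbar * np.outer(U[:, 0], Vh[0])
```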
Numerical Notes
- Nonsmooth points, especially zero inputs and repeated top singular values, require subgradient conventions.
- The DB excludes upstream norm families that are classified as unsupported subgradient cases.
Verification
- compare primal norm values against direct evaluation
- compare JVP/VJP against finite differences away from nonsmooth points
- for nuclear and spectral norms, cross-check against SVD-based observables
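The finite-difference check in the second bullet can be sketched as a generic central-difference estimator for scalar-valued maps (the helper name `fd_jvp` is illustrative):

```python
import numpy as np

def fd_jvp(f, x, dx, eps=1e-6):
    # central-difference estimate of the directional derivative of scalar f at x along dx;
    # only valid away from nonsmooth points
    return (f(x + eps * dx) - f(x - eps * dx)) / (2 * eps)
```

Compare this against the analytic JVP at several random tangents; agreement to a few digits is expected for smooth inputs, and disagreement near zeros, ties, or repeated singular values is expected rather than a bug.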
References
- M. B. Giles, “An extended collection of matrix derivative results for forward and reverse mode automatic differentiation,” 2008.
- G. A. Watson, “Characterization of the subdifferential of some matrix norms,” Linear Algebra and its Applications, vol. 170, pp. 33–45, 1992.
DB Families
- The DB publishes the chosen norm value directly.
- The DB publishes the matrix-norm observable directly.
- The DB publishes the vector-norm observable directly.
- The DB treats condition-number families as scalar spectral observables derived from the same singular-value sensitivities.