Norm AD Notes

Conventions

Unless noted otherwise, Linearization and Transpose are written for the raw-output-space norm map before any DB observable projection. Overbars (\bar{n}, \bar{x}, \bar{A}) denote cotangents; complex conjugation of scalars is written with a star (x_i^*), and {}^\dagger is the conjugate transpose. For complex tensors, Transpose means the adjoint under the real Frobenius inner product

\langle X, Y \rangle_{\mathbb{R}} = \operatorname{Re}\operatorname{tr}(X^\dagger Y).
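A minimal NumPy sketch of this pairing and of the resulting transpose convention (illustrative names, not part of the DB):

```python
import numpy as np

def real_frobenius_inner(X, Y):
    # <X, Y>_R = Re tr(X^dagger Y)
    return np.real(np.trace(X.conj().T @ Y))

# For a scalar-output linearization dn = <G, dX>_R, the transpose maps a real
# cotangent nbar to Xbar = nbar * G, since <nbar * G, dX>_R = nbar * dn.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
dX = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
nbar = 0.7
assert np.isclose(real_frobenius_inner(nbar * G, dX),
                  nbar * real_frobenius_inner(G, dX))
```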

Forward

This note covers the raw scalar outputs of the vector p-norm, the Frobenius norm, the nuclear norm, and the spectral norm.

Linearization

Representative raw-output-space linearizations are:

  • vector p-norm:

\dot{n} = \frac{\sum_i |x_i|^{p-2}\operatorname{Re}(x_i^* \dot{x}_i)}{\|x\|_p^{p-1}}

  • Frobenius norm:

\dot{n} = \frac{\operatorname{Re}\operatorname{tr}(A^\dagger \dot{A})}{\|A\|_F}

  • nuclear norm:

\dot{n} = \operatorname{Re}\operatorname{tr}(U^\dagger \dot{A} V)

  • spectral norm:

\dot{n} = \operatorname{Re}(u_1^\dagger \dot{A} v_1)

JVP

The JVP is the same scalar linearization evaluated at the chosen tangent.
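For example, a minimal JAX sketch of the vector 2-norm JVP, assuming a real input away from zero (jnp.linalg.norm defaults to the 2-norm for vectors):

```python
import jax
import jax.numpy as jnp

x = jnp.array([3.0, -4.0, 12.0])
t = jnp.array([0.1, 0.2, -0.3])

# JVP of the scalar norm output; for real x this is sum_i x_i t_i / ||x||_2.
n, dn = jax.jvp(jnp.linalg.norm, (x,), (t,))
expected = jnp.dot(x, t) / jnp.linalg.norm(x)
print(dn, expected)  # dn ≈ expected
```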

Transpose

Representative transpose rules are:

  • vector p-norm:

\bar{x}_i = \bar{n} \cdot \frac{x_i |x_i|^{p-2}}{\|x\|_p^{p-1}}

  • Frobenius norm:

\bar{A} = \bar{n} \cdot \frac{A}{\|A\|_F}

  • nuclear norm:

\bar{A} = \bar{n} \cdot U V^\dagger

  • spectral norm:

\bar{A} = \bar{n} \cdot u_1 v_1^\dagger

VJP (JAX convention)

JAX reads the same scalar-output transpose maps directly. Nonsmooth points keep their subgradient caveats.
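A minimal sketch of the same transpose map through jax.vjp, again for a real vector away from zero:

```python
import jax
import jax.numpy as jnp

x = jnp.array([3.0, -4.0, 12.0])

# vjp returns the primal value and a function applying the transpose map.
n, vjp_fn = jax.vjp(jnp.linalg.norm, x)
nbar = jnp.array(2.0)
(xbar,) = vjp_fn(nbar)

# Matches the raw adjoint xbar = nbar * x / ||x||_2 for real x.
print(xbar, nbar * x / n)
```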

VJP (PyTorch convention)

PyTorch uses the same raw adjoints, together with the same masking and subgradient conventions at zeros, ties, and repeated singular values.
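An analogous PyTorch sketch for the Frobenius norm of a real matrix (illustrative only):

```python
import torch

A = torch.randn(3, 4, requires_grad=True)

n = torch.linalg.matrix_norm(A)  # Frobenius norm by default
n.backward(torch.tensor(2.0))    # seed the scalar cotangent nbar = 2

# A.grad matches the raw adjoint Abar = nbar * A / ||A||_F.
print(torch.allclose(A.grad, 2.0 * A.detach() / n.detach()))
```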

1. Vector p-norm

\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}, \qquad x \in \mathbb{C}^N

Forward Rule

\dot{n} = \frac{\sum_i |x_i|^{p-2}\operatorname{Re}(x_i^* \dot{x}_i)}{\|x\|_p^{p-1}}.

Reverse Rule

\bar{x}_i = \bar{n} \cdot \frac{x_i |x_i|^{p-2}}{\|x\|_p^{p-1}}.

Special cases

  • p = 0: \bar{x} = 0 (piecewise constant)
  • p = 1: \bar{x} = \bar{n}\,\operatorname{sgn}(x) (subgradient at x_i = 0)
  • p = 2: \bar{x} = \bar{n}\,x / \|x\|_2 (masked at \|x\| = 0)
  • p = \infty: \bar{x} is \bar{n} times a uniform average over the active maximizers (tie-sensitive)
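A NumPy sketch of the general-p rules above, with the zero-entry mask in the reverse rule. This assumes x ≠ 0, and the unguarded JVP additionally assumes p ≥ 2 or no zero entries; the names are illustrative:

```python
import numpy as np

def pnorm_jvp(x, xdot, p):
    # dn = sum_i |x_i|^{p-2} Re(x_i^* xdot_i) / ||x||_p^{p-1}
    n = np.sum(np.abs(x) ** p) ** (1.0 / p)
    w = np.abs(x) ** (p - 2)
    return np.sum(w * np.real(np.conj(x) * xdot)) / n ** (p - 1)

def pnorm_vjp(x, nbar, p):
    # xbar_i = nbar * x_i |x_i|^{p-2} / ||x||_p^{p-1}, masked where x_i = 0
    n = np.sum(np.abs(x) ** p) ** (1.0 / p)
    ax = np.abs(x)
    w = np.zeros_like(ax)
    nz = ax > 0
    w[nz] = ax[nz] ** (p - 2)
    return nbar * x * w / n ** (p - 1)

x = np.array([1.0 + 2.0j, 0.0, -3.0j])
print(pnorm_vjp(x, nbar=1.0, p=3))
```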

2. Frobenius norm

\|A\|_F = \sqrt{\operatorname{tr}(A^\dagger A)}.

Forward Rule

\dot{n} = \frac{\operatorname{Re}\operatorname{tr}(A^\dagger \dot{A})}{\|A\|_F}.

Reverse Rule

\bar{A} = \bar{n} \cdot \frac{A}{\|A\|_F}.
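A minimal sketch of both Frobenius rules for a complex matrix, assuming A ≠ 0:

```python
import numpy as np

def fro_jvp(A, Adot):
    # dn = Re tr(A^dagger Adot) / ||A||_F
    return np.real(np.trace(A.conj().T @ Adot)) / np.linalg.norm(A, 'fro')

def fro_vjp(A, nbar):
    # Abar = nbar * A / ||A||_F
    return nbar * A / np.linalg.norm(A, 'fro')
```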

3. Nuclear norm

\|A\|_* = \sum_i \sigma_i(A).

If A = U S V^\dagger is a thin SVD, then

Forward Rule

\dot{n} = \operatorname{Re}\operatorname{tr}(U^\dagger \dot{A} V).

Reverse Rule

\bar{A} = \bar{n} \cdot U V^\dagger.

This norm inherits the same smoothness caveats as the SVD.
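A NumPy sketch of both nuclear-norm rules via a thin SVD, assuming A has full rank so these caveats do not bite (names are illustrative):

```python
import numpy as np

def nuclear_jvp(A, Adot):
    # dn = Re tr(U^dagger Adot V), with A = U S V^dagger a thin SVD (V = Vh^dagger)
    U, S, Vh = np.linalg.svd(A, full_matrices=False)
    return np.real(np.trace(U.conj().T @ Adot @ Vh.conj().T))

def nuclear_vjp(A, nbar):
    # Abar = nbar * U V^dagger; the primal value itself is S.sum()
    U, S, Vh = np.linalg.svd(A, full_matrices=False)
    return nbar * U @ Vh
```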

4. Spectral norm

\|A\|_2 = \sigma_{\max}(A).

For a simple top singular value:

Forward Rule

\dot{n} = \operatorname{Re}(u_1^\dagger \dot{A} v_1).

Reverse Rule

\bar{A} = \bar{n} \cdot u_1 v_1^\dagger.

For a top singular value of multiplicity k > 1, a conventional subgradient choice is the uniform average of the active singular-vector dyads u_i v_i^\dagger.
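A NumPy sketch for the simple (non-repeated) case; the uniform-average tie handling is not implemented here:

```python
import numpy as np

def spectral_jvp(A, Adot):
    # dn = Re(u1^dagger Adot v1) for a simple top singular value
    U, S, Vh = np.linalg.svd(A, full_matrices=False)
    u1, v1 = U[:, 0], Vh[0, :].conj()   # singular values are sorted descending
    return np.real(u1.conj() @ Adot @ v1)

def spectral_vjp(A, nbar):
    # Abar = nbar * u1 v1^dagger
    U, S, Vh = np.linalg.svd(A, full_matrices=False)
    u1, v1 = U[:, 0], Vh[0, :].conj()
    return nbar * np.outer(u1, v1.conj())
```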

Numerical Notes

  • Nonsmooth points, especially zero inputs and repeated top singular values, require subgradient conventions.
  • The DB excludes upstream norm families that are classified as unsupported subgradient cases.

Verification

  • compare primal norm values against direct evaluation
  • compare JVP/VJP against finite differences away from nonsmooth points (see the sketch after this list)
  • for nuclear and spectral norms, cross-check against SVD-based observables
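As one concrete instance of the finite-difference check, a NumPy sketch for the Frobenius norm of a real matrix, away from nonsmooth points:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
dA = rng.standard_normal((3, 3))
eps = 1e-6

f = lambda M: np.linalg.norm(M, 'fro')

# Central finite difference for the JVP along dA.
fd = (f(A + eps * dA) - f(A - eps * dA)) / (2 * eps)
jvp = np.trace(A.T @ dA) / f(A)   # Re tr(A^dagger Adot)/||A||_F, real case
print(abs(fd - jvp))              # should be tiny

# VJP consistency: <Abar, dA> should equal nbar * jvp for nbar = 1.
Abar = A / f(A)
print(abs(np.sum(Abar * dA) - jvp))
```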

References

  1. M. B. Giles, “An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation,” Oxford University Computing Laboratory, 2008.
  2. G. A. Watson, “Characterization of the subdifferential of some matrix norms,” Linear Algebra and its Applications, vol. 170, pp. 33–45, 1992.

DB Families

### norm/identity

The DB publishes the chosen norm value directly.

### matrix_norm/identity

The DB publishes the matrix-norm observable directly.

### vector_norm/identity

The DB publishes the vector-norm observable directly.

### cond/identity

The DB treats condition-number families as scalar spectral observables derived from the same singular-value sensitivities.