SVD Reverse-Mode Rule (svd_rrule)

Forward

A = U \Sigma V^\dagger, \quad A \in \mathbb{C}^{M \times N}, \quad K = \min(M, N)

  • U \in \mathbb{C}^{M \times K}, U^\dagger U = I_K
  • \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_K), \sigma_i > 0, descending
  • V \in \mathbb{C}^{N \times K}, V^\dagger V = I_K
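
These conventions can be checked concretely with NumPy (a minimal sketch; `numpy.linalg.svd` returns V^\dagger as `Vh`, and the dimensions M = 5, N = 3 are arbitrary examples):

```python
import numpy as np

# Thin (economy) SVD of a complex M x N matrix; illustrates the shape and
# orthonormality conventions above. Dimensions here are arbitrary examples.
M, N = 5, 3
K = min(M, N)
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

U, S, Vh = np.linalg.svd(A, full_matrices=False)  # Vh = V^dagger, K x N
V = Vh.conj().T                                   # V is N x K

assert U.shape == (M, K) and S.shape == (K,) and V.shape == (N, K)
assert np.allclose(U.conj().T @ U, np.eye(K))     # U^dagger U = I_K
assert np.allclose(V.conj().T @ V, np.eye(K))     # V^dagger V = I_K
assert np.all(S > 0) and np.all(np.diff(S) <= 0)  # positive, descending
assert np.allclose(U @ np.diag(S) @ Vh, A)        # A = U Sigma V^dagger
```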

Reverse rule

Given: cotangents \bar{U}, \bar{S}, \bar{V} of a real scalar loss \ell, i.e. \bar{U}_{ij} = \partial \ell / \partial U_{ij}^*.

Compute: \bar{A} = \partial \ell / \partial A^*.

Step 1: Build the F matrix

F_{ij} = \frac{\sigma_j^2 - \sigma_i^2}{(\sigma_j^2 - \sigma_i^2)^2 + \eta} \approx \frac{1}{\sigma_j^2 - \sigma_i^2}, \quad i \neq j

F_{ii} = 0 (in the limit \eta \to 0). The regularization \eta > 0 (default 10^{-40}) prevents division by zero when singular values are degenerate.

Also define S_{\text{inv},i} = \sigma_i / (\sigma_i^2 + \eta) \approx 1/\sigma_i.
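
Step 1 can be sketched in a few lines of NumPy (the function name `build_F_and_Sinv` is illustrative, not part of any fixed API):

```python
import numpy as np

# Step 1 as code: regularized F matrix and safe inverse singular values.
# eta defaults to the 1e-40 mentioned above.
def build_F_and_Sinv(S, eta=1e-40):
    s2 = S**2
    diff = s2[None, :] - s2[:, None]   # entry (i, j) holds sigma_j^2 - sigma_i^2
    F = diff / (diff**2 + eta)         # ~ 1/(sigma_j^2 - sigma_i^2); F_ii = 0
    S_inv = S / (s2 + eta)             # ~ 1/sigma_i, finite even when sigma_i = 0
    return F, S_inv
```

Note that the diagonal needs no special casing: `diff` vanishes there, so F_ii = 0/(0 + eta) = 0 automatically.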

Step 2: Accumulate the inner matrix

Compute the K \times K inner matrix \Gamma = \Gamma_{\bar{U}} + \Gamma_{\bar{V}} + \Gamma_{\bar{S}} from whichever cotangents are nonzero:

From \bar{U} (dU path)

J = F \odot (U^\dagger \bar{U})

\Gamma_{\bar{U}} = (J + J^\dagger) \Sigma + \mathrm{diag}(i \cdot \mathrm{Im}(\mathrm{diag}(U^\dagger \bar{U})) \cdot S_\text{inv})

Derivation: Differentiating U^\dagger U = I_K shows that U^\dagger dU is skew-Hermitian. Its off-diagonal part is determined by F together with the differentiated relation dA = dU \Sigma V^\dagger + U d\Sigma V^\dagger + U \Sigma dV^\dagger. The diagonal of U^\dagger dU is purely imaginary (gauge freedom in the complex case), which requires the second term. For a real SVD the diagonal term vanishes, since \mathrm{Im}(\mathrm{diag}(U^T \bar{U})) = 0.
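
The dU path can be sketched as follows, assuming F and S_inv from Step 1 are available (`gamma_from_Ubar` is an illustrative name):

```python
import numpy as np

# dU path of Step 2. U is M x K, S holds the K singular values, Ubar is the
# cotangent of U, and F, S_inv come from Step 1.
def gamma_from_Ubar(U, S, Ubar, F, S_inv):
    UdU = U.conj().T @ Ubar                  # U^dagger Ubar (K x K)
    J = F * UdU                              # J = F (Hadamard) (U^dagger Ubar)
    Gamma = (J + J.conj().T) * S[None, :]    # (J + J^dagger) Sigma
    # imaginary diagonal handles the complex gauge freedom; zero for real SVD
    Gamma = Gamma + np.diag(1j * np.imag(np.diag(UdU)) * S_inv)
    return Gamma
```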

From \bar{V} (dV path)

G = F \odot (V^\dagger \bar{V})

\Gamma_{\bar{V}} = \Sigma (G + G^\dagger)

Analogous to the \bar{U} path but with \Sigma on the left. No imaginary-diagonal correction is needed because the gauge freedom is already absorbed by the \bar{U} term.

From \bar{S} (dS path)

\Gamma_{\bar{S}} = \mathrm{diag}(\bar{S})

This is the simplest cotangent path: \sigma_i are independent real parameters.
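
The remaining two paths can be sketched similarly (again with illustrative names; F comes from Step 1):

```python
import numpy as np

# dV and dS paths of Step 2. The three contributions are summed into the
# K x K inner matrix Gamma.
def gamma_from_Vbar(V, S, Vbar, F):
    VdV = V.conj().T @ Vbar                # V^dagger Vbar (K x K)
    G = F * VdV
    return S[:, None] * (G + G.conj().T)   # Sigma (G + G^dagger): scales rows

def gamma_from_Sbar(Sbar):
    return np.diag(Sbar)                   # sigma_i are independent real params
```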

Step 3: Core formula

\bar{A}_\text{core} = U \Gamma V^\dagger

Step 4: Non-square corrections

When A is not square, the thin SVD has U or V with fewer columns than rows. The core formula only accounts for perturbations within the column space. Perturbations in the orthogonal complement require additional terms.

When M > K (tall A, thin U):

\bar{A} \mathrel{+}= (\bar{U} - U U^\dagger \bar{U}) \mathrm{diag}(S_\text{inv}) V^\dagger

The projector (I_M - U U^\dagger) extracts the component of \bar{U} in the orthogonal complement of the column space of U.

When N > K (wide A, thin V):

\bar{A} \mathrel{+}= U \mathrm{diag}(S_\text{inv}) (\bar{V}^\dagger - \bar{V}^\dagger V V^\dagger)

Analogous correction for the orthogonal complement of V.
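
Both corrections can be sketched as a function that adds them to a precomputed core gradient (`Abar_core` and the function name are illustrative):

```python
import numpy as np

# Step 4 corrections: orthogonal-complement terms for non-square A.
# Abar_core = U Gamma V^dagger from Step 3; S_inv comes from Step 1.
def add_complement_terms(Abar_core, U, S_inv, V, Ubar, Vbar):
    M, K = U.shape
    N = V.shape[0]
    Abar = Abar_core.astype(complex)
    if M > K:  # tall A: part of Ubar outside the column space of U
        Abar = Abar + (Ubar - U @ (U.conj().T @ Ubar)) * S_inv[None, :] @ V.conj().T
    if N > K:  # wide A: part of Vbar outside the column space of V
        Abar = Abar + U * S_inv[None, :] @ (Vbar - V @ (V.conj().T @ Vbar)).conj().T
    return Abar
```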

Complete formula

For general M \times N with K = \min(M, N):

\bar{A} = U \Gamma V^\dagger + \mathbf{1}_{M > K} (I_M - U U^\dagger) \bar{U} \mathrm{diag}(S_\text{inv}) V^\dagger + \mathbf{1}_{N > K} U \mathrm{diag}(S_\text{inv}) \bar{V}^\dagger (I_N - V V^\dagger)

where \mathbf{1} denotes the indicator function and \Gamma is defined in Step 2.
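
Putting Steps 1–4 together, a self-contained sketch of the complete rule might look like this (`svd_rrule` and its signature are illustrative, not a fixed API; it follows the \partial \ell / \partial A^* convention above):

```python
import numpy as np

def svd_rrule(U, S, V, Ubar=None, Sbar=None, Vbar=None, eta=1e-40):
    """Return Abar = dl/dA* from the cotangents Ubar, Sbar, Vbar.

    A sketch of the complete formula above; any cotangent left as None
    is treated as zero.
    """
    M, K = U.shape
    N = V.shape[0]
    s2 = S**2
    diff = s2[None, :] - s2[:, None]          # sigma_j^2 - sigma_i^2 at (i, j)
    F = diff / (diff**2 + eta)                # Step 1
    S_inv = S / (s2 + eta)

    Gamma = np.zeros((K, K), dtype=complex)   # Step 2
    if Ubar is not None:
        UdU = U.conj().T @ Ubar
        J = F * UdU
        Gamma += (J + J.conj().T) * S[None, :]
        Gamma += np.diag(1j * np.imag(np.diag(UdU)) * S_inv)
    if Vbar is not None:
        VdV = V.conj().T @ Vbar
        G = F * VdV
        Gamma += S[:, None] * (G + G.conj().T)
    if Sbar is not None:
        Gamma += np.diag(Sbar)

    Abar = U @ Gamma @ V.conj().T             # Step 3
    if M > K and Ubar is not None:            # Step 4: tall correction
        Abar = Abar + (Ubar - U @ (U.conj().T @ Ubar)) * S_inv[None, :] @ V.conj().T
    if N > K and Vbar is not None:            # Step 4: wide correction
        Abar = Abar + U * S_inv[None, :] @ (Vbar - V @ (V.conj().T @ Vbar)).conj().T
    return Abar
```

As a sanity check, the dS-only case \bar{S} = \mathbf{1} reduces to \bar{A} = U V^\dagger, the familiar gradient of \sum_i \sigma_i for full-rank A with distinct singular values.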

Verification

Reconstruction check (forward)

\|A - U \mathrm{diag}(S) V^\dagger\|_F < \varepsilon

U^\dagger U \approx I_K, V^\dagger V \approx I_K, and the entries of S are nonnegative and sorted in descending order.

Gradient check (backward)

Finite-difference gradient check with scalar test functions (see docs/design/testing.md for details):

  • dU only: f(A) = \mathrm{Re}(\psi^\dagger H \psi), \psi = U_{:,1}
  • dV only: f(A) = \mathrm{Re}(\psi^\dagger H \psi), \psi = V_{:,1}
  • dS only: f(A) = \sum_i \sigma_i
  • joint dU+dV: f(A) = \mathrm{Re}(U_{1,1}^* V_{1,1})

where H is a random Hermitian matrix independent of A.
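
For the dS-only case the analytic gradient is known in closed form (\bar{A} = U V^\dagger for f(A) = \sum_i \sigma_i on real A), so that particular check can itself be sketched end to end:

```python
import numpy as np

# Central-difference check of the dS-only test function f(A) = sum_i sigma_i.
# For real full-rank A this has the closed-form gradient Abar = U V^dagger.
def f(A):
    return np.linalg.svd(A, compute_uv=False).sum()

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
U, S, Vh = np.linalg.svd(A, full_matrices=False)
Abar = U @ Vh                      # analytic gradient of the singular value sum

eps = 1e-6
num = np.zeros_like(A)
for i in range(A.shape[0]):        # central differences, entry by entry
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        num[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

assert np.allclose(num, Abar, atol=1e-5)
```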

References

  1. J. Townsend, “Differentiating the Singular Value Decomposition,” 2016. https://j-towns.github.io/papers/svd-derivative.pdf
  2. J.-G. Liu, “Einsum backward,” 2019. https://giggleliu.github.io/2019/04/02/einsumbp.html
  3. M. B. Giles, “An extended collection of matrix derivative results for forward and reverse mode automatic differentiation,” 2008.
  4. M. Seeger et al., “Auto-Differentiating Linear Algebra,” 2018.