Pseudoinverse AD Rules (pinv)
Forward
A^+ = \mathrm{pinv}(A), \quad A \in \mathbb{C}^{M \times N}
where A^+ is the Moore-Penrose pseudoinverse satisfying A A^+ A = A, A^+ A A^+ = A^+, (A A^+)^{\mathsf{H}} = A A^+, (A^+ A)^{\mathsf{H}} = A^+ A.
Assumption: A(t) has constant rank in a neighborhood of the evaluation point. The pseudoinverse is not continuous at rank-changing points.
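The four defining conditions above can be checked numerically. A minimal sketch (not from the source) using numpy.linalg.pinv on a deliberately rank-deficient complex matrix:

```python
# Sketch: numerically verify the four Moore-Penrose conditions for a
# rank-deficient complex matrix, using numpy.linalg.pinv.
import numpy as np

rng = np.random.default_rng(0)
M, N, r = 5, 3, 2                       # rank r < min(M, N)
A = (rng.standard_normal((M, r)) + 1j * rng.standard_normal((M, r))) @ \
    (rng.standard_normal((r, N)) + 1j * rng.standard_normal((r, N)))
Ap = np.linalg.pinv(A)                  # A^+

H = lambda X: X.conj().T                # Hermitian transpose
assert np.allclose(A @ Ap @ A, A)       # condition 1: A A^+ A = A
assert np.allclose(Ap @ A @ Ap, Ap)     # condition 2: A^+ A A^+ = A^+
assert np.allclose(H(A @ Ap), A @ Ap)   # condition 3: A A^+ Hermitian
assert np.allclose(H(Ap @ A), Ap @ A)   # condition 4: A^+ A Hermitian
```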
Notation
- P_{\mathrm{col}} = A A^+: orthogonal projector onto \mathrm{col}(A)
- P_{\mathrm{row}} = A^+ A: orthogonal projector onto \mathrm{row}(A)
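As a quick sanity check (illustrative, not from the source), both products are Hermitian idempotents that fix A from the appropriate side:

```python
# Sketch: check that A A^+ and A^+ A are orthogonal projectors
# (Hermitian, idempotent) onto col(A) and row(A) respectively.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
Ap = np.linalg.pinv(A)
P_col, P_row = A @ Ap, Ap @ A

for P in (P_col, P_row):
    assert np.allclose(P, P.conj().T)   # Hermitian
    assert np.allclose(P @ P, P)        # idempotent
assert np.allclose(P_col @ A, A)        # P_col acts as identity on col(A)
assert np.allclose(A @ P_row, A)        # P_row acts as identity on row(A)
```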
Forward rule (JVP)
Given tangent \dot{A} (Golub & Pereyra, 1973):
\dot{A}^+ = -A^+\,\dot{A}\,A^+ + (I - A^+ A)\,\dot{A}^{\mathsf{H}}\,(A^+)^{\mathsf{H}} A^+ + A^+\,(A^+)^{\mathsf{H}}\,\dot{A}^{\mathsf{H}}\,(I - A A^+)
Three-term interpretation:
- -A^+\dot{A}\,A^+: analogous to d(A^{-1}) = -A^{-1}\,dA\,A^{-1}
- (I - P_{\mathrm{row}})\,\dot{A}^{\mathsf{H}}\,(A^+)^{\mathsf{H}} A^+: correction from the null-space of A (row-space projection perturbation)
- A^+(A^+)^{\mathsf{H}}\,\dot{A}^{\mathsf{H}}(I - P_{\mathrm{col}}): correction from the left null-space of A (column-space projection perturbation)
For full-rank square A, P_{\mathrm{row}} = P_{\mathrm{col}} = I, so the second and third terms vanish and the rule reduces to the standard inverse derivative d(A^{-1}) = -A^{-1}\,dA\,A^{-1}.
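The JVP formula can be validated against a central finite difference of numpy.linalg.pinv along a random tangent. A minimal sketch, with illustrative names; the full-column-rank A keeps the rank constant under perturbation, as the assumption above requires:

```python
# Sketch: check the Golub-Pereyra JVP against a central finite difference
# of numpy.linalg.pinv along a random tangent direction.
import numpy as np

def pinv_jvp(A, Ap, dA):
    """Forward-mode derivative of pinv at A with tangent dA (Golub-Pereyra)."""
    H = lambda X: X.conj().T
    I_m, I_n = np.eye(A.shape[0]), np.eye(A.shape[1])
    return (-Ap @ dA @ Ap
            + (I_n - Ap @ A) @ H(dA) @ H(Ap) @ Ap      # row-space correction
            + Ap @ H(Ap) @ H(dA) @ (I_m - A @ Ap))     # column-space correction

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))         # full column rank almost surely
dA = rng.standard_normal((5, 3))
Ap = np.linalg.pinv(A)

eps = 1e-6
fd = (np.linalg.pinv(A + eps * dA) - np.linalg.pinv(A - eps * dA)) / (2 * eps)
assert np.allclose(pinv_jvp(A, Ap, dA), fd, atol=1e-5)
```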
Derivation sketch
- Differentiate A^+ A A^+ = A^+ to get an expression involving \dot{A}^+ on both sides.
- Differentiate A A^+ A = A (MP condition 1) together with the Hermitian conditions (A A^+)^{\mathsf{H}} = A A^+ and (A^+ A)^{\mathsf{H}} = A^+ A to isolate A^+ A\,\dot{A}^+ and \dot{A}^+ A A^+.
- Substitute back to eliminate \dot{A}^+ from the RHS factors.
Reverse rule (VJP)
Given cotangent \bar{A}^+ (same shape as A^+, i.e., N \times M):
\bar{A} = -(A^+)^{\mathsf{H}}\,\bar{A}^+\,(A^+)^{\mathsf{H}} + (I - A A^+)\,(\bar{A}^+)^{\mathsf{H}}\,A^+\,(A^+)^{\mathsf{H}} + (A^+)^{\mathsf{H}}\,A^+\,(\bar{A}^+)^{\mathsf{H}}\,(I - A^+ A)
This is the adjoint of the JVP linear map \dot{A} \mapsto \dot{A}^+ under the inner product \langle X, Y \rangle = \mathrm{Re}\,\mathrm{tr}(X^{\mathsf{H}} Y). Equivalently, it is the JVP formula with A \to A^{\mathsf{H}}, A^+ \to (A^+)^{\mathsf{H}}, and \dot{A} \to \bar{A}^+.
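The adjoint relationship can be confirmed numerically: for any tangent \dot{A} and cotangent \bar{A}^+, the two pairings must agree. A sketch with illustrative names, implementing both formulas directly:

```python
# Sketch: verify that the VJP is the adjoint of the JVP linear map, i.e.
# <gp, JVP(dA)> = <VJP(gp), dA> under the inner product Re tr(X^H Y).
import numpy as np

H = lambda X: X.conj().T

def pinv_jvp(A, Ap, dA):
    I_m, I_n = np.eye(A.shape[0]), np.eye(A.shape[1])
    return (-Ap @ dA @ Ap
            + (I_n - Ap @ A) @ H(dA) @ H(Ap) @ Ap
            + Ap @ H(Ap) @ H(dA) @ (I_m - A @ Ap))

def pinv_vjp(A, Ap, gp):                # gp: cotangent of A^+, shape N x M
    I_m, I_n = np.eye(A.shape[0]), np.eye(A.shape[1])
    return (-H(Ap) @ gp @ H(Ap)
            + (I_m - A @ Ap) @ H(gp) @ Ap @ H(Ap)
            + H(Ap) @ Ap @ H(gp) @ (I_n - Ap @ A))

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))
dA = rng.standard_normal((4, 6))
gp = rng.standard_normal((6, 4))
Ap = np.linalg.pinv(A)

inner = lambda X, Y: np.real(np.trace(H(X) @ Y))
assert np.isclose(inner(gp, pinv_jvp(A, Ap, dA)),
                  inner(pinv_vjp(A, Ap, gp), dA))
```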
Implementation notes
- Branch on M \leq N vs M > N to minimize intermediate matrix sizes (PyTorch optimization).
- Requires both A and A^+ to be saved from the forward pass.
- The SVD-based alternative (A = U \Sigma V^{\mathsf{H}}, A^+ = V \Sigma^{-1} U^{\mathsf{H}}, differentiate through SVD) is less efficient than the direct Golub-Pereyra formula.
- The atol/rtol singular-value thresholding is not differentiated through.
References
- Golub, G. H. and Pereyra, V. (1973). “The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate.” SIAM J. Numer. Anal., 10(2), 413-432.
- Giles, M. B. (2008). “An extended collection of matrix derivative results for forward and reverse mode AD.”
- PyTorch: FunctionsManual.cpp, pinv_jvp (L2091) and pinv_backward (L2110).
- JAX: jax/_src/numpy/linalg.py, _pinv_jvp (PR #2794).