Pseudoinverse AD Notes
Conventions
Unless noted otherwise, Linearization and Transpose are written for the raw-output-space pseudoinverse map. For complex tensors, Transpose means the adjoint under the real Frobenius inner product
\langle X, Y \rangle_{\mathbb{R}} = \operatorname{Re}\operatorname{tr}(X^\dagger Y).
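As a minimal sketch, this inner product can be evaluated in NumPy without forming the matrix product. The helper name `real_inner` is hypothetical and not taken from any library.

```python
import numpy as np

def real_inner(X, Y):
    """Real Frobenius inner product Re tr(X^dagger Y) of two same-shaped arrays."""
    # np.vdot flattens both inputs and conjugates the first one,
    # so it evaluates sum(conj(X) * Y) = tr(X^dagger Y).
    return np.vdot(X, Y).real

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
Y = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

# The flattened form agrees with the explicit trace form.
trace_form = np.trace(X.conj().T @ Y).real
print(np.isclose(real_inner(X, Y), trace_form))
```

Keeping only the real part makes the product symmetric in X and Y. That matters here because the pseudoinverse differential contains \dot{A}^\dagger and is therefore real-linear but not complex-linear in \dot{A}.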
Forward
The raw operator is
A \mapsto A^+ = \operatorname{pinv}(A),
where the rank of A is assumed to be locally constant.
Linearization
The raw-output-space linearization is
\dot{A}^+ = -A^+ \dot{A} A^+ + (I - A^+ A)\dot{A}^\dagger (A^+)^\dagger A^+ + A^+ (A^+)^\dagger \dot{A}^\dagger (I - A A^+).
JVP
The JVP is the same three-term pseudoinverse differential evaluated at \dot{A}.
Transpose
For a raw output cotangent \bar{A}^+, the transpose map is
\bar{A} = -(A^+)^\dagger \bar{A}^+ (A^+)^\dagger + (I - A A^+) (\bar{A}^+)^\dagger A^+ (A^+)^\dagger + (A^+)^\dagger A^+ (\bar{A}^+)^\dagger (I - A^+ A).
VJP (JAX convention)
JAX applies the same projector-corrected transpose map directly to the cotangent of the pseudoinverse output.
VJP (PyTorch convention)
PyTorch implements algebraically equivalent branch-wise formulas that reduce intermediate sizes, but the raw cotangent map is the same three-term adjoint.
Forward Definition
A^+ = \operatorname{pinv}(A), \qquad A \in \mathbb{C}^{M \times N}
where A^+ satisfies the Moore-Penrose identities. We assume the rank of A is locally constant; the pseudoinverse is not continuous at rank-changing points.
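The discontinuity can be seen on a small hypothetical example, A(eps) = diag(1, eps), whose rank drops from 2 to 1 at eps = 0. The sketch below assumes NumPy's `np.linalg.pinv` with its default cutoff.

```python
import numpy as np

# A(eps) = diag(1, eps) has rank 2 for eps > 0 and rank 1 at eps = 0.
growth = []
for eps in (1e-2, 1e-4, 1e-6):
    A = np.diag([1.0, eps])
    # The lower-right entry of pinv(A) is 1/eps, which grows without bound.
    growth.append(np.linalg.pinv(A)[1, 1])
print(growth)

# At eps = 0 the same entry is 0, so the limit eps -> 0 is not attained.
pinv_at_zero = np.linalg.pinv(np.diag([1.0, 0.0]))
print(pinv_at_zero[1, 1])
```

Any derivative formula evaluated across such a point is meaningless, which is why the rules in these notes are stated only where the rank is locally constant.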
Notation
- P_{\mathrm{col}} = A A^+: projector onto the column space of A
- P_{\mathrm{row}} = A^+ A: projector onto the row space of A
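Both operators are orthogonal projectors, i.e. idempotent and Hermitian, and their trace equals the rank of A. A short NumPy sketch on a hypothetical rank-2 matrix illustrates this; the explicit `rcond` cutoff and the helper name are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical rank-2 complex matrix of shape (M, N) = (4, 3), built as a product.
B = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
C = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
A = B @ C
# Explicit cutoff so the numerically zero singular value is discarded.
A_pinv = np.linalg.pinv(A, rcond=1e-10)

P_col = A @ A_pinv   # (M, M), projects onto the column space of A
P_row = A_pinv @ A   # (N, N), projects onto the row space of A

def is_orthogonal_projector(P, tol=1e-8):
    # Orthogonal projector: idempotent and Hermitian.
    return bool(np.allclose(P @ P, P, atol=tol) and np.allclose(P, P.conj().T, atol=tol))

print(is_orthogonal_projector(P_col), is_orthogonal_projector(P_row))
print(np.trace(P_col).real, np.trace(P_row).real)  # both close to rank(A)
```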
Forward Rule
Given a tangent \dot{A}:
\dot{A}^+ = -A^+ \dot{A} A^+ + (I - A^+ A)\dot{A}^\dagger (A^+)^\dagger A^+ + A^+ (A^+)^\dagger \dot{A}^\dagger (I - A A^+).
Three-term interpretation
- -A^+ \dot{A} A^+ is the inverse-like core term.
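- (I - P_{\mathrm{row}})\dot{A}^\dagger (A^+)^\dagger A^+ is the row-space correction: its columns lie in the null space of A, and it vanishes when A has full column rank.
- A^+ (A^+)^\dagger \dot{A}^\dagger (I - P_{\mathrm{col}}) is the column-space correction: it acts only on the orthogonal complement of the column space of A, and it vanishes when A has full row rank.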
For square, invertible A, both projectors equal the identity, so the two corrections vanish and the rule reduces to the usual inverse derivative -A^{-1} \dot{A} A^{-1}.
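The forward rule can be sketched in NumPy as follows. The function name `pinv_jvp` and the `rcond` default are assumptions of this example, not any library's API. The check uses a square, well-conditioned matrix, where the result should agree with the plain inverse derivative.

```python
import numpy as np

def pinv_jvp(A, dA, rcond=1e-10):
    """Three-term tangent of pinv at A along dA; rank assumed locally constant."""
    Ap = np.linalg.pinv(A, rcond=rcond)  # cutoff is held fixed, not differentiated
    ApH = Ap.conj().T
    dAH = dA.conj().T
    M, N = A.shape
    core = -Ap @ dA @ Ap
    row_term = (np.eye(N) - Ap @ A) @ dAH @ ApH @ Ap
    col_term = Ap @ ApH @ dAH @ (np.eye(M) - A @ Ap)
    return core + row_term + col_term

rng = np.random.default_rng(2)
# Shifted by 3*I only to keep the hypothetical example well conditioned.
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
dA = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

A_inv = np.linalg.inv(A)
inverse_rule = -A_inv @ dA @ A_inv
rel_err = np.linalg.norm(pinv_jvp(A, dA) - inverse_rule) / np.linalg.norm(inverse_rule)
print(rel_err)
```

The sketch forms the M x M and N x N projectors explicitly for readability; an implementation concerned with memory would apply them as products instead.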
Reverse Rule
Given a cotangent \bar{A}^+:
\bar{A} = -(A^+)^\dagger \bar{A}^+ (A^+)^\dagger + (I - A A^+) (\bar{A}^+)^\dagger A^+ (A^+)^\dagger + (A^+)^\dagger A^+ (\bar{A}^+)^\dagger (I - A^+ A).
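This is the adjoint of the forward rule under the real Frobenius inner product: \langle \bar{A}^+, \dot{A}^+ \rangle_{\mathbb{R}} = \langle \bar{A}, \dot{A} \rangle_{\mathbb{R}} for every tangent \dot{A}. The three-term structure is preserved; the second and third reverse terms are the adjoints of the third and second forward terms, respectively.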
The atol / rtol thresholding used to define the primal pseudoinverse is treated as fixed metadata, not as a differentiable branch.
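A reverse rule is correct exactly when it is the adjoint of the forward rule, which can be tested without finite differences. The sketch below assumes NumPy; `pinv_jvp`, `pinv_vjp`, `real_inner`, and `crandn` are hypothetical helper names, and the rank-deficient test matrix is built from orthonormal factors so that its conditioning is known.

```python
import numpy as np

def pinv_jvp(A, dA, rcond=1e-10):
    """Forward rule: three-term tangent of pinv at A along dA."""
    Ap = np.linalg.pinv(A, rcond=rcond)
    ApH, dAH = Ap.conj().T, dA.conj().T
    M, N = A.shape
    return (-Ap @ dA @ Ap
            + (np.eye(N) - Ap @ A) @ dAH @ ApH @ Ap
            + Ap @ ApH @ dAH @ (np.eye(M) - A @ Ap))

def pinv_vjp(A, G, rcond=1e-10):
    """Reverse rule: maps an output cotangent G (shape of A^+) to an input cotangent."""
    Ap = np.linalg.pinv(A, rcond=rcond)
    ApH, GH = Ap.conj().T, G.conj().T
    M, N = A.shape
    return (-ApH @ G @ ApH
            + (np.eye(M) - A @ Ap) @ GH @ Ap @ ApH
            + ApH @ Ap @ GH @ (np.eye(N) - Ap @ A))

def real_inner(X, Y):
    return np.vdot(X, Y).real  # Re tr(X^dagger Y)

rng = np.random.default_rng(3)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Rank-2 matrix of shape (4, 3) with singular values 2 and 1.
U, _ = np.linalg.qr(crandn(4, 2))
V, _ = np.linalg.qr(crandn(3, 2))
A = U @ np.diag([2.0, 1.0]) @ V.conj().T

dA = crandn(4, 3)   # input tangent, shape of A
G = crandn(3, 4)    # output cotangent, shape of A^+

lhs = real_inner(G, pinv_jvp(A, dA))
rhs = real_inner(pinv_vjp(A, G), dA)
print(lhs, rhs)  # the two pairings should agree to rounding error
```

Because A is rank deficient and non-square, all three terms of both rules contribute to the pairing.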
Verification
Moore-Penrose identities
Check the first two Moore-Penrose conditions, which should hold up to floating-point tolerance:
A A^+ A \approx A, \qquad A^+ A A^+ \approx A^+.
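A minimal NumPy sketch of such a check follows. It tests the two identities above together with the two Hermitian-symmetry conditions that complete the Moore-Penrose definition. The function name `check_penrose` and the tolerance are assumptions of the example.

```python
import numpy as np

def check_penrose(A, Ap, tol=1e-10):
    """Evaluate the four Moore-Penrose conditions for a candidate pseudoinverse Ap."""
    return {
        "A Ap A = A": bool(np.allclose(A @ Ap @ A, A, atol=tol)),
        "Ap A Ap = Ap": bool(np.allclose(Ap @ A @ Ap, Ap, atol=tol)),
        "A Ap Hermitian": bool(np.allclose(A @ Ap, (A @ Ap).conj().T, atol=tol)),
        "Ap A Hermitian": bool(np.allclose(Ap @ A, (Ap @ A).conj().T, atol=tol)),
    }

rng = np.random.default_rng(5)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Hypothetical rank-2 test matrix of shape (4, 3) with singular values 2 and 1.
U, _ = np.linalg.qr(crandn(4, 2))
V, _ = np.linalg.qr(crandn(3, 2))
A = U @ np.diag([2.0, 1.0]) @ V.conj().T

good = check_penrose(A, np.linalg.pinv(A, rcond=1e-10))
# The conjugate transpose is not a pseudoinverse here and fails the first two conditions.
bad = check_penrose(A, A.conj().T)
print(good)
print(bad)
```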
Backward checks
Compare JVP/VJP against finite differences away from rank changes.
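One way to set up such a comparison is sketched below, assuming NumPy. The test matrix is rank deficient, and an explicit relative cutoff (NumPy's `rcond`, held fixed on both sides) keeps the perturbed inputs at the same numerical rank. The names `pinv_jvp`, `crandn`, `RCOND`, and the step size `h` are choices made for this example.

```python
import numpy as np

def pinv_jvp(A, dA, rcond):
    """Three-term tangent of pinv at A along dA, with a fixed singular-value cutoff."""
    Ap = np.linalg.pinv(A, rcond=rcond)
    ApH, dAH = Ap.conj().T, dA.conj().T
    M, N = A.shape
    return (-Ap @ dA @ Ap
            + (np.eye(N) - Ap @ A) @ dAH @ ApH @ Ap
            + Ap @ ApH @ dAH @ (np.eye(M) - A @ Ap))

rng = np.random.default_rng(4)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Rank-2 matrix of shape (4, 3) with singular values 2 and 1.
U, _ = np.linalg.qr(crandn(4, 2))
V, _ = np.linalg.qr(crandn(3, 2))
A = U @ np.diag([2.0, 1.0]) @ V.conj().T
dA = crandn(4, 3)

h = 1e-6       # finite-difference step
RCOND = 1e-3   # cutoff far above h, so the O(h) third singular value is discarded

# Central difference of the thresholded pseudoinverse.
fd = (np.linalg.pinv(A + h * dA, rcond=RCOND)
      - np.linalg.pinv(A - h * dA, rcond=RCOND)) / (2.0 * h)
jvp = pinv_jvp(A, dA, rcond=RCOND)

rel_err = np.linalg.norm(fd - jvp) / np.linalg.norm(jvp)
print(rel_err)
```

A VJP can be checked the same way by pairing the finite-difference quotient with a random output cotangent under the real Frobenius inner product and comparing against the pairing of the VJP result with \dot{A}.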
References
- G. H. Golub and V. Pereyra, “The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate,” 1973.
- M. B. Giles, “An extended collection of matrix derivative results for forward and reverse mode automatic differentiation,” 2008.
DB Families
The DB publishes the pseudoinverse tensor directly.
The DB uses the Hermitian pseudoinverse convention for the primal operator but the same projector-based derivative structure.
This family captures the singular-input regime explicitly while keeping the same observable shape.