QR AD Notes
This note covers the reduced QR rule that is materialized in the DB and keeps the transpose-dual LQ formulas from tenferro-rs/docs/AD/qr.md so that no derivation detail is lost in the migration.
QR Forward Definition
For
A = Q R, \qquad A \in \mathbb{C}^{M \times N}, \qquad K = \min(M, N),
the reduced QR factorization uses
- Q \in \mathbb{C}^{M \times K} with Q^\dagger Q = I_K
- R \in \mathbb{C}^{K \times N} upper triangular in its leading K \times K block
The differential identity is
dA = dQ \, R + Q \, dR,
with Q^\dagger dQ skew-Hermitian.
Helper Operators
copyltu
\operatorname{copyltu}(M)_{ij} = \begin{cases} M_{ij}, & i > j, \\ \operatorname{Re}(M_{ii}), & i = j, \\ \overline{M_{ji}}, & i < j. \end{cases}
This constructs the Hermitian matrix determined by the lower-triangular part of M. In the real case it is the lower-triangular copy plus the mirrored strictly upper part.
trilImInvAdjSkew
For the wide reduced-QR backward we use
\operatorname{trilImInvAdjSkew}(X) = \begin{cases} \operatorname{tril}(X - X^\top), & \text{real case}, \\ \operatorname{tril}(X - X^\dagger) \text{ with imaginary diagonal halved}, & \text{complex case}. \end{cases}
This is the adjoint helper appearing in PyTorch’s linalg_qr_backward for the M < N case.
Reverse Rule
Given cotangents \bar{Q} and \bar{R} of a real scalar loss \ell, compute \bar{A}.
QR Case 1: Full-rank (M \geq N)
Here K = N and R \in \mathbb{C}^{N \times N} is square upper triangular.
Step 1: Auxiliary matrix
W = R \bar{R}^\dagger - \bar{Q}^\dagger Q.
Step 2: Hermitian projection
H = \operatorname{copyltu}(W).
Step 3: Assemble the right-hand side
B = \bar{Q} + Q H.
Step 4: Triangular solve
\bar{A} = B R^{-\dagger}.
Implementation-wise this is a right solve with R^\dagger. PyTorch expresses the same step as
\bar{A} = \left(\bar{Q} + Q \, \operatorname{syminvadj}(\operatorname{triu}(R \bar{R}^\dagger - Q^\dagger \bar{Q}))\right) R^{-\dagger},
which is equivalent to the copyltu formulation.
Complete formula
\bar{A} = \left[\bar{Q} + Q \cdot \operatorname{copyltu}(R \bar{R}^\dagger - \bar{Q}^\dagger Q)\right] R^{-\dagger}.
QR Case 2: Wide Reduced QR (M < N)
Here K = M and the leading square block
R_1 = R_{:, 1:K} \in \mathbb{C}^{K \times K}
controls the orthogonality-constrained part of the backward pass.
Step 1: Square auxiliary matrix
X = Q^\dagger \bar{Q} - \bar{R} R^\dagger.
Step 2: Leading-block cotangent
\bar{A}_{\mathrm{lead}} = Q \, \operatorname{trilImInvAdjSkew}(X) \, R_1^{-\dagger}.
Step 3: Embed into the full width
Let \pi^\*(Y) = [Y \mid 0] pad trailing zero columns so that \pi^\*(Y) \in \mathbb{C}^{K \times N}.
Step 4: Add the direct R path
\bar{A} = \pi^\*(\bar{A}_{\mathrm{lead}}) + Q \bar{R}.
PyTorch’s linalg_qr_backward implements the same case as
\bar{A} = Q \bar{R} + \pi^\*\!\left( Q \, \operatorname{trilImInvAdjSkew}(Q^\dagger \bar{Q} - \bar{R} R^\dagger) R_1^{-\dagger}\right).
Forward Rule
PyTorch’s linalg_qr_jvp uses the same case split.
Case M \geq N
Define \operatorname{sym}(X) = X + X^\dagger and
\operatorname{syminv}(X) = \operatorname{triu}(X) - \tfrac{1}{2}\operatorname{diag}(X),
the inverse of sym on upper-triangular matrices with real diagonal. Then
dR = \operatorname{syminv}\!\left(\operatorname{sym}(Q^\dagger (dA) R^{-1})\right) R,
dQ = (dA) R^{-1} - Q \left(dR R^{-1}\right).
Case M < N
Let A_1 be the leading M \times M block of A, and define
\operatorname{trilIm}(X) = \begin{cases} \operatorname{tril}(X, -1), & \text{real case}, \\ \operatorname{tril}(X) \text{ with real diagonal zeroed}, & \text{complex case}. \end{cases}
Its inverse on skew-Hermitian inputs is
\operatorname{trilImInv}(X) = \begin{cases} X - X^\top, & \text{real case}, \\ X - X^\dagger \text{ with diagonal halved}, & \text{complex case}. \end{cases}
Then
dQ = Q \, \operatorname{trilImInv}\!\left( \operatorname{trilIm}(Q^\dagger (dA)_1 R_1^{-1})\right),
dR = Q^\dagger (dA) - Q^\dagger dQ \, R.
LQ Reverse Rule
The transpose-dual LQ formulas are retained here because the original tenferro-rs note grouped QR and LQ together.
LQ Forward Definition
A = L Q, \qquad A \in \mathbb{C}^{M \times N}.
- L \in \mathbb{C}^{M \times K} is lower triangular in its leading block
- Q \in \mathbb{C}^{K \times N} satisfies Q Q^\dagger = I_K
LQ Case 1: Full-rank (N \geq M)
With K = M, define
W = L^\dagger \bar{L} - \bar{Q} Q^\dagger,
H = \operatorname{copyltu}(W),
C = H Q + \bar{Q},
\bar{A} = L^{-\dagger} C.
LQ Case 2: Tall L (M > N)
Partition
L = \begin{pmatrix} U \\ D \end{pmatrix}, \qquad U \in \mathbb{C}^{K \times K}, \qquad D \in \mathbb{C}^{(M-K) \times K}.
With the matching partition
\bar{L} = \begin{pmatrix} \bar{U} \\ \bar{D} \end{pmatrix},
the backward pass is
\bar{A}_1 = \operatorname{lq\_back\_fullrank}\!\left(U, Q, \bar{U}, \bar{Q} + \bar{D}^\dagger A_2\right),
\bar{A}_2 = \bar{D} Q,
\bar{A} = \begin{pmatrix} \bar{A}_1 \\ \bar{A}_2 \end{pmatrix}.
Verification
Forward reconstruction
Check
\|A - Q R\|_F < \varepsilon, \qquad Q^\dagger Q \approx I, \qquad R \text{ upper triangular}.
Backward checks
A representative scalar functional that couples the \bar{Q} and \bar{R} paths is
f(A) = \operatorname{Re}(v^\dagger \operatorname{op} \, v + v_2^\dagger \operatorname{op}_2 \, v_2), \qquad v = Q_{:,1}, \qquad v_2 = R_{2,:},
with random Hermitian operators independent of A.
Implementation Correspondence
tenferro-rs/docs/AD/qr.mdwrites the rule in terms ofcopyltu,trilImInvAdjSkew, and the QR/LQ duality. This note keeps those helpers.- PyTorch’s
linalg_qr_backwarduses the same two reduced-QR cases: full-rank viasyminvadj(... ) R^{-H}and wide reduced QR via thepi*-embeddedtrilImInvAdjSkewformula. - PyTorch’s
linalg_qr_jvpmirrors the same case split in forward mode.
References
- M. Seeger, A. Hetzel, Z. Dai, E. Meissner, N. D. Lawrence, “Auto-Differentiating Linear Algebra,” 2018.
- H.-J. Liao, J.-G. Liu, L. Wang, T. Xiang, “Differentiable Programming Tensor Networks,” 2019.
DB Families
The DB publishes the differentiable reduced-QR outputs directly.