QR AD Notes

This note covers the reduced QR rule materialized in the DB together with the transpose-dual LQ formulas.

Conventions

Unless noted otherwise, the Linearization and Transpose formulas are written for the raw-output-space QR factorization, before any DB observable projection. For complex tensors, Transpose means the adjoint under the real Frobenius inner product:

\langle X, Y \rangle_{\mathbb{R}} = \operatorname{Re}\operatorname{tr}(X^\dagger Y).
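A minimal NumPy sketch of this inner product, assuming NumPy is available (the helper name `inner_re` is hypothetical). It uses `np.vdot`, which flattens both arguments and conjugates the first, so it evaluates \operatorname{Re}\operatorname{tr}(X^\dagger Y) without forming the product. The second check illustrates what "Transpose" means here: under this pairing, the transpose of X \mapsto C X is Y \mapsto C^\dagger Y.

```python
import numpy as np

def inner_re(X, Y):
    """Real Frobenius inner product <X, Y>_R = Re tr(X^dagger Y)."""
    # np.vdot flattens both arguments and conjugates the first one.
    return float(np.real(np.vdot(X, Y)))

rng = np.random.default_rng(0)

def crand(*shape):
    """Random complex matrix with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

C, X, Y, Z = crand(4, 3), crand(3, 2), crand(4, 2), crand(3, 2)

# The vdot form agrees with the trace form.
same_as_trace = np.isclose(inner_re(X, Z), np.real(np.trace(X.conj().T @ Z)))

# Under <.,.>_R the transpose of the linear map X -> C X is Y -> C^dagger Y.
adjoint_ok = np.isclose(inner_re(Y, C @ X), inner_re(C.conj().T @ Y, X))
print(same_as_trace, adjoint_ok)
```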

Forward

The raw operator is

A \mapsto (Q, R), \qquad A = Q R, \qquad Q^\dagger Q = I.

Linearization

For M \geq N,

dR = \operatorname{syminv}\!\left(\operatorname{sym}(Q^\dagger dA R^{-1})\right) R,

dQ = (dA)R^{-1} - Q(dR R^{-1}).
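A minimal NumPy sketch of the M \geq N linearization, checked against central finite differences. It assumes that NumPy's LAPACK-backed `np.linalg.qr` (reduced mode) returns R with a real diagonal, as the Householder routines it calls do, and that a small perturbation does not flip any diagonal sign. The names `sym`, `syminv` follow the Forward Rule definitions; `qr_jvp_tall` is a hypothetical helper name.

```python
import numpy as np

def sym(X):
    return X + X.conj().T

def syminv(X):
    # triu(X) - diag(X)/2: inverse of sym on upper-triangular matrices with real diagonal.
    return np.triu(X) - 0.5 * np.diag(np.diag(X))

def qr_jvp_tall(Q, R, dA):
    """Raw-factor JVP (dQ, dR) of reduced QR for M >= N."""
    Rinv = np.linalg.inv(R)  # explicit inverse for brevity; triangular solves are preferable
    dR = syminv(sym(Q.conj().T @ dA @ Rinv)) @ R
    dQ = dA @ Rinv - Q @ (dR @ Rinv)
    return dQ, dR

rng = np.random.default_rng(1)
M, N = 6, 4
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
dA = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Q, R = np.linalg.qr(A)  # reduced mode: Q is M x N, R is N x N
dQ, dR = qr_jvp_tall(Q, R, dA)

# Compare against central finite differences of NumPy's QR.
eps = 1e-6
Qp, Rp = np.linalg.qr(A + eps * dA)
Qm, Rm = np.linalg.qr(A - eps * dA)
ok_dQ = np.allclose(dQ, (Qp - Qm) / (2 * eps), atol=1e-6)
ok_dR = np.allclose(dR, (Rp - Rm) / (2 * eps), atol=1e-6)
print(ok_dQ, ok_dR)
```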

For M < N, the leading square block R_1 controls the orthogonality-constrained part; the exact trilIm / trilImInv formulas are recorded under Forward Rule below.

JVP

The JVP is the same case-split linearization returned on the raw factors (dQ, dR).

Transpose

For M \geq N,

\bar{A} = \left[\bar{Q} + Q \cdot \operatorname{copyltu}(R \bar{R}^\dagger - \bar{Q}^\dagger Q)\right] R^{-\dagger}.

For M < N,

\bar{A} = Q \bar{R} + \pi^{*}\!\left( Q \, \operatorname{trilImInvAdjSkew}(Q^\dagger \bar{Q} - \bar{R} R^\dagger) R_1^{-\dagger}\right).

VJP (JAX convention)

In JAX, the VJP is this same raw transpose, applied to the cotangents of the QR outputs before any downstream observable repackages the factors.

VJP (PyTorch convention)

PyTorch's `linalg_qr_backward` uses the same full-rank versus wide-reduced case split. The transpose-dual LQ formulas are recorded later in this note.

Scalarized Sum-Loss Rule

For the scalar loss used by downstream gradient benchmarks,

\phi_{\mathrm{qr}}(A) = \operatorname{Re}\left(\sum_{ij} Q_{ij} + \sum_{ij} R_{ij}\right),

set the raw output cotangents to

\bar{Q} = \mathbf{1}_Q, \qquad \bar{R} = \mathbf{1}_R.

Here \mathbf{1}_Q and \mathbf{1}_R are all-ones matrices with the same shapes as Q and R, respectively.

The R cotangent is intentionally not masked to the triangular support. In the full-rank case, the strictly lower entries of \mathbf{1}_R enter R\mathbf{1}_R^\dagger only through its strictly upper part, which copyltu ignores. In the wide case, the strictly lower entries of the leading block contribute Q\mathbf{1}_R through the direct path and cancel exactly against the leading-block correction in the \pi^{*} term.

The JVP of this scalarized loss is

d\phi_{\mathrm{qr}}[dA] = \left\langle \mathbf{1}_Q, dQ \right\rangle_{\mathbb{R}} + \left\langle \mathbf{1}_R, dR \right\rangle_{\mathbb{R}},

where (dQ, dR) are obtained from the QR linearization above.

For the full-rank reduced case M \geq N, the VJP is the raw QR transpose specialized to all-ones cotangents:

W_{\mathrm{sum}} = R \mathbf{1}_R^\dagger - \mathbf{1}_Q^\dagger Q,

H_{\mathrm{sum}} = \operatorname{copyltu}(W_{\mathrm{sum}}),

\nabla_A \phi_{\mathrm{qr}} = \left(\mathbf{1}_Q + Q H_{\mathrm{sum}}\right) R^{-\dagger}.

This is a right triangular solve with R^\dagger. For wide reduced QR (M < N), use the wide reverse formula below with \bar{Q}=\mathbf{1}_Q and \bar{R}=\mathbf{1}_R:

\nabla_A \phi_{\mathrm{qr}} = Q \mathbf{1}_R + \pi^{*}\!\left( Q \, \operatorname{trilImInvAdjSkew}(Q^\dagger \mathbf{1}_Q - \mathbf{1}_R R^\dagger) R_1^{-\dagger}\right).
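A minimal NumPy sketch that evaluates both sum-loss gradients and checks them two ways: against a central difference of \phi_{\mathrm{qr}} (which exercises the JVP identity for d\phi_{\mathrm{qr}}), and against the same VJP with \bar{R} masked to its triangular support (the masking claim above). `qr_vjp`, `copyltu`, `tril_im_inv_adj_skew`, and `phi_qr` are hypothetical helper names; `np.linalg.solve` stands in for a triangular solve, and `np.linalg.qr` is assumed to return R with a real diagonal.

```python
import numpy as np

def copyltu(X):
    """Hermitian matrix built from the lower triangle of X (real part on the diagonal)."""
    low = np.tril(X, -1)
    return low + low.conj().T + np.diag(np.diag(X).real)

def tril_im_inv_adj_skew(X):
    """tril(X - X^dagger) with its (purely imaginary) diagonal halved."""
    Y = np.tril(X - X.conj().T)
    idx = np.diag_indices_from(Y)
    Y[idx] *= 0.5
    return Y

def qr_vjp(Q, R, Qbar, Rbar):
    """Raw reduced-QR transpose, full-rank (M >= N) and wide (M < N) cases."""
    M, N = Q.shape[0], R.shape[1]
    if M >= N:
        B = Qbar + Q @ copyltu(R @ Rbar.conj().T - Qbar.conj().T @ Q)
        return np.linalg.solve(R, B.conj().T).conj().T           # B R^{-dagger}
    K = M
    S = tril_im_inv_adj_skew(Q.conj().T @ Qbar - Rbar @ R.conj().T)
    lead = np.linalg.solve(R[:, :K], (Q @ S).conj().T).conj().T  # Q S R_1^{-dagger}
    Abar = Q @ Rbar                                              # direct R path
    Abar[:, :K] += lead                                          # pi^* pads with zeros
    return Abar

def phi_qr(A):
    Q, R = np.linalg.qr(A)
    return np.real(Q.sum() + R.sum())

rng = np.random.default_rng(3)
eps = 1e-6
report = {}
for M, N in [(5, 3), (3, 5)]:
    A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    dA = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    Q, R = np.linalg.qr(A)
    grad = qr_vjp(Q, R, np.ones(Q.shape), np.ones(R.shape))
    # <grad, dA>_R should match a central difference of phi_qr along dA.
    fd = (phi_qr(A + eps * dA) - phi_qr(A - eps * dA)) / (2 * eps)
    matches_fd = np.isclose(np.real(np.vdot(grad, dA)), fd, rtol=1e-6, atol=1e-6)
    # Masking the R cotangent to its triangular support should not change the gradient.
    masked = qr_vjp(Q, R, np.ones(Q.shape), np.triu(np.ones(R.shape)))
    report[(M, N)] = bool(matches_fd) and bool(np.allclose(grad, masked))
print(report)
```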

QR Forward Definition

For

A = Q R, \qquad A \in \mathbb{C}^{M \times N}, \qquad K = \min(M, N),

the reduced QR factorization uses

Q \in \mathbb{C}^{M \times K} with Q^\dagger Q = I_K
R \in \mathbb{C}^{K \times N} upper triangular in its leading K \times K block, with real diagonal (the gauge assumed by syminv and trilIm below)

The differential identity is

dA = dQ \, R + Q \, dR,

with Q^\dagger dQ skew-Hermitian.
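A quick numerical check of the differential identity, as a sketch: (dQ, dR) are approximated by central differences of NumPy's reduced `np.linalg.qr` (assumed to be smooth under small perturbations of a generic A), for one tall and one wide shape.

```python
import numpy as np

rng = np.random.default_rng(4)
eps = 1e-6
checks = []
for M, N in [(6, 4), (3, 5)]:
    A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    dA = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    Q, R = np.linalg.qr(A)
    Qp, Rp = np.linalg.qr(A + eps * dA)
    Qm, Rm = np.linalg.qr(A - eps * dA)
    dQ, dR = (Qp - Qm) / (2 * eps), (Rp - Rm) / (2 * eps)
    # Product rule: dA = dQ R + Q dR.
    product_rule = np.allclose(dA, dQ @ R + Q @ dR, atol=1e-6)
    # Differentiating Q^dagger Q = I_K makes Q^dagger dQ skew-Hermitian.
    C = Q.conj().T @ dQ
    skew = np.allclose(C, -C.conj().T, atol=1e-6)
    checks.append(bool(product_rule and skew))
print(checks)
```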

Helper Operators

`copyltu`

\operatorname{copyltu}(X)_{ij} = \begin{cases} X_{ij}, & i > j, \\ \operatorname{Re}(X_{ii}), & i = j, \\ \overline{X_{ji}}, & i < j. \end{cases}

This constructs the Hermitian matrix determined by the lower-triangular part of X. In the real case it equals \operatorname{tril}(X) + \operatorname{tril}(X, -1)^\top: the lower triangle is kept and its strictly lower part is mirrored into the upper triangle.
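A minimal NumPy sketch of copyltu (the function name mirrors the operator; it is not a library call), with checks of the stated properties: the result is Hermitian, keeps the strictly lower part of X, carries \operatorname{Re}(X_{ii}) on the diagonal, and reduces to \operatorname{tril}(X) + \operatorname{tril}(X, -1)^\top for real input.

```python
import numpy as np

def copyltu(X):
    """Hermitian matrix from the lower triangle of X; real part kept on the diagonal."""
    low = np.tril(X, -1)
    return low + low.conj().T + np.diag(np.diag(X).real)

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Hm = copyltu(X)

is_hermitian = np.allclose(Hm, Hm.conj().T)
keeps_lower = np.allclose(np.tril(Hm, -1), np.tril(X, -1))
real_diag = np.allclose(np.diag(Hm), np.diag(X).real)

# Real case: lower triangle plus its strictly lower part mirrored upward.
Xr = X.real
real_form = np.allclose(copyltu(Xr), np.tril(Xr) + np.tril(Xr, -1).T)
print(is_hermitian, keeps_lower, real_diag, real_form)
```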

`trilImInvAdjSkew`

For the wide reduced-QR backward we use

\operatorname{trilImInvAdjSkew}(X) = \begin{cases} \operatorname{tril}(X - X^\top), & \text{real case}, \\ \operatorname{tril}(X - X^\dagger) \text{ with imaginary diagonal halved}, & \text{complex case}. \end{cases}

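It is the adjoint, under \langle \cdot, \cdot \rangle_{\mathbb{R}}, of the map Y \mapsto \operatorname{trilImInv}(\operatorname{trilIm}(Y)) that appears in the M < N linearization.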
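The adjoint relationship can be checked numerically, as a sketch: for random complex X and Y, \langle X, \operatorname{trilImInv}(\operatorname{trilIm}(Y)) \rangle_{\mathbb{R}} should equal \langle \operatorname{trilImInvAdjSkew}(X), Y \rangle_{\mathbb{R}}. The three helpers transcribe the case definitions in this note (trilIm and trilImInv are given under Forward Rule); their Python names are hypothetical.

```python
import numpy as np

def tril_im(X):
    """Real case: tril(X, -1). Complex case: tril(X) with the real diagonal zeroed."""
    Y = np.tril(X, -1)
    if np.iscomplexobj(X):
        Y = Y + 1j * np.diag(np.diag(X).imag)
    return Y

def tril_im_inv(X):
    """X - X^dagger with the diagonal halved (the diagonal is zero in the real case)."""
    Y = X - X.conj().T
    idx = np.diag_indices_from(Y)
    Y[idx] *= 0.5
    return Y

def tril_im_inv_adj_skew(X):
    """tril(X - X^dagger) with its (purely imaginary) diagonal halved."""
    Y = np.tril(X - X.conj().T)
    idx = np.diag_indices_from(Y)
    Y[idx] *= 0.5
    return Y

def inner_re(X, Y):
    return np.real(np.vdot(X, Y))  # Re tr(X^dagger Y)

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Y = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
lhs = inner_re(X, tril_im_inv(tril_im(Y)))
rhs = inner_re(tril_im_inv_adj_skew(X), Y)
adjoint_ok = np.isclose(lhs, rhs)
print(adjoint_ok)
```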

Reverse Rule

Given cotangents \bar{Q} and \bar{R} of a real scalar loss \ell, compute \bar{A}.

QR Case 1: Full-rank (M \geq N)

Here K = N and R \in \mathbb{C}^{N \times N} is square upper triangular.

Step 1: Auxiliary matrix

W = R \bar{R}^\dagger - \bar{Q}^\dagger Q.

Step 2: Hermitian projection

H = \operatorname{copyltu}(W).

Step 3: Assemble the right-hand side

B = \bar{Q} + Q H.

Step 4: Triangular solve

\bar{A} = B R^{-\dagger}.

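This is a right solve with R^\dagger. An equivalent form is

\bar{A} = \left(\bar{Q} + Q \, \operatorname{syminvadj}(\operatorname{triu}(\bar{R} R^\dagger - Q^\dagger \bar{Q}))\right) R^{-\dagger},

where \operatorname{syminvadj}(X) = X + X^\dagger with the real part of the diagonal halved. Since \bar{R} R^\dagger - Q^\dagger \bar{Q} = W^\dagger, this form satisfies \operatorname{syminvadj}(\operatorname{triu}(W^\dagger)) = \operatorname{copyltu}(W) = H.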
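A NumPy sketch comparing the two Hermitian projections, assuming syminvadj(X) means X + X^\dagger with the real part of the diagonal halved. The identity holds with \bar{R} R^\dagger - Q^\dagger \bar{Q} = W^\dagger inside triu, which is the matrix the sketch uses; `copyltu` and `syminvadj` are hypothetical Python names for the operators.

```python
import numpy as np

def copyltu(X):
    low = np.tril(X, -1)
    return low + low.conj().T + np.diag(np.diag(X).real)

def syminvadj(X):
    """X + X^dagger with the real part of the diagonal halved."""
    Y = X + X.conj().T
    idx = np.diag_indices_from(Y)
    Y[idx] -= 0.5 * Y[idx].real
    return Y

rng = np.random.default_rng(7)
M, N = 6, 4
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Q, R = np.linalg.qr(A)
Qbar = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Rbar = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

W = R @ Rbar.conj().T - Qbar.conj().T @ Q                           # Step 1
H_copy = copyltu(W)                                                 # Step 2
H_adj = syminvadj(np.triu(Rbar @ R.conj().T - Q.conj().T @ Qbar))   # triu of W^dagger
same_H = np.allclose(H_copy, H_adj)
print(same_H)
```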

Complete formula

\bar{A} = \left[\bar{Q} + Q \cdot \operatorname{copyltu}(R \bar{R}^\dagger - \bar{Q}^\dagger Q)\right] R^{-\dagger}.
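The four steps above can be sketched in NumPy and validated with a dot-product test: for random cotangents, \langle \bar{A}, dA \rangle_{\mathbb{R}} should equal \langle \bar{Q}, dQ \rangle_{\mathbb{R}} + \langle \bar{R}, dR \rangle_{\mathbb{R}}, with (dQ, dR) taken from central differences of `np.linalg.qr`. `qr_vjp_fullrank` is a hypothetical name, and `np.linalg.solve` stands in for the triangular solve in Step 4.

```python
import numpy as np

def copyltu(X):
    low = np.tril(X, -1)
    return low + low.conj().T + np.diag(np.diag(X).real)

def qr_vjp_fullrank(Q, R, Qbar, Rbar):
    W = R @ Rbar.conj().T - Qbar.conj().T @ Q      # Step 1: auxiliary matrix
    H = copyltu(W)                                  # Step 2: Hermitian projection
    B = Qbar + Q @ H                                # Step 3: right-hand side
    # Step 4: Abar = B R^{-dagger}, i.e. solve R Z = B^dagger and return Z^dagger.
    return np.linalg.solve(R, B.conj().T).conj().T

rng = np.random.default_rng(8)

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M, N = 6, 4
A, dA = crand(M, N), crand(M, N)
Qbar, Rbar = crand(M, N), crand(N, N)
Q, R = np.linalg.qr(A)
Abar = qr_vjp_fullrank(Q, R, Qbar, Rbar)

eps = 1e-6
Qp, Rp = np.linalg.qr(A + eps * dA)
Qm, Rm = np.linalg.qr(A - eps * dA)
dQ, dR = (Qp - Qm) / (2 * eps), (Rp - Rm) / (2 * eps)
lhs = np.real(np.vdot(Abar, dA))
rhs = np.real(np.vdot(Qbar, dQ) + np.vdot(Rbar, dR))
dot_ok = np.isclose(lhs, rhs, rtol=1e-6, atol=1e-6)
print(dot_ok)
```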

QR Case 2: Wide Reduced QR (M < N)

Here K = M and the leading square block

R_1 = R_{:, 1:K} \in \mathbb{C}^{K \times K}

controls the orthogonality-constrained part of the backward pass.

Step 1: Square auxiliary matrix

X = Q^\dagger \bar{Q} - \bar{R} R^\dagger.

Step 2: Leading-block cotangent

\bar{A}_{\mathrm{lead}} = Q \, \operatorname{trilImInvAdjSkew}(X) \, R_1^{-\dagger}.

Step 3: Embed into the full width

Let \pi^{*}(Y) = [Y \mid 0] append N - K zero columns, so that for Y \in \mathbb{C}^{M \times K} we get \pi^{*}(Y) \in \mathbb{C}^{M \times N} (recall K = M).

Step 4: Add the direct R path

\bar{A} = \pi^{*}(\bar{A}_{\mathrm{lead}}) + Q \bar{R}.

Equivalently,

\bar{A} = Q \bar{R} + \pi^{*}\!\left( Q \, \operatorname{trilImInvAdjSkew}(Q^\dagger \bar{Q} - \bar{R} R^\dagger) R_1^{-\dagger}\right).
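A NumPy sketch of the wide-case steps, validated by the same kind of dot-product test against central differences of `np.linalg.qr` (reduced mode, so Q is square here). `qr_vjp_wide` is a hypothetical name; `np.linalg.solve` applied to R_1 stands in for a triangular solve.

```python
import numpy as np

def tril_im_inv_adj_skew(X):
    """tril(X - X^dagger) with its (purely imaginary) diagonal halved."""
    Y = np.tril(X - X.conj().T)
    idx = np.diag_indices_from(Y)
    Y[idx] *= 0.5
    return Y

def qr_vjp_wide(Q, R, Qbar, Rbar):
    """Raw reduced-QR transpose for M < N (K = M, Q square)."""
    M, K, N = Q.shape[0], Q.shape[1], R.shape[1]
    X = Q.conj().T @ Qbar - Rbar @ R.conj().T                 # Step 1: K x K
    S = tril_im_inv_adj_skew(X)
    # Step 2: A_lead = Q S R_1^{-dagger}, via the solve R_1 Z = (Q S)^dagger.
    lead = np.linalg.solve(R[:, :K], (Q @ S).conj().T).conj().T
    Abar = np.zeros((M, N), dtype=complex)
    Abar[:, :K] = lead                                       # Step 3: pi^*(A_lead) = [A_lead | 0]
    return Abar + Q @ Rbar                                   # Step 4: direct R path

rng = np.random.default_rng(9)

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M, N = 3, 5
A, dA = crand(M, N), crand(M, N)
Q, R = np.linalg.qr(A)                                       # Q is 3 x 3, R is 3 x 5
Qbar, Rbar = crand(*Q.shape), crand(*R.shape)
Abar = qr_vjp_wide(Q, R, Qbar, Rbar)

eps = 1e-6
Qp, Rp = np.linalg.qr(A + eps * dA)
Qm, Rm = np.linalg.qr(A - eps * dA)
dQ, dR = (Qp - Qm) / (2 * eps), (Rp - Rm) / (2 * eps)
lhs = np.real(np.vdot(Abar, dA))
rhs = np.real(np.vdot(Qbar, dQ) + np.vdot(Rbar, dR))
dot_ok = np.isclose(lhs, rhs, rtol=1e-6, atol=1e-6)
print(dot_ok)
```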

Forward Rule

Case M \geq N

Define \operatorname{sym}(X) = X + X^\dagger and

\operatorname{syminv}(X) = \operatorname{triu}(X) - \tfrac{1}{2}\operatorname{diag}(X),

the inverse of sym on upper-triangular matrices with real diagonal. Then

dR = \operatorname{syminv}\!\left(\operatorname{sym}(Q^\dagger (dA) R^{-1})\right) R,

dQ = (dA) R^{-1} - Q \left(dR R^{-1}\right).
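A small sketch of why the real-diagonal condition matters: syminv recovers an upper-triangular U from \operatorname{sym}(U) only when \operatorname{diag}(U) is real. Adding an imaginary diagonal leaves \operatorname{sym}(U) unchanged, so that part cannot be recovered. This is the reason dR R^{-1} must have a real diagonal, as it does when R has a real diagonal.

```python
import numpy as np

def sym(X):
    return X + X.conj().T

def syminv(X):
    return np.triu(X) - 0.5 * np.diag(np.diag(X))

rng = np.random.default_rng(10)
n = 4
U = np.triu(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
idx = np.diag_indices(n)
U[idx] = U[idx].real            # real diagonal, as for dR R^{-1}
U_bad = U + 1j * np.eye(n)      # same matrix with an imaginary diagonal part

ok_real_diag = np.allclose(syminv(sym(U)), U)
ok_imag_diag = np.allclose(syminv(sym(U_bad)), U_bad)   # expected to fail
print(ok_real_diag, ok_imag_diag)
```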

Case M < N

Let (dA)_1 denote the leading M \times M block (the first M columns) of dA, and define

\operatorname{trilIm}(X) = \begin{cases} \operatorname{tril}(X, -1), & \text{real case}, \\ \operatorname{tril}(X) \text{ with real diagonal zeroed}, & \text{complex case}. \end{cases}

On skew-Hermitian inputs C, trilIm is inverted by the following map, so that \operatorname{trilImInv}(\operatorname{trilIm}(C)) = C:

\operatorname{trilImInv}(X) = \begin{cases} X - X^\top, & \text{real case}, \\ X - X^\dagger \text{ with diagonal halved}, & \text{complex case}. \end{cases}

Then

dQ = Q \, \operatorname{trilImInv}\!\left( \operatorname{trilIm}(Q^\dagger (dA)_1 R_1^{-1})\right),

dR = Q^\dagger (dA) - Q^\dagger dQ \, R.
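A NumPy sketch of the M < N forward rule. It first checks that trilImInv undoes trilIm on skew-Hermitian (and real skew-symmetric) matrices, then compares the JVP with central differences of `np.linalg.qr`, assuming as before that R has a real diagonal. `qr_jvp_wide`, `tril_im`, and `tril_im_inv` are hypothetical Python names for the operators above.

```python
import numpy as np

def tril_im(X):
    """Real case: tril(X, -1). Complex case: tril(X) with the real diagonal zeroed."""
    Y = np.tril(X, -1)
    if np.iscomplexobj(X):
        Y = Y + 1j * np.diag(np.diag(X).imag)
    return Y

def tril_im_inv(X):
    """X - X^dagger with the diagonal halved (the diagonal is zero in the real case)."""
    Y = X - X.conj().T
    idx = np.diag_indices_from(Y)
    Y[idx] *= 0.5
    return Y

def qr_jvp_wide(Q, R, dA):
    """Raw-factor JVP (dQ, dR) of reduced QR for M < N, where K = M and Q is square."""
    K = Q.shape[0]
    Y = Q.conj().T @ dA[:, :K] @ np.linalg.inv(R[:, :K])   # uses only (dA)_1 and R_1
    dQ = Q @ tril_im_inv(tril_im(Y))
    dR = Q.conj().T @ dA - Q.conj().T @ dQ @ R
    return dQ, dR

rng = np.random.default_rng(11)

# Round trip on skew-Hermitian and real skew-symmetric matrices.
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
C, Cr = B - B.conj().T, B.real - B.real.T
round_trip_ok = np.allclose(tril_im_inv(tril_im(C)), C) and np.allclose(tril_im_inv(tril_im(Cr)), Cr)

# JVP against central finite differences of NumPy's QR.
M, N = 3, 5
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
dA = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Q, R = np.linalg.qr(A)  # reduced mode: Q is 3 x 3, R is 3 x 5
dQ, dR = qr_jvp_wide(Q, R, dA)
eps = 1e-6
Qp, Rp = np.linalg.qr(A + eps * dA)
Qm, Rm = np.linalg.qr(A - eps * dA)
ok_dQ = np.allclose(dQ, (Qp - Qm) / (2 * eps), atol=1e-6)
ok_dR = np.allclose(dR, (Rp - Rm) / (2 * eps), atol=1e-6)
print(round_trip_ok, ok_dQ, ok_dR)
```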

LQ Reverse Rule

The transpose-dual LQ formulas are included for completeness.

LQ Forward Definition

A = L Q, \qquad A \in \mathbb{C}^{M \times N}.

L \in \mathbb{C}^{M \times K} is lower triangular in its leading block
Q \in \mathbb{C}^{K \times N} satisfies Q Q^\dagger = I_K

LQ Case 1: Full-rank (N \geq M)

With K = M, define

W = L^\dagger \bar{L} - \bar{Q} Q^\dagger,

H = \operatorname{copyltu}(W),

C = H Q + \bar{Q},

\bar{A} = L^{-\dagger} C.
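A NumPy sketch of the full-rank LQ transpose. NumPy has no LQ routine, so the sketch assumes LQ is obtained from the QR of A^\dagger (A^\dagger = Q' R' gives L = R'^\dagger, Q = Q'^\dagger), and checks the formula with a dot-product test against central differences. `lq` and `lq_back_fullrank` are hypothetical helper names.

```python
import numpy as np

def copyltu(X):
    low = np.tril(X, -1)
    return low + low.conj().T + np.diag(np.diag(X).real)

def lq(A):
    """LQ through the QR of A^dagger: A^dagger = Qt Rt gives A = Rt^dagger Qt^dagger."""
    Qt, Rt = np.linalg.qr(A.conj().T)
    return Rt.conj().T, Qt.conj().T

def lq_back_fullrank(L, Q, Lbar, Qbar):
    """LQ transpose for N >= M (K = M)."""
    W = L.conj().T @ Lbar - Qbar @ Q.conj().T
    H = copyltu(W)
    C = H @ Q + Qbar
    return np.linalg.solve(L.conj().T, C)   # L^{-dagger} C

rng = np.random.default_rng(12)

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M, N = 4, 6
A, dA = crand(M, N), crand(M, N)
L, Q = lq(A)                                # L is 4 x 4, Q is 4 x 6
Lbar, Qbar = crand(*L.shape), crand(*Q.shape)
Abar = lq_back_fullrank(L, Q, Lbar, Qbar)

eps = 1e-6
Lp, Qp = lq(A + eps * dA)
Lm, Qm = lq(A - eps * dA)
dL, dQ = (Lp - Lm) / (2 * eps), (Qp - Qm) / (2 * eps)
lhs = np.real(np.vdot(Abar, dA))
rhs = np.real(np.vdot(Lbar, dL) + np.vdot(Qbar, dQ))
dot_ok = np.isclose(lhs, rhs, rtol=1e-6, atol=1e-6)
print(dot_ok)
```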

LQ Case 2: Tall L (M > N)

Partition

L = \begin{pmatrix} U \\ D \end{pmatrix}, \qquad U \in \mathbb{C}^{K \times K}, \qquad D \in \mathbb{C}^{(M-K) \times K}.

With the matching partition

\bar{L} = \begin{pmatrix} \bar{U} \\ \bar{D} \end{pmatrix},

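the conformal row partition A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} with A_1 \in \mathbb{C}^{K \times N}, and \operatorname{lq\_back\_fullrank}(L, Q, \bar{L}, \bar{Q}) denoting the Case 1 result L^{-\dagger} C, the backward pass is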

\bar{A}_1 = \operatorname{lq\_back\_fullrank}\!\left(U, Q, \bar{U}, \bar{Q} + \bar{D}^\dagger A_2\right),

\bar{A}_2 = \bar{D} Q,

\bar{A} = \begin{pmatrix} \bar{A}_1 \\ \bar{A}_2 \end{pmatrix}.
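A NumPy sketch of the tall-L case under the same assumptions as a QR-based LQ (A^\dagger = Q' R'). It splits L and A by rows at K = N, folds the D path into the Q cotangent as \bar{Q} + \bar{D}^\dagger A_2, and checks the result with a dot-product test. `lq`, `lq_back_fullrank`, and `lq_back_tall` are hypothetical helper names.

```python
import numpy as np

def copyltu(X):
    low = np.tril(X, -1)
    return low + low.conj().T + np.diag(np.diag(X).real)

def lq(A):
    Qt, Rt = np.linalg.qr(A.conj().T)
    return Rt.conj().T, Qt.conj().T

def lq_back_fullrank(L, Q, Lbar, Qbar):
    H = copyltu(L.conj().T @ Lbar - Qbar @ Q.conj().T)
    return np.linalg.solve(L.conj().T, H @ Q + Qbar)   # L^{-dagger} (H Q + Qbar)

def lq_back_tall(L, Q, Lbar, Qbar, A):
    """LQ transpose for M > N (K = N): split L = [U; D] and A = [A1; A2] by rows."""
    K = Q.shape[0]
    U, Ubar, Dbar = L[:K], Lbar[:K], Lbar[K:]
    A2 = A[K:]
    Abar1 = lq_back_fullrank(U, Q, Ubar, Qbar + Dbar.conj().T @ A2)
    Abar2 = Dbar @ Q
    return np.vstack([Abar1, Abar2])

rng = np.random.default_rng(13)

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M, N = 6, 4
A, dA = crand(M, N), crand(M, N)
L, Q = lq(A)                                  # L is 6 x 4, Q is 4 x 4
Lbar, Qbar = crand(*L.shape), crand(*Q.shape)
Abar = lq_back_tall(L, Q, Lbar, Qbar, A)

eps = 1e-6
Lp, Qp = lq(A + eps * dA)
Lm, Qm = lq(A - eps * dA)
dL, dQ = (Lp - Lm) / (2 * eps), (Qp - Qm) / (2 * eps)
lhs = np.real(np.vdot(Abar, dA))
rhs = np.real(np.vdot(Lbar, dL) + np.vdot(Qbar, dQ))
dot_ok = np.isclose(lhs, rhs, rtol=1e-6, atol=1e-6)
print(dot_ok)
```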

Verification

Forward reconstruction

Check

\|A - Q R\|_F < \varepsilon, \qquad Q^\dagger Q \approx I, \qquad R \text{ upper triangular}.
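A minimal sketch of these forward checks as a reusable NumPy function (`check_qr` and its default tolerance are assumptions, not values taken from the DB). It also checks the reduced shapes Q \in \mathbb{C}^{M \times K} and R \in \mathbb{C}^{K \times N}, and should reject a perturbed R.

```python
import numpy as np

def check_qr(A, Q, R, eps=1e-10):
    """Forward checks: reduced shapes, ||A - QR||_F < eps, Q^dagger Q ~ I_K, R upper triangular."""
    M, N = A.shape
    K = min(M, N)
    if Q.shape != (M, K) or R.shape != (K, N):
        return False
    recon = np.linalg.norm(A - Q @ R) < eps          # Frobenius norm for 2-D input
    orth = np.allclose(Q.conj().T @ Q, np.eye(K), atol=eps)
    upper = np.allclose(np.tril(R, -1), 0.0)         # only the leading block can have i > j
    return bool(recon and orth and upper)

rng = np.random.default_rng(14)
results = []
for M, N in [(6, 4), (3, 5)]:
    A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    Q, R = np.linalg.qr(A)
    results.append(check_qr(A, Q, R))
bad = check_qr(A, Q, R + 1e-3)                       # a perturbed R should fail
print(results, bad)
```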

Backward checks

A representative scalar functional that couples the \bar{Q} and \bar{R} paths is

f(A) = \operatorname{Re}(v^\dagger \operatorname{op} \, v + v_2^\dagger \operatorname{op}_2 \, v_2), \qquad v = Q_{:,1}, \qquad v_2 = R_{2,:},

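where v_2 is the second row of R taken as a column vector, and \operatorname{op} \in \mathbb{C}^{M \times M} and \operatorname{op}_2 \in \mathbb{C}^{N \times N} are random Hermitian operators independent of A.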
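A NumPy sketch of this backward check, assuming the VJP is compared against a central difference of f (the comparison method is this sketch's choice). For Hermitian \operatorname{op}, d\operatorname{Re}(v^\dagger \operatorname{op}\, v) = \langle 2\,\operatorname{op}\, v, dv \rangle_{\mathbb{R}}, so \bar{Q} carries 2\,\operatorname{op}\, v in its first column and \bar{R} carries 2\,\operatorname{op}_2 v_2 in its second row. `qr_vjp` and the other helper names are hypothetical.

```python
import numpy as np

def copyltu(X):
    low = np.tril(X, -1)
    return low + low.conj().T + np.diag(np.diag(X).real)

def tril_im_inv_adj_skew(X):
    Y = np.tril(X - X.conj().T)
    idx = np.diag_indices_from(Y)
    Y[idx] *= 0.5
    return Y

def qr_vjp(Q, R, Qbar, Rbar):
    """Raw reduced-QR transpose, full-rank (M >= N) and wide (M < N) cases."""
    M, N = Q.shape[0], R.shape[1]
    if M >= N:
        B = Qbar + Q @ copyltu(R @ Rbar.conj().T - Qbar.conj().T @ Q)
        return np.linalg.solve(R, B.conj().T).conj().T
    S = tril_im_inv_adj_skew(Q.conj().T @ Qbar - Rbar @ R.conj().T)
    Abar = Q @ Rbar
    Abar[:, :M] += np.linalg.solve(R[:, :M], (Q @ S).conj().T).conj().T
    return Abar

rng = np.random.default_rng(15)

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def herm(n):
    B = crand(n, n)
    return B + B.conj().T

def f(A, op, op2):
    Q, R = np.linalg.qr(A)
    v, v2 = Q[:, 0], R[1, :]          # v2: second row of R as a 1-D (column) vector
    return np.real(v.conj() @ op @ v + v2.conj() @ op2 @ v2)

eps = 1e-6
ok = []
for M, N in [(5, 3), (3, 5)]:
    A, dA = crand(M, N), crand(M, N)
    op, op2 = herm(M), herm(N)
    Q, R = np.linalg.qr(A)
    Qbar = np.zeros(Q.shape, dtype=complex)
    Qbar[:, 0] = 2 * op @ Q[:, 0]
    Rbar = np.zeros(R.shape, dtype=complex)
    Rbar[1, :] = 2 * op2 @ R[1, :]
    Abar = qr_vjp(Q, R, Qbar, Rbar)
    fd = (f(A + eps * dA, op, op2) - f(A - eps * dA, op, op2)) / (2 * eps)
    ok.append(bool(np.isclose(np.real(np.vdot(Abar, dA)), fd, rtol=1e-6, atol=1e-6)))
print(ok)
```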

References

M. Seeger, A. Hetzel, Z. Dai, E. Meissner, N. D. Lawrence, “Auto-Differentiating Linear Algebra,” 2018.
H.-J. Liao, J.-G. Liu, L. Wang, T. Xiang, “Differentiable Programming Tensor Networks,” 2019.

DB Families

identity

The DB publishes the differentiable reduced-QR outputs directly.