Dynamic And Symbolic Shape Metadata
Status: current design and implementation note for issue #829
Related: ../spec/optimizer-passes.md, ../spec/ad-contract.md, ../spec/primitive-catalog.md, ../spec/backend-contract.md
Purpose
This note defines the shape-metadata contract needed for dimensions that are not plain constants or input-axis sizes.
The immediate triggers are:
- DynamicTruncate(input, size_scalar, axis), whose output extent depends on the value of a runtime scalar tensor.
- transpose_scatter, which needs inverse gather slice_sizes derived from symbolic update-window dimensions.
Both expose the same root problem: current metadata can describe concrete sizes and symbolic arithmetic over tensor axis sizes, but it cannot say whether the result is exact, conservative, or derived from runtime tensor values.
This document is the current shape contract for #829 work.
Current Model
The current system has two related expression forms:
- DimExpr: op-local expressions over the current op’s input shapes.
- SymDim: value-side symbolic expressions used by traced graph construction and AD metadata.
This is enough for expressions such as:
```text
output_dim = input0.axis(0) * input1.axis(2)
```
It is not enough for:
```text
output_dim = clamp_runtime_scalar(input1, 0, input0.axis(axis))
```
Nor is it enough to distinguish these two claims:
- the output axis is exactly n
- the output axis is at most n
The exact-versus-upper-bound distinction matters because compiler passes may use metadata to emit runtime Reshape or BroadcastInDim parameters. An upper bound is useful for some safety checks, but it is not a legal replacement for an exact dimension.
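To make the distinction concrete, consider a [10, 3] tensor truncated at runtime to 4 rows. The Rust sketch below uses two hypothetical helpers, reshape_target_is_legal and flatten_target (neither is part of the codebase), to check the element-count invariant that any Reshape must satisfy.

```rust
/// Hypothetical helper: a Reshape is legal only when the target shape has
/// exactly as many elements as the tensor that exists at runtime.
fn reshape_target_is_legal(runtime_shape: &[usize], target: &[usize]) -> bool {
    let count = |s: &[usize]| s.iter().product::<usize>();
    count(runtime_shape) == count(target)
}

/// Hypothetical helper: target for flattening every axis into one.
fn flatten_target(extents: &[usize]) -> Vec<usize> {
    vec![extents.iter().product::<usize>()]
}
```

Flattening the runtime [4, 3] tensor to [12] is legal. A target of [30], computed from the pre-truncation upper bound, is not.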
Design Goals
- Preserve the fast static-shape path for existing concrete programs.
- Make rank metadata exact even when some extents are dynamic.
- Distinguish exact extents from upper-bound or unknown extents.
- Keep backend kernels concrete: backend configs receive resolved usize sizes, not unresolved symbolic expressions.
- Let graph and compiler layers carry symbolic config values until they can be resolved at execution time.
- Avoid new AD construction panics for resolvable symbolic shape metadata.
- Avoid implementing full dynamic shape polymorphism in the first pass.
Non-Goals
- Do not replace every shape expression user in one PR.
- Do not require all backend kernels to accept dynamic shape parameters.
- Do not introduce a constraint solver.
- Do not add scatter-only or DynamicTruncate-only hacks that bypass shared metadata invariants.
- Do not change user-facing tensor operation semantics beyond replacing inaccurate metadata and panics with explicit behavior.
Recommended Approach
Use a two-layer model:
```text
Value shape metadata
  rank: exact
  extents: Vec<ShapeExtent>

ShapeExtent
  Exact(ExtentExpr)
  UpperBound(ExtentExpr)
  Unknown

ExtentExpr
  Const(usize)
  InputAxis { input_idx, axis }
  RuntimeScalar { input_idx, semantics }
  Add/Sub/Mul/FloorDiv/Min/Max(...)
```
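The following is a minimal Rust rendering of the model above, intended only as a sketch. The boxed binary nodes and the eval helper, which resolves an expression against concrete input shapes, are illustrative assumptions rather than the crate's actual definitions. RuntimeScalar is omitted because it belongs to the deferred second stage.

```rust
/// How a size would be computed (sketch; RuntimeScalar omitted).
#[derive(Clone, Debug, PartialEq)]
pub enum ExtentExpr {
    Const(usize),
    InputAxis { input_idx: usize, axis: usize },
    Add(Box<ExtentExpr>, Box<ExtentExpr>),
    Sub(Box<ExtentExpr>, Box<ExtentExpr>),
    Mul(Box<ExtentExpr>, Box<ExtentExpr>),
    FloorDiv(Box<ExtentExpr>, Box<ExtentExpr>),
    Min(Box<ExtentExpr>, Box<ExtentExpr>),
    Max(Box<ExtentExpr>, Box<ExtentExpr>),
}

/// What guarantee the expression provides about the runtime size.
#[derive(Clone, Debug, PartialEq)]
pub enum ShapeExtent {
    Exact(ExtentExpr),
    UpperBound(ExtentExpr),
    Unknown,
}

impl ExtentExpr {
    /// Resolves the expression against concrete input shapes. Returns None
    /// for a missing input or axis, add/mul overflow, subtraction underflow,
    /// or division by zero, so callers can report an error instead of panicking.
    pub fn eval(&self, inputs: &[Vec<usize>]) -> Option<usize> {
        match self {
            Self::Const(c) => Some(*c),
            Self::InputAxis { input_idx, axis } => inputs.get(*input_idx)?.get(*axis).copied(),
            Self::Add(a, b) => a.eval(inputs)?.checked_add(b.eval(inputs)?),
            Self::Sub(a, b) => a.eval(inputs)?.checked_sub(b.eval(inputs)?),
            Self::Mul(a, b) => a.eval(inputs)?.checked_mul(b.eval(inputs)?),
            Self::FloorDiv(a, b) => a.eval(inputs)?.checked_div(b.eval(inputs)?),
            Self::Min(a, b) => Some(a.eval(inputs)?.min(b.eval(inputs)?)),
            Self::Max(a, b) => Some(a.eval(inputs)?.max(b.eval(inputs)?)),
        }
    }
}
```

For example, input0.axis(0) * input1.axis(2) with input shapes [3, 4] and [5, 6, 7] evaluates to 21.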
The concrete implementation stores Vec<ShapeExtent<_>> directly in TensorMeta and ExecInstruction::output_extents; there is no separate public shape-metadata wrapper. The important split is semantic:
- ExtentExpr says how a size would be computed.
- ShapeExtent says what guarantee the expression provides.
For current static programs, every extent remains Exact(Const(...)) or Exact(InputAxis { ... }). The new states are only needed where current code already has inaccurate metadata or panic behavior.
Exact
An exact extent may be used to construct runtime shape parameters. For example, a Reshape target may be built from exact expressions because execution can resolve them from the concrete input tensors.
Exact does not mean compile-time constant. It means the expression denotes the true runtime size.
UpperBound
An upper-bound extent means:
```text
actual_runtime_extent <= expression_value
```
It may be used for conservative reasoning, diagnostics, allocation guards, or skip decisions. It must not be used as if it were the true output shape.
For the first implementation pass, DynamicTruncate can use:
```text
axis != truncated_axis: Exact(input.axis(axis))
axis == truncated_axis: UpperBound(input.axis(axis))
```
This immediately fixes the false exactness without requiring runtime scalar expressions to be threaded through every compiler path.
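The first-pass rule can be sketched as a shape-inference function over input 0. The function name and the reduced enums are hypothetical. Only the truncated axis is downgraded to UpperBound, while rank and the other axes stay exact.

```rust
#[derive(Clone, Debug, PartialEq)]
enum ExtentExpr {
    Const(usize),
    InputAxis { input_idx: usize, axis: usize },
}

#[derive(Clone, Debug, PartialEq)]
enum ShapeExtent {
    Exact(ExtentExpr),
    UpperBound(ExtentExpr),
    Unknown,
}

/// Hypothetical first-pass metadata for DynamicTruncate(input, size_scalar, axis).
/// Returns None when the truncated axis is out of range for the input rank.
fn dynamic_truncate_extents(input_rank: usize, truncated_axis: usize) -> Option<Vec<ShapeExtent>> {
    if truncated_axis >= input_rank {
        return None;
    }
    let extents = (0..input_rank)
        .map(|axis| {
            let expr = ExtentExpr::InputAxis { input_idx: 0, axis };
            if axis == truncated_axis {
                // The pre-truncation size only bounds the output from above.
                ShapeExtent::UpperBound(expr)
            } else {
                // Untouched axes are copied through exactly.
                ShapeExtent::Exact(expr)
            }
        })
        .collect();
    Some(extents)
}
```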
Unknown
Unknown means no useful extent expression is available. It should be rare. Code that sees Unknown must either avoid shape-sensitive rewrites or return a structured unsupported-dynamic-shape error.
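One way to keep the three guarantees separate is to expose one accessor for the exact value and another for a conservative bound. The Rust sketch below assumes hypothetical exact_value and upper_bound_value methods. An Exact extent is also a valid upper bound, but the reverse never holds, and Unknown yields neither.

```rust
#[derive(Clone, Debug, PartialEq)]
enum ExtentExpr {
    Const(usize),
    InputAxis { input_idx: usize, axis: usize },
}

impl ExtentExpr {
    fn eval(&self, inputs: &[Vec<usize>]) -> Option<usize> {
        match self {
            ExtentExpr::Const(c) => Some(*c),
            ExtentExpr::InputAxis { input_idx, axis } => inputs.get(*input_idx)?.get(*axis).copied(),
        }
    }
}

#[derive(Clone, Debug, PartialEq)]
enum ShapeExtent {
    Exact(ExtentExpr),
    UpperBound(ExtentExpr),
    Unknown,
}

impl ShapeExtent {
    /// True runtime size. Only Exact qualifies; use this for shape parameters.
    fn exact_value(&self, inputs: &[Vec<usize>]) -> Option<usize> {
        match self {
            ShapeExtent::Exact(e) => e.eval(inputs),
            ShapeExtent::UpperBound(_) | ShapeExtent::Unknown => None,
        }
    }

    /// Conservative bound for guards and diagnostics. Exact also counts,
    /// because an exact size is trivially an upper bound.
    fn upper_bound_value(&self, inputs: &[Vec<usize>]) -> Option<usize> {
        match self {
            ShapeExtent::Exact(e) | ShapeExtent::UpperBound(e) => e.eval(inputs),
            ShapeExtent::Unknown => None,
        }
    }
}
```

An allocation guard could call upper_bound_value, while a Reshape builder must call exact_value and treat None as a reason to reject or defer.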
Value-Side Metadata Boundary
Shape metadata belongs to values, not to operation payloads. Operation payloads should carry only structural identity and output requirements that are part of the op’s semantics. Input-shape snapshots used for AD, validation, or replay belong in value metadata.
| Payload kind | Owner |
|---|---|
| Structural parameters such as axes, permutation order, or contraction dims | Op payload |
| Required output shapes supplied by the user or frontend | Op payload as exact shape expressions |
| Input shape snapshots, inferred output shape facts, and guardable metadata | Value metadata |
ShapeGuardContext is the normative AD-facing metadata surface. Builder and emitter helpers may provide convenience accessors, but they must read from the same metadata store and record the same guards. AD rules must not recover shape facts by inspecting unrelated op payloads or assuming concrete extents from earlier graph-building phases.
Runtime Scalar Dimensions
The long-term exact representation for DynamicTruncate is a runtime scalar dimension expression:
```text
Exact(Min(
    RuntimeScalar { input_idx: 1, semantics: DynamicTruncateSize },
    InputAxis { input_idx: 0, axis }
))
```
The semantics tag is required because converting a scalar tensor into a dimension is not a generic numeric cast. DynamicTruncate currently accepts specific scalar dtypes and applies operation-specific rounding and clamping rules. Those rules must stay explicit.
This should be a second implementation stage. The first stage only needs to stop reporting an upper bound as exact.
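The following sketches how the second-stage expression could be resolved at execution time. It assumes the size scalar arrives as an i64 and that DynamicTruncateSize clamps negative values to zero. Both assumptions are placeholders for whatever dtype, rounding, and clamping rules DynamicTruncate actually defines. RuntimeInputs and truncated_extent are hypothetical names.

```rust
use std::convert::TryFrom;

/// Tag naming the operation-specific scalar-to-dimension conversion.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ScalarSemantics {
    DynamicTruncateSize,
}

#[derive(Clone, Debug, PartialEq)]
enum ExtentExpr {
    InputAxis { input_idx: usize, axis: usize },
    RuntimeScalar { input_idx: usize, semantics: ScalarSemantics },
    Min(Box<ExtentExpr>, Box<ExtentExpr>),
}

/// Concrete execution-time inputs: one shape per input, plus the scalar value
/// of each input that is a scalar (None otherwise).
struct RuntimeInputs<'a> {
    shapes: &'a [Vec<usize>],
    scalars: &'a [Option<i64>],
}

impl ExtentExpr {
    fn eval(&self, rt: &RuntimeInputs<'_>) -> Option<usize> {
        match self {
            ExtentExpr::InputAxis { input_idx, axis } => rt.shapes.get(*input_idx)?.get(*axis).copied(),
            ExtentExpr::RuntimeScalar { input_idx, semantics } => {
                let raw = (*rt.scalars.get(*input_idx)?)?;
                match semantics {
                    // ASSUMED rule for this sketch: negative sizes clamp to zero.
                    ScalarSemantics::DynamicTruncateSize => usize::try_from(raw.max(0)).ok(),
                }
            }
            ExtentExpr::Min(a, b) => Some(a.eval(rt)?.min(b.eval(rt)?)),
        }
    }
}

/// Long-term exact extent for the truncated axis: min(size_scalar, input.axis(axis)).
fn truncated_extent(axis: usize) -> ExtentExpr {
    ExtentExpr::Min(
        Box::new(ExtentExpr::RuntimeScalar { input_idx: 1, semantics: ScalarSemantics::DynamicTruncateSize }),
        Box::new(ExtentExpr::InputAxis { input_idx: 0, axis }),
    )
}
```

Keeping the semantics tag inside the expression means that a different operation with different rounding rules cannot accidentally reuse this conversion.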
Symbolic Operation Configs
Backend-facing configs should remain concrete. For example, tenferro_tensor::GatherConfig can continue to carry slice_sizes: Vec<usize> because backend kernels execute on concrete tensors.
Graph-facing configs need a symbolic form where shape-derived sizes can appear. The current implementation uses StdTensorOp::GatherDynamicSliceSizes and the matching ExecOp::GatherDynamicSliceSizes:
```text
GatherDynamicSliceSizes
  offset_dims: Vec<usize>
  collapsed_slice_dims: Vec<usize>
  start_index_map: Vec<usize>
  index_vector_dim: usize
  slice_sizes: Vec<DimExpr>
```
Lowering from graph config to backend config resolves symbolic slice sizes against concrete runtime inputs immediately before dispatch. Backends still see the existing concrete GatherConfig.
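The lowering step can be sketched as a pure function from the graph-facing config to the backend-facing one. The only field stated above is slice_sizes: Vec<usize> on GatherConfig. The other GatherConfig fields, the two-variant DimExpr with its InputDim field names, and lower_gather itself are assumptions for illustration.

```rust
/// Hypothetical stand-in for the graph-side DimExpr (two variants only).
#[derive(Clone, Debug, PartialEq)]
enum DimExpr {
    Const(usize),
    InputDim { input_idx: usize, axis: usize },
}

#[derive(Clone, Debug)]
struct GatherDynamicSliceSizes {
    offset_dims: Vec<usize>,
    collapsed_slice_dims: Vec<usize>,
    start_index_map: Vec<usize>,
    index_vector_dim: usize,
    slice_sizes: Vec<DimExpr>,
}

/// Local mirror of the concrete backend config (field set assumed).
#[derive(Clone, Debug, PartialEq)]
struct GatherConfig {
    offset_dims: Vec<usize>,
    collapsed_slice_dims: Vec<usize>,
    start_index_map: Vec<usize>,
    index_vector_dim: usize,
    slice_sizes: Vec<usize>,
}

/// Resolves symbolic slice sizes against concrete input shapes immediately
/// before dispatch; structural fields are copied through unchanged.
fn lower_gather(cfg: &GatherDynamicSliceSizes, input_shapes: &[Vec<usize>]) -> Result<GatherConfig, String> {
    let slice_sizes = cfg
        .slice_sizes
        .iter()
        .map(|d| match d {
            DimExpr::Const(c) => Ok(*c),
            DimExpr::InputDim { input_idx, axis } => input_shapes
                .get(*input_idx)
                .and_then(|s| s.get(*axis))
                .copied()
                .ok_or_else(|| format!("input {input_idx} has no axis {axis}")),
        })
        .collect::<Result<Vec<usize>, String>>()?;
    Ok(GatherConfig {
        offset_dims: cfg.offset_dims.clone(),
        collapsed_slice_dims: cfg.collapsed_slice_dims.clone(),
        start_index_map: cfg.start_index_map.clone(),
        index_vector_dim: cfg.index_vector_dim,
        slice_sizes,
    })
}
```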
This keeps the layering clean:
- AD and compiler code may express symbolic config sizes.
- Execution resolves them at the backend boundary.
- Backend kernels stay optimized for concrete sizes.
Scatter Transpose Policy
transpose_scatter builds inverse gather slice_sizes from the primal updates shape:
- If all required update-window extents are concrete, keep emitting the existing concrete inverse gather.
- If an extent is symbolic, emit GatherDynamicSliceSizes and add the updates tensor as a non-differentiable shape-source input.
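The policy reduces to one decision per transpose. If every window extent is concrete, keep the existing Gather. Otherwise, emit the dynamic variant and append updates as a shape source. In this hypothetical sketch, WindowExtent, InverseGather, and plan_inverse_gather are illustrative names. The inverse gather's inputs are also assumed to be [cotangent, indices], so the appended updates tensor becomes input 2.

```rust
#[derive(Clone, Debug, PartialEq)]
enum DimExpr {
    Const(usize),
    InputDim { input_idx: usize, axis: usize },
}

/// One required slice size, as known while building the transpose graph.
#[derive(Clone, Debug, PartialEq)]
enum WindowExtent {
    Concrete(usize),
    /// Only known as this axis of the primal updates tensor.
    UpdatesAxis(usize),
}

#[derive(Debug, PartialEq)]
enum InverseGather {
    /// Existing path: concrete Gather with Vec<usize> slice sizes.
    Concrete { slice_sizes: Vec<usize> },
    /// Dynamic path: GatherDynamicSliceSizes plus a shape-source input.
    Dynamic { slice_sizes: Vec<DimExpr>, shape_source_input: usize },
}

/// ASSUMED input layout [cotangent, indices, updates(shape source)].
const UPDATES_SHAPE_SOURCE_IDX: usize = 2;

fn plan_inverse_gather(window: &[WindowExtent]) -> InverseGather {
    // Collecting into Option<Vec<_>> yields None as soon as one extent is symbolic.
    let concrete: Option<Vec<usize>> = window
        .iter()
        .map(|w| match w {
            WindowExtent::Concrete(n) => Some(*n),
            WindowExtent::UpdatesAxis(_) => None,
        })
        .collect();
    match concrete {
        Some(slice_sizes) => InverseGather::Concrete { slice_sizes },
        None => InverseGather::Dynamic {
            slice_sizes: window
                .iter()
                .map(|w| match w {
                    WindowExtent::Concrete(n) => DimExpr::Const(*n),
                    WindowExtent::UpdatesAxis(axis) => DimExpr::InputDim { input_idx: UPDATES_SHAPE_SOURCE_IDX, axis: *axis },
                })
                .collect(),
            shape_source_input: UPDATES_SHAPE_SOURCE_IDX,
        },
    }
}
```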
The generated dynamic gather is AD-closed: its forward rule applies the same dynamic gather to the operand tangent, and its transpose emits the same inverse scatter as concrete Gather while returning None for indices and shape sources.
Compiler Pass Contract
Compiler passes must state what shape guarantee they require.
| Consumer | Required guarantee | Rule |
|---|---|---|
| Rank checks | exact rank | Rank is always exact metadata. |
| Transpose metadata | any known extents | Permute extent metadata without changing guarantees. |
| BroadcastInDim execution shape | exact target extents | Reject or defer if any target extent is not exact. |
| Reshape execution shape | exact target extents | Never use upper bounds as reshape sizes. |
| DotDecomposer merge reshapes | runtime shape inputs for execution; best extent metadata | Emit reshape parameters from actual input shapes, and propagate exact/upper-bound/unknown metadata without upgrading guarantees. |
| DCE and last-use analysis | no extent guarantee | Shape metadata is irrelevant. |
| Diagnostics | best available | May print exact, upper-bound, or unknown metadata. |
The important invariant is that an optimization pass may become conservative, but it may not silently reinterpret upper-bound metadata as exact metadata.
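The two most common rules in the table can be sketched as helpers that a pass might call. The names permute_extents and exact_target are hypothetical. The first reorders metadata for Transpose without touching guarantees. The second builds Reshape or BroadcastInDim targets only when every extent is exact, and otherwise reports the first offending axis.

```rust
#[derive(Clone, Debug, PartialEq)]
enum ExtentExpr {
    Const(usize),
    InputAxis { input_idx: usize, axis: usize },
}

#[derive(Clone, Debug, PartialEq)]
enum ShapeExtent {
    Exact(ExtentExpr),
    UpperBound(ExtentExpr),
    Unknown,
}

/// Transpose metadata: output axis i takes input axis perm[i], and each
/// guarantee moves with its extent. Returns None if perm is not a permutation.
fn permute_extents(extents: &[ShapeExtent], perm: &[usize]) -> Option<Vec<ShapeExtent>> {
    if perm.len() != extents.len() {
        return None;
    }
    let mut seen = vec![false; extents.len()];
    for &p in perm {
        // Out of range, or seen before (replace returns the previous flag).
        if p >= extents.len() || std::mem::replace(&mut seen[p], true) {
            return None;
        }
    }
    Some(perm.iter().map(|&p| extents[p].clone()).collect())
}

/// Reshape/BroadcastInDim targets: every extent must be exact. Err carries the
/// index of the first non-exact extent so the caller can reject or defer.
fn exact_target(extents: &[ShapeExtent]) -> Result<Vec<ExtentExpr>, usize> {
    extents
        .iter()
        .enumerate()
        .map(|(i, e)| match e {
            ShapeExtent::Exact(expr) => Ok(expr.clone()),
            // Never upgrade an upper bound or unknown extent to an exact size.
            ShapeExtent::UpperBound(_) | ShapeExtent::Unknown => Err(i),
        })
        .collect()
}
```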
AD Contract
AD rules must not call constant_value().unwrap_or_else(panic) for user-reachable symbolic shapes.
There are two acceptable outcomes for newly touched user-reachable symbolic shape paths:
- emit a graph using exact symbolic metadata or runtime shape-source expressions such as DimExpr::InputDim
- return an unsupported-dynamic-shape error
AD rules should choose the narrowest metadata query that matches the graph they emit. Use rank metadata for axis-count checks and runtime shape-source expressions for broadcast, reshape, or dynamic gather parameters that should follow the actual tensor shape. Require an exact shape only when constructing a concrete op payload that cannot represent runtime dimensions.
Current AD rule signatures are not uniformly Result-returning. The implementation should introduce one shared error channel before adding future AD paths that cannot be expressed as graph ops.
The error should identify:
- the primitive
- the metadata field that required an exact extent
- whether the observed extent was symbolic, upper-bound, or unknown
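The following sketches such an error type, assuming a Result-returning AD rule and the ShapeExtent model from this note. UnsupportedDynamicShape, ObservedExtent, and require_concrete are hypothetical names. Exact(Const) is the only case that yields a concrete payload size. Any other exact expression is reported as symbolic, which matches the three observed states listed above.

```rust
use std::fmt;

#[derive(Clone, Debug, PartialEq)]
enum ExtentExpr {
    Const(usize),
    InputAxis { input_idx: usize, axis: usize },
}

#[derive(Clone, Debug, PartialEq)]
enum ShapeExtent {
    Exact(ExtentExpr),
    UpperBound(ExtentExpr),
    Unknown,
}

/// What the AD rule found where it needed a concrete extent.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ObservedExtent {
    Symbolic,
    UpperBound,
    Unknown,
}

/// Hypothetical shared AD construction error.
#[derive(Clone, Debug, PartialEq)]
struct UnsupportedDynamicShape {
    primitive: &'static str,
    field: &'static str,
    observed: ObservedExtent,
}

impl fmt::Display for UnsupportedDynamicShape {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let observed = match self.observed {
            ObservedExtent::Symbolic => "symbolic",
            ObservedExtent::UpperBound => "upper-bound",
            ObservedExtent::Unknown => "unknown",
        };
        write!(f, "{}: `{}` requires an exact extent, found {} metadata", self.primitive, self.field, observed)
    }
}

impl std::error::Error for UnsupportedDynamicShape {}

/// Used only when building a concrete payload that cannot hold runtime dims.
fn require_concrete(primitive: &'static str, field: &'static str, extent: &ShapeExtent) -> Result<usize, UnsupportedDynamicShape> {
    let observed = match extent {
        ShapeExtent::Exact(ExtentExpr::Const(n)) => return Ok(*n),
        ShapeExtent::Exact(_) => ObservedExtent::Symbolic,
        ShapeExtent::UpperBound(_) => ObservedExtent::UpperBound,
        ShapeExtent::Unknown => ObservedExtent::Unknown,
    };
    Err(UnsupportedDynamicShape { primitive, field, observed })
}
```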
Alternatives Considered
Local DynamicTruncate patch
Returning another static shape from shape_infer is not acceptable. The pre-truncation input shape is only an upper bound, and another guessed static extent would have the same false-exactness bug.
Symbolic GatherConfig only
This would fix one panic, but it would not define how compiler passes should treat exact versus conservative metadata. It also leaves DynamicTruncate incorrect.
Full runtime-scalar shape expressions first
This is the clean end state, but it touches shape inference, lowering, execution, AD, and config resolution at once. The first implementation should land the exactness contract and conservative handling before threading runtime scalar expressions through all layers.
Migration Plan
- Done: add the metadata contract types or equivalent internal representation. Preserve current exact behavior for existing static programs.
- Done: mark DynamicTruncate’s truncated axis as an upper bound instead of exact. Update consumers so they do not use that extent as a concrete shape parameter.
- Done: replace transpose_scatter’s symbolic update-window panic with a dynamic graph gather config.
- Done: add graph-facing symbolic config support for gather-like slice_sizes.
- Done: update rank-only AD transpose/JVP paths, including contraction, structural, scatter, and linalg solve rules, to use rank metadata and runtime shape sources instead of exact-shape metadata.
- Deferred: add structured AD construction errors for future dynamic or unsupported shape requirements.
- Deferred: add exact runtime scalar extents for DynamicTruncate once the compiler and execution layers can resolve them safely.
Each step should add focused regression tests. The first tests should cover:
- DynamicTruncate metadata no longer reporting the pre-truncation shape as exact.
- transpose_scatter with symbolic update-window dimensions no longer panicking.
- compiler passes refusing to build Reshape or BroadcastInDim parameters from upper-bound extents.
Open Questions
- Should graph-facing symbolic gather config eventually replace StdTensorOp::Gather directly, or should GatherDynamicSliceSizes remain as the narrow dynamic variant?
- Which public transform APIs should surface unsupported dynamic-shape AD errors first?
The preferred bias is to keep public surface narrow until the compiler and AD contracts settle.