tenferro-rs

General-purpose tensor computation library in Rust. Provides dense tensor types with CPU/GPU support, a cuTENSOR/hipTensor-compatible operation protocol, high-level einsum with N-ary contraction tree optimization, and automatic differentiation.

Workspace Crate Taxonomy

The workspace uses a simple naming rule:

The intent is to make crate choice obvious from the package name and to keep implementation-only crates clearly separated from stable public surfaces.

Current phase: active implementation. The workspace now has working dense CPU functionality, partial/experimental GPU coverage, and a family-based primitive execution layer shared across einsum, tropical algebra, and linalg.

See the design documents for architecture, API design, and future phase plans.

Workspace Architecture

Facade: tenferro-tensor-compute  Typed computation facade — re-exports Tensor<T>,
                                einsum, and linalg from a single crate
Layer 5: tenferro-capi       C-API (FFI) for Julia/Python: exposes einsum + SVD
                             with stateless rrule/frule (f64 only),
                             DLPack v1.0 zero-copy tensor exchange
Layer 4: tenferro-einsum     High-level einsum on Tensor<T>, N-ary contraction
                             tree, semiring-core/fast-path dispatch, einsum AD rules
         tenferro-linalg     Public tensor linalg APIs, composite lowering, AD rules
Layer 3: tenferro-prims      Semiring/scalar/analytic execution families
         tenferro-linalg-prims Backend-facing linalg factorization/solve contracts
Layer 2: tenferro-tensor     Tensor<T> = DataBuffer + shape + strides,
                             zero-copy view ops, impl Differentiable
Shared:  tenferro-algebra    HasAlgebra trait, Semiring trait, Standard type,
                             Scalar trait, Conjugate trait
         tenferro-device     Device enum, Error/Result types
Internal: tenferro-internal-error   Internal shared error definitions re-exported
                                     by public frontend crates where needed
          tenferro-internal-frontend-core
                                     Shared dynamic tensor substrate and
                                     structured-layout helpers for the
                                     `tenferro*` surface crates
          tenferro-internal-ad-core  Homogeneous AD tensor state, tape glue,
                                     and shared operation helpers
          tenferro-internal-ad-surface
                                     Dynamic AD surface, eager AD entrypoints,
                                     and builder-style linalg wrappers
          tenferro-internal-ad-linalg
                                     Typed linalg AD bodies and result wiring
                                     used behind `tenferro`
          tenferro-internal-ad-ops   Typed scalar/einsum/reduction AD bodies
                                     and eager helper wiring used behind
                                     `tenferro`
          tenferro-internal-runtime Internal runtime default/scope management
                                     used by `tenferro::runtime`

Extern:  chainrules-core     Core AD traits: Differentiable, ReverseRule<V>,
                             ForwardRule<V> (no tensor deps)
         chainrules          Scalar AD rules for primitive real/complex operations
         tidu                AD engine: Tape<V>, TrackedValue<V>,
                             DualValue<V> (← chainrules-core)

Foundation: strided-rs       Independent workspace (used only by tenferro-prims)
                             (strided-traits -> strided-view -> strided-kernel)

End-user:   tenferro-tensor         Typed tensor data container
            tenferro-tensor-compute Typed tensor compute facade
            tenferro-dynamic-compute Dynamic tensor compute facade without AD
            tenferro                Dynamic tensor frontend and AD runtime bridge

Extension:  tenferro-ext-tropical       Tropical semiring operations (MaxPlus, MinPlus, MaxMul)
            tenferro-ext-tropical-capi   C-API for tropical einsum
            tenferro-ext-burn            Burn deep learning framework bridge
            tenferro-ext-mdarray         mdarray multidimensional array bridge
            tenferro-ext-ndarray         ndarray multidimensional array bridge

Dependency Graph

[Figure: workspace dependency graph]

Note: the graph omits transitively implied edges by default. If A -> B -> C, the direct A -> C edge is left out unless it carries unique information, which keeps the layered structure readable.
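The edge-omission rule is a transitive reduction of the dependency DAG. The sketch below shows one way to compute it with the standard library only. The `Graph` alias and the function names are hypothetical and are not part of the workspace tooling; the node names in the usage note are placeholders, not real crate edges.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Adjacency list of a DAG: node -> direct dependencies (hypothetical helper type).
pub type Graph = BTreeMap<&'static str, BTreeSet<&'static str>>;

/// Builds a graph from (from, to) edges, registering every node that appears.
pub fn graph_from_edges(edges: &[(&'static str, &'static str)]) -> Graph {
    let mut graph = Graph::new();
    for &(from, to) in edges {
        graph.entry(from).or_default().insert(to);
        graph.entry(to).or_default();
    }
    graph
}

/// Returns true if `target` can be reached from `start` via one or more edges.
fn reachable(graph: &Graph, start: &str, target: &str) -> bool {
    let mut stack: Vec<&str> = graph
        .get(start)
        .map(|deps| deps.iter().copied().collect())
        .unwrap_or_default();
    let mut seen = BTreeSet::new();
    while let Some(node) = stack.pop() {
        if node == target {
            return true;
        }
        if seen.insert(node) {
            if let Some(next) = graph.get(node) {
                stack.extend(next.iter().copied());
            }
        }
    }
    false
}

/// Drops every edge a -> c for which another direct dependency of a already reaches c.
pub fn transitive_reduction(graph: &Graph) -> Graph {
    let mut reduced = Graph::new();
    for (&node, deps) in graph {
        let kept: BTreeSet<&'static str> = deps
            .iter()
            .copied()
            .filter(|&dep| {
                !deps
                    .iter()
                    .any(|&other| other != dep && reachable(graph, other, dep))
            })
            .collect();
        reduced.insert(node, kept);
    }
    reduced
}
```

With edges a -> b, b -> c, and a -> c, the reduction keeps a -> b and b -> c and drops a -> c. The "unless it carries unique information" exception is a manual override and is not modelled here.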

Crates

tenferro-tensor-compute (Facade)

Typed tensor computation facade. Re-exports the most commonly used items from tenferro-tensor, tenferro-prims, tenferro-einsum, and tenferro-linalg so downstream users need only a single dependency for Tensor<T> computation. Start here if you want typed tensors with einsum and linear algebra.

tenferro-dynamic-compute (End-user public)

Dynamic tensor compute facade without autodiff. This crate exposes the runtime-dtype Tensor surface for users who need mixed-dtype or late-bound scalar selection without pulling in tape state or gradient APIs.

tenferro-capi (Layer 5)

C-API (FFI) for Julia, Python (JAX, PyTorch), and other languages. Exposes tensor lifecycle, einsum, and SVD (including AD rules) via opaque pointers and status codes. Only f64 is supported in this proof-of-concept phase.

Design principles: opaque TfeTensorF64 handles, tfe_status_t error codes, catch_unwind for panic safety, DLPack v1.0 for zero-copy tensor exchange across language boundaries (NumPy, PyTorch, JAX, DLPack.jl).
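The opaque-handle, status-code, and catch_unwind principles can be illustrated with a small standalone sketch. Everything named `Demo*` or `demo_*` below is hypothetical: the real TfeTensorF64 layout, the tfe_status_t values, and the actual function signatures are not shown in this document. A real export would also carry a `no_mangle` attribute, omitted here to keep the sketch independent of the Rust edition.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical status codes; the real tfe_status_t values may differ.
pub const DEMO_OK: i32 = 0;
pub const DEMO_INVALID_ARGUMENT: i32 = 1;
pub const DEMO_PANIC: i32 = 2;

/// Opaque handle: callers only ever see a pointer to this type.
pub struct DemoTensorF64 {
    data: Vec<f64>,
}

/// Allocates a tensor holding 0.0, 1.0, ..., len-1 and hands ownership to the caller.
pub extern "C" fn demo_tensor_new(len: usize) -> *mut DemoTensorF64 {
    let data = (0..len).map(|i| i as f64).collect();
    Box::into_raw(Box::new(DemoTensorF64 { data }))
}

/// Reads one element. A Rust panic (here: out-of-bounds index) is caught and
/// reported as a status code instead of unwinding across the FFI boundary.
pub unsafe extern "C" fn demo_tensor_get(
    tensor: *const DemoTensorF64,
    index: usize,
    out: *mut f64,
) -> i32 {
    if tensor.is_null() || out.is_null() {
        return DEMO_INVALID_ARGUMENT;
    }
    let result = catch_unwind(AssertUnwindSafe(|| {
        let tensor = unsafe { &*tensor };
        tensor.data[index]
    }));
    match result {
        Ok(value) => {
            unsafe { *out = value };
            DEMO_OK
        }
        Err(_) => DEMO_PANIC,
    }
}

/// Releases a handle previously returned by `demo_tensor_new`.
pub unsafe extern "C" fn demo_tensor_free(tensor: *mut DemoTensorF64) {
    if !tensor.is_null() {
        drop(unsafe { Box::from_raw(tensor) });
    }
}
```

The host language never inspects the struct, checks the returned status after every call, and frees each handle exactly once.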

AD approach: stateless rrule/frule only — host languages manage their own AD tapes (ChainRules.jl, PyTorch autograd, JAX custom_vjp).
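A stateless rule is a pure function of the primal inputs plus a tangent or cotangent; it records nothing, which is what lets the host language keep its own tape. The sketch below shows the idea for a dot product on plain slices. The function names and signatures are assumptions for illustration and do not mirror the C-API.

```rust
/// Primal: y = sum_i a[i] * b[i].
pub fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Forward rule (frule): output tangent from input tangents `da` and `db`.
/// dy = <da, b> + <a, db>
pub fn dot_frule(a: &[f64], b: &[f64], da: &[f64], db: &[f64]) -> f64 {
    dot(da, b) + dot(a, db)
}

/// Reverse rule (rrule): input cotangents from the output cotangent `dy`.
/// (a_bar, b_bar) = (dy * b, dy * a)
pub fn dot_rrule(a: &[f64], b: &[f64], dy: f64) -> (Vec<f64>, Vec<f64>) {
    let a_bar = b.iter().map(|y| dy * y).collect();
    let b_bar = a.iter().map(|x| dy * x).collect();
    (a_bar, b_bar)
}
```

Because neither rule holds state, a host framework such as ChainRules.jl or a PyTorch autograd function can call it from its own backward pass with whatever values it has saved.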

tenferro-einsum (Layer 4)

High-level einsum API with three levels: string notation (einsum), pre-built subscripts (einsum_with_subscripts), and pre-optimized tree (einsum_with_plan). Each has allocating, accumulating (_into), and consuming (_owned) variants.
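The difference between the string-notation level and the pre-built-subscripts level is mainly when the notation is parsed. The sketch below parses a string once into a reusable value. `DemoSubscripts` is a hypothetical stand-in; the real subscripts type and its constructor are not described in this document.

```rust
/// Parsed einsum notation (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct DemoSubscripts {
    pub inputs: Vec<Vec<char>>,
    pub output: Vec<char>,
}

impl DemoSubscripts {
    /// Parses explicit notation such as "ij,jk->ik".
    pub fn parse(notation: &str) -> Result<Self, String> {
        let (lhs, rhs) = notation
            .split_once("->")
            .ok_or_else(|| format!("missing '->' in {notation:?}"))?;
        let inputs: Vec<Vec<char>> = lhs
            .split(',')
            .map(|operand| operand.trim().chars().collect())
            .collect();
        let output: Vec<char> = rhs.trim().chars().collect();
        // Every output label must come from at least one input operand.
        for label in &output {
            if !inputs.iter().any(|input| input.contains(label)) {
                return Err(format!("output label {label:?} is not an input label"));
            }
        }
        Ok(Self { inputs, output })
    }

    /// Labels that appear in some input but not in the output, i.e. summed over.
    pub fn contracted(&self) -> Vec<char> {
        let mut labels: Vec<char> = self
            .inputs
            .iter()
            .flatten()
            .copied()
            .filter(|label| !self.output.contains(label))
            .collect();
        labels.sort_unstable();
        labels.dedup();
        labels
    }
}
```

Parsing "ij,jk->ik" yields two input operands and the single contracted label j. A pre-optimized plan would go one step further and also fix the pairwise contraction order, which this sketch does not attempt.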

Einsum AD rules: tracked_einsum, dual_einsum, einsum_rrule, einsum_frule, einsum_hvp.

tenferro-linalg (Layer 4)

Tensor-level linear algebra built on external backends: faer (CPU) and cuSOLVER (GPU).

Decompositions: svd, qr, lu, cholesky, eigen (symmetric), eig (general). Solvers: solve, lstsq, solve_triangular. Utilities: inv, det, slogdet, pinv, matrix_exp, norm.

All operations have stateless AD rules (_rrule, _frule).

tenferro-linalg-prims (Layer 3)

Backend-facing structured linalg kernel contracts used by tenferro-linalg. This crate holds tensor-level solve/factorization/eigensolver traits and structured result types. It is intentionally smaller than the public linalg API surface and exists to keep tenferro-prims focused on semiring/scalar execution.

tenferro-tensor (Layer 2)

Tensor<T> type with DataBuffer (Rust-owned or externally-owned via DLPack), shape/strides metadata, strict zero-copy view operations (view, permute, broadcast, diagonal, select, narrow), and PyTorch-style reshape that may materialize when a zero-copy view is not possible. TensorView<'a, T> for borrowed views. Factory functions: zeros, ones, eye. Triangular extraction: tril, triu.
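Zero-copy view operations work because a view is only a buffer reference plus shape and strides, so permuting axes rewrites metadata and never touches the data. The sketch below demonstrates this with a hypothetical `DemoView` type. It assumes row-major strides purely for illustration; the actual default layout and the real Tensor<T> and TensorView<'a, T> APIs are not specified here.

```rust
/// Borrowed strided view over an f64 buffer (hypothetical type).
pub struct DemoView<'a> {
    data: &'a [f64],
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl<'a> DemoView<'a> {
    /// Wraps a contiguous buffer using row-major strides (an assumption of this sketch).
    pub fn row_major(data: &'a [f64], shape: &[usize]) -> Self {
        assert_eq!(data.len(), shape.iter().product::<usize>());
        let mut strides = vec![1; shape.len()];
        for axis in (0..shape.len().saturating_sub(1)).rev() {
            strides[axis] = strides[axis + 1] * shape[axis + 1];
        }
        Self { data, shape: shape.to_vec(), strides }
    }

    /// Reorders axes by permuting shape and strides; the buffer is shared, not copied.
    pub fn permute(&self, axes: &[usize]) -> Self {
        Self {
            data: self.data,
            shape: axes.iter().map(|&axis| self.shape[axis]).collect(),
            strides: axes.iter().map(|&axis| self.strides[axis]).collect(),
        }
    }

    /// Reads the element at a multi-index via the stride dot product.
    pub fn get(&self, index: &[usize]) -> f64 {
        let offset: usize = index.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
        self.data[offset]
    }

    pub fn shape(&self) -> &[usize] {
        &self.shape
    }

    pub fn as_ptr(&self) -> *const f64 {
        self.data.as_ptr()
    }
}
```

A 2x3 view and its transpose report different shapes but the same base pointer. A reshape, by contrast, can only stay zero-copy when the requested shape is expressible with strides over the existing buffer, which is why it may need to materialize.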

tenferro-prims (Layer 3)

Low-level tensor execution substrate. The public primitive contract is the split protocol family:

These family traits are the current execution surface; there is no longer a monolithic primitive trait surface.

tenferro-algebra (Shared)

Minimal algebra foundation. HasAlgebra trait maps scalar types to their algebra (e.g., f64 -> Standard), enabling automatic backend inference. Semiring trait for algebra-generic operations. Scalar trait (blanket impl for Copy + Send + Sync + Add + Mul + Zero + One + PartialEq) defines minimum element type requirements. Conjugate trait for complex conjugation (identity for real types).
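The scalar-to-algebra mapping can be sketched with an associated type. The trait names HasAlgebra, Semiring, and Standard come from the description above, but the method set and signatures below are assumptions made for this example, not the crate's actual definitions.

```rust
/// Marker for ordinary (+, *) arithmetic.
pub struct Standard;

/// Maps a scalar type to its default algebra.
pub trait HasAlgebra {
    type Algebra;
}

impl HasAlgebra for f64 {
    type Algebra = Standard;
}

/// Algebra-generic operations over element type `T` (assumed shape of the trait).
pub trait Semiring<T> {
    fn zero() -> T;
    fn one() -> T;
    fn add(a: T, b: T) -> T;
    fn mul(a: T, b: T) -> T;
}

impl Semiring<f64> for Standard {
    fn zero() -> f64 {
        0.0
    }
    fn one() -> f64 {
        1.0
    }
    fn add(a: f64, b: f64) -> f64 {
        a + b
    }
    fn mul(a: f64, b: f64) -> f64 {
        a * b
    }
}

/// Inner product written once against the semiring; the algebra is inferred from `T`.
pub fn semiring_dot<T>(a: &[T], b: &[T]) -> T
where
    T: HasAlgebra + Copy,
    T::Algebra: Semiring<T>,
{
    a.iter()
        .zip(b)
        .fold(<T::Algebra as Semiring<T>>::zero(), |acc, (&x, &y)| {
            <T::Algebra as Semiring<T>>::add(acc, <T::Algebra as Semiring<T>>::mul(x, y))
        })
}
```

Callers pass plain f64 slices and never name Standard; a scalar wrapper that maps to a different algebra would reuse the same `semiring_dot` body unchanged.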

tenferro-device (Shared)

Shared infrastructure: LogicalMemorySpace (MainMemory, GpuMemory), ComputeDevice (Cpu, Cuda, Rocm), workspace-wide Error/Result types.

tenferro-internal-error (Internal)

Internal shared error crate. Owns common error variants and conversion helpers used by public frontend crates, but is not itself a stable end-user surface.

tenferro-internal-runtime (Internal)

Internal runtime scope management crate. Owns RuntimeContext, scoped runtime installation helpers, and default-runtime lookup used behind tenferro::runtime.

tenferro-internal-frontend-core (Internal)

Internal shared dynamic frontend substrate. Owns DynTensor, scalar-type metadata, structured tensor helpers, and the structured einsum/layout machinery used by both tenferro-dynamic-compute and the AD-aware tenferro surface.

tenferro-internal-ad-core (Internal)

Internal AD state crate. Owns AdTensor<T>, reverse-tape attachment, snapshot plumbing, and the shared AD helper functions used across einsum, scalar, reduction, and linalg operation builders.

tenferro-internal-ad-surface (Internal)

Internal dynamic AD surface crate. Owns the dynamic Tensor enum used by the public tenferro facade, eager AD entrypoints such as grad, backward, and forward_ad, plus the builder-style linalg wrappers that dynamic AD methods call through.

tenferro-internal-ad-linalg (Internal)

Internal typed linalg AD crate. Owns the SVD/QR/LU/eigen/slogdet/solve-family builder bodies, eager linalg AD entry points, and typed linalg AD result structs that are re-exported through tenferro.

tenferro-internal-ad-ops (Internal)

Internal typed AD operation crate. Owns the scalar, reduction, and einsum AD builder bodies and local pullback helpers that are re-exported through tenferro.

External Crates

chainrules-core (Extern)

Core AD trait definitions (like Julia's ChainRulesCore.jl), independent of any tensor type. Differentiable trait defines the tangent space; concrete types (e.g., Tensor<T>) implement it in their own crates. Rule extension traits (ReverseRule<V>, ForwardRule<V>) for per-operation AD rules.

chainrules (Extern)

Scalar-focused AD rules layered on top of chainrules-core. Provides rrule and frule implementations for primitive scalar arithmetic, projection, conjugation, powers, and related real/complex helper operations so tensor crates can reuse the same scalar differentiation behavior.

Operation-specific AD rules live with their operations, not here.

tidu (Extern)

AD engine (like Zygote.jl in Julia's ecosystem). Provides homogeneous Tape<V> graphs (explicit tape, TensorFlow GradientTape style), TrackedValue<V> (reverse-mode wrapper), and DualValue<V> (forward-mode wrapper). Gradient computation uses tape.pullback() / tape.hvp(), while the tenferro frontend exposes eager backward(...) / grad(...) helpers on top of the same tape model.
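The explicit-tape model can be shown with a scalar toy: each operation appends a node holding its parents and local partial derivatives, and the pullback walks the tape backwards. `DemoTape` and `DemoVar` are hypothetical and much simpler than Tape<V> and TrackedValue<V>; in particular, the `pullback` signature here is an assumption.

```rust
/// Handle to a value recorded on the tape (index into the tape).
#[derive(Clone, Copy)]
pub struct DemoVar(pub usize);

struct Node {
    parents: [usize; 2],
    partials: [f64; 2],
}

/// Minimal scalar reverse-mode tape (hypothetical type).
pub struct DemoTape {
    values: Vec<f64>,
    nodes: Vec<Node>,
}

impl DemoTape {
    pub fn new() -> Self {
        Self { values: Vec::new(), nodes: Vec::new() }
    }

    fn push(&mut self, value: f64, parents: [usize; 2], partials: [f64; 2]) -> DemoVar {
        self.values.push(value);
        self.nodes.push(Node { parents, partials });
        DemoVar(self.values.len() - 1)
    }

    /// Records an input; it points at itself with zero partials.
    pub fn leaf(&mut self, value: f64) -> DemoVar {
        let index = self.values.len();
        self.push(value, [index, index], [0.0, 0.0])
    }

    pub fn add(&mut self, a: DemoVar, b: DemoVar) -> DemoVar {
        let value = self.values[a.0] + self.values[b.0];
        self.push(value, [a.0, b.0], [1.0, 1.0])
    }

    pub fn mul(&mut self, a: DemoVar, b: DemoVar) -> DemoVar {
        let (x, y) = (self.values[a.0], self.values[b.0]);
        self.push(x * y, [a.0, b.0], [y, x])
    }

    pub fn value(&self, var: DemoVar) -> f64 {
        self.values[var.0]
    }

    /// Returns d(output)/d(node) for every node recorded up to `output`.
    pub fn pullback(&self, output: DemoVar) -> Vec<f64> {
        let mut grads = vec![0.0; self.values.len()];
        grads[output.0] = 1.0;
        // Nodes are stored in creation order, so a reverse scan is a valid reverse topological order.
        for i in (0..=output.0).rev() {
            let node = &self.nodes[i];
            for k in 0..2 {
                grads[node.parents[k]] += node.partials[k] * grads[i];
            }
        }
        grads
    }
}
```

For z = x * y + x at x = 3, y = 4, the pullback gives dz/dx = y + 1 = 5 and dz/dy = x = 3. An eager `backward` or `grad` helper in this style would create the tape and call the pullback on the user's behalf.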

Extension Crates (extension/)

tenferro-ext-tropical (Extension)

Tropical semiring tensor operations. Extends the tenferro algebra-parameterized architecture with three tropical semirings: MaxPlus (⊕=max, ⊗=+), MinPlus (⊕=min, ⊗=+), and MaxMul (⊕=max, ⊗=×).

Provides scalar wrappers (MaxPlus<T>, MinPlus<T>, MaxMul<T>), algebra markers (MaxPlusAlgebra, etc.), semiring-family implementations for each algebra, and ArgmaxTracker for recording winner indices during tropical forward passes.
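As a concrete instance of the MaxPlus semiring (⊕=max, ⊗=+), the sketch below computes a tropical matrix product on nested vectors and records which inner index won each maximum. It illustrates the idea behind winner tracking only; the function is hypothetical and does not use MaxPlus<T> or ArgmaxTracker.

```rust
/// Tropical (max, +) matrix product that also returns the winning inner index k
/// for every output entry: C[i][j] = max_k (A[i][k] + B[k][j]).
pub fn maxplus_matmul_with_argmax(
    a: &[Vec<f64>],
    b: &[Vec<f64>],
) -> (Vec<Vec<f64>>, Vec<Vec<usize>>) {
    let rows = a.len();
    let inner = b.len();
    let cols = b.first().map_or(0, |row| row.len());
    // The additive identity of MaxPlus is negative infinity.
    let mut values = vec![vec![f64::NEG_INFINITY; cols]; rows];
    let mut winners = vec![vec![0usize; cols]; rows];
    for i in 0..rows {
        assert_eq!(a[i].len(), inner, "inner dimensions must match");
        for j in 0..cols {
            for k in 0..inner {
                let candidate = a[i][k] + b[k][j];
                // Strict comparison keeps the first index on ties.
                if candidate > values[i][j] {
                    values[i][j] = candidate;
                    winners[i][j] = k;
                }
            }
        }
    }
    (values, winners)
}
```

The winner indices are what a backward pass needs: the gradient of a max flows only to the term that attained it, so each output cotangent is routed to A[i][k*] and B[k*][j] for the recorded k*.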

tenferro-ext-tropical-capi (Extension)

C-API (FFI) for tropical semiring tensor operations. Extends tenferro-capi with tropical einsum functions (tfe_tropical_einsum_<algebra>_f64) and their AD rules (rrule/frule) for MaxPlus, MinPlus, and MaxMul semirings. Reuses TfeTensorF64 handles from tenferro-capi.

tenferro-ext-burn (Extension)

Bridge between the Burn deep learning framework and tenferro tensor network operations. Defines the TensorNetworkOps backend extension trait with tn_einsum, implements the forward pass for NdArray<f64> and the backward pass for Autodiff<B, C>, and provides both checked (try_einsum, try_burn_to_tenferro, try_tenferro_to_burn) and convenience panic-wrapper utilities for conversion and einsum.

tenferro-ext-mdarray (Extension)

Bridge between mdarray multidimensional arrays and tenferro tensors. Provides checked (try_mdarray_to_tensor, try_tensor_to_mdarray) and convenience (mdarray_to_tensor, tensor_to_mdarray) conversion functions for bidirectional data exchange between Array<T, DynRank> and Tensor<T>.
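The checked/convenience pairing used by the bridge crates follows a common Rust pattern: a `try_*` function returns a Result, and a thin wrapper panics on error. The sketch below shows the pattern with hypothetical `Demo*` types and a plain Vec as the source; it does not call the mdarray or tenferro APIs.

```rust
/// Error returned by the checked conversion (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum DemoConvertError {
    ShapeMismatch { expected: usize, actual: usize },
}

/// Stand-in for a dense tensor: flat data plus shape.
#[derive(Debug, PartialEq)]
pub struct DemoTensor {
    pub data: Vec<f64>,
    pub shape: Vec<usize>,
}

/// Checked conversion: reports a mismatch between data length and shape as an error.
pub fn try_vec_to_tensor(data: Vec<f64>, shape: &[usize]) -> Result<DemoTensor, DemoConvertError> {
    let expected: usize = shape.iter().product();
    if data.len() != expected {
        return Err(DemoConvertError::ShapeMismatch { expected, actual: data.len() });
    }
    Ok(DemoTensor { data, shape: shape.to_vec() })
}

/// Convenience wrapper: same conversion, but panics instead of returning an error.
pub fn vec_to_tensor(data: Vec<f64>, shape: &[usize]) -> DemoTensor {
    try_vec_to_tensor(data, shape).expect("vec_to_tensor: data length does not match shape")
}
```

Library code and anything handling untrusted shapes would typically call the `try_*` form, while the panicking form suits tests and short scripts.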

tenferro-ext-ndarray (Extension)

Bridge between ndarray arrays and tenferro tensors. Provides checked (try_ndarray_to_tensor, try_tensor_to_ndarray) and convenience (ndarray_to_tensor, tensor_to_ndarray) conversion functions for bidirectional data exchange between dense ndarray values and tenferro_tensor::Tensor<T>. The optional frontend feature adds try_ndarray_to_frontend(...) for direct conversion into tenferro::Tensor.

tenferro (End-user public)

User-facing dynamic tensor frontend. Tensor is the canonical public tensor object; rank-0 tensors act as scalar coefficients, and diagonal or multi-equivalence-class layouts are created through frontend methods such as Tensor::diag, Tensor::diag_embed, and Tensor::with_axis_classes.

The crate exposes PyTorch-like direct tensor methods on top of the core typed tenferro crates. Reverse entrypoints use set_requires_grad, grad, and backward, while forward-mode uses scoped forward_ad::dual_level(...). Explicit numeric casts use Tensor::to_scalar_type(...), while mixed-dtype ops apply implicit result-type promotion internally. Placement and transfer stay on Tensor through memory_space, preferred_compute_device, to_memory_space, to_cpu, and to_gpu, while explicit runtime choice stays under tenferro::runtime.
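A runtime-dtype tensor with implicit promotion can be pictured as an enum over element types. The sketch below uses a hypothetical `DemoDynTensor` with two variants and assumes that f32 combined with f64 promotes to f64; the frontend's actual promotion table and the signature of Tensor::to_scalar_type are not given in this document.

```rust
/// Runtime-typed 1-D tensor (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum DemoDynTensor {
    F32(Vec<f32>),
    F64(Vec<f64>),
}

impl DemoDynTensor {
    /// Explicit cast to f64 data, analogous in spirit to an explicit numeric cast.
    pub fn to_f64(&self) -> Vec<f64> {
        match self {
            Self::F32(values) => values.iter().map(|&x| f64::from(x)).collect(),
            Self::F64(values) => values.clone(),
        }
    }

    /// Elementwise add. Same-dtype inputs keep their dtype; mixed inputs promote to f64
    /// (an assumed rule). `zip` stops at the shorter input; real code would check shapes.
    pub fn add(&self, other: &Self) -> Self {
        match (self, other) {
            (Self::F32(a), Self::F32(b)) => {
                Self::F32(a.iter().zip(b).map(|(x, y)| x + y).collect())
            }
            _ => Self::F64(
                self.to_f64()
                    .iter()
                    .zip(other.to_f64())
                    .map(|(x, y)| x + y)
                    .collect(),
            ),
        }
    }
}
```

Adding an F32 tensor to an F64 tensor yields F64 without the caller casting anything, while F32 plus F32 stays F32.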

Workspace Crates