General-purpose tensor computation library in Rust. Provides dense tensor types with CPU/GPU support, a cuTENSOR/hipTensor-compatible operation protocol, high-level einsum with N-ary contraction tree optimization, and automatic differentiation.
Current phase: API skeleton (POC). Public signatures and
documentation are in place; most function bodies are
todo!(). This phase exists to validate the API design
before implementations are written.
See the design document for architecture, API design, and future phase plans.
Layer 5:    tenferro-capi          C-API (FFI) for Julia/Python: exposes einsum + SVD with stateless rrule/frule (f64 only), DLPack v1.0 zero-copy tensor exchange
Layer 4:    tenferro-einsum        High-level einsum on Tensor<T>, N-ary contraction tree, algebra dispatch, einsum AD rules
            tenferro-linalg        Tensor-level SVD/QR/LU/eigen, linalg AD rules
Layer 3:    tenferro-tensor        Tensor<T> = DataBuffer + shape + strides, zero-copy view ops, impl Differentiable
Layer 2:    tenferro-prims         "Tensor BLAS": TensorPrims<A> trait (algebra-parameterized), plan-based execution
Shared:     tenferro-algebra       HasAlgebra trait, Semiring trait, Standard type, Scalar trait, Conjugate trait
            tenferro-device        Device enum, Error/Result types
Extern:     chainrules-core        Core AD traits: Differentiable, ReverseRule<V>, ForwardRule<V> (no tensor deps)
            chainrules             AD engine: Tape<V>, TrackedTensor<V>, DualTensor<V> (← chainrules-core)
Foundation: strided-rs             Independent workspace, used only by tenferro-prims (strided-traits -> strided-view -> strided-kernel)
Extension:  tenferro-tropical      Tropical semiring operations (MaxPlus, MinPlus, MaxMul)
            tenferro-tropical-capi C-API for tropical einsum
            tenferro-burn          Burn deep learning framework bridge
            tenferro-mdarray       mdarray multidimensional array bridge
C-API (FFI) for Julia, Python (JAX, PyTorch), and other languages. Exposes tensor lifecycle, einsum, and SVD (including AD rules) via opaque pointers and status codes. f64 only in this POC phase.
Design principles: opaque TfeTensorF64 handles,
tfe_status_t error codes, catch_unwind for
panic safety, DLPack v1.0 for zero-copy tensor exchange across language
boundaries (NumPy, PyTorch, JAX, DLPack.jl).
AD approach: stateless rrule/frule only —
host languages manage their own AD tapes (ChainRules.jl, PyTorch
autograd, JAX custom_vjp).
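The panic-safety convention above can be sketched in plain Rust. This is an illustrative pattern only: `TfeStatus` and `tfe_demo_checked_div` are hypothetical names, not the actual tenferro-capi surface (a real export would also carry `#[no_mangle]`).

```rust
/// Status code returned across the FFI boundary (illustrative shape;
/// the real crate uses tfe_status_t).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TfeStatus {
    Ok = 0,
    Panic = 1,
}

/// A Rust panic must never unwind across the C boundary (that is
/// undefined behavior), so each entry point wraps its body in
/// `std::panic::catch_unwind` and maps any panic to a status code.
pub extern "C" fn tfe_demo_checked_div(a: f64, b: f64, out: *mut f64) -> TfeStatus {
    let result = std::panic::catch_unwind(|| {
        assert!(b != 0.0, "division by zero");
        a / b
    });
    match result {
        Ok(v) => {
            // Safety: the caller guarantees `out` points to a valid f64.
            unsafe { *out = v };
            TfeStatus::Ok
        }
        Err(_) => TfeStatus::Panic,
    }
}

fn main() {
    let mut out = 0.0;
    let status = tfe_demo_checked_div(6.0, 2.0, &mut out);
    println!("{:?} {}", status, out);
}
```

Host bindings then turn a non-`Ok` status into a Julia or Python exception instead of crashing the process.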
High-level einsum API with three levels: string notation
(einsum), pre-built subscripts
(einsum_with_subscripts), and pre-optimized tree
(einsum_with_plan). Each has allocating, accumulating
(_into), and consuming (_owned) variants.
Einsum AD rules: tracked_einsum,
dual_einsum, einsum_rrule,
einsum_frule, einsum_hvp.
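For readers new to the notation: a subscript string such as "ij,jk->ik" names each axis and sums over indices that do not appear in the output. The following plain-Rust loop shows the semantics of that particular string (this is not the tenferro-einsum API, whose bodies are still todo!()).

```rust
/// Semantics of einsum "ij,jk->ik": C[i][k] = sum over j of A[i][j] * B[j][k],
/// i.e. ordinary matrix multiplication, spelled out with loops.
fn einsum_ij_jk_ik(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (m, n) = (a.len(), b[0].len());
    let j_dim = b.len();
    let mut c = vec![vec![0.0; n]; m];
    for i in 0..m {
        for k in 0..n {
            // j appears in both inputs but not the output, so it is summed.
            for j in 0..j_dim {
                c[i][k] += a[i][j] * b[j][k];
            }
        }
    }
    c
}

fn main() {
    let a = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let b = vec![vec![5.0, 6.0], vec![7.0, 8.0]];
    println!("{:?}", einsum_ij_jk_ik(&a, &b)); // [[19, 22], [43, 50]]
}
```

The contraction-tree optimizer's job is to choose the order in which such pairwise sums are evaluated when more than two tensors are involved.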
Tensor-level linear algebra decompositions: SVD, QR, LU, eigendecomposition. Users specify left/right dimension indices; the crate handles matricize -> decompose -> unmatricize internally, delegating the dense matrix decomposition to external backends (faer on CPU, cuSOLVER on GPU).
Primary functions: svd, qr,
lu, eigen. Result types:
SvdResult, QrResult, LuResult,
EigenResult. SVD truncation: SvdOptions
(max_rank, cutoff).
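The truncation rule implied by max_rank and cutoff can be sketched as follows. The exact tenferro-linalg semantics may differ; this is an assumption for illustration (keep at most max_rank singular values, and drop any below cutoff).

```rust
/// Illustrative stand-in for SvdOptions: both criteria are optional.
struct SvdOptions {
    max_rank: Option<usize>,
    cutoff: Option<f64>,
}

/// How many leading singular values survive truncation.
/// `singular_values` is assumed sorted in descending order.
fn truncated_rank(singular_values: &[f64], opts: &SvdOptions) -> usize {
    // First apply the cutoff criterion...
    let mut rank = singular_values
        .iter()
        .take_while(|&&s| opts.cutoff.map_or(true, |c| s >= c))
        .count();
    // ...then cap at max_rank if one was given.
    if let Some(max) = opts.max_rank {
        rank = rank.min(max);
    }
    rank
}

fn main() {
    let s = [3.0, 1.0, 0.5, 1e-12];
    let opts = SvdOptions { max_rank: Some(3), cutoff: Some(1e-8) };
    println!("{}", truncated_rank(&s, &opts)); // 3 (the 1e-12 value is cut)
}
```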
Linalg AD rules: svd_rrule, svd_frule,
qr_rrule, qr_frule, lu_rrule,
lu_frule, eigen_rrule,
eigen_frule.
Tensor<T> type with DataBuffer
(Rust-owned or externally-owned via DLPack), shape/strides metadata, and
zero-copy view operations (permute, broadcast,
diagonal, reshape, select,
narrow). TensorView<'a, T> for borrowed
views. Factory functions: zeros, ones,
eye. Triangular extraction: tril,
triu.
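Why view ops can be zero-copy: a permutation only reorders the shape and stride metadata while the underlying buffer stays untouched. The helper names below are illustrative, not the tenferro-tensor API.

```rust
/// Row-major (C-order) strides for a shape, in elements:
/// the last axis is contiguous, earlier axes step over whole sub-blocks.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Zero-copy permute: apply the axis permutation to shape and strides only;
/// no element of the data buffer moves.
fn permute(shape: &[usize], strides: &[usize], perm: &[usize]) -> (Vec<usize>, Vec<usize>) {
    (
        perm.iter().map(|&p| shape[p]).collect(),
        perm.iter().map(|&p| strides[p]).collect(),
    )
}

fn main() {
    let shape = [2, 3, 4];
    let strides = row_major_strides(&shape); // [12, 4, 1]
    let (ps, pstr) = permute(&shape, &strides, &[2, 0, 1]);
    println!("{:?} {:?}", ps, pstr); // [4, 2, 3] [1, 12, 4]
}
```

Broadcast, diagonal, select, and narrow follow the same principle: each is a different arithmetic transformation of (shape, strides, offset).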
Low-level "Tensor BLAS" protocol. TensorPrims<A>
trait parameterized by algebra A with a cuTENSOR-compatible
plan-based execution model
(PrimDescriptor -> plan -> execute).
Core ops (universal set): batched_gemm,
reduce, trace, permute,
anti_trace, anti_diag,
elementwise_unary. Extended ops (dynamically queried):
contract, elementwise_mul.
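The plan-based model can be sketched with a toy operation. Types and names here are illustrative assumptions, not the actual TensorPrims<A> surface; the point is the descriptor -> plan -> execute split.

```rust
/// Problem description: *what* to compute (here, a toy elementwise scale).
struct PrimDescriptor {
    len: usize,
    alpha: f64,
}

/// A plan captures decisions made once (kernel selection, tiling,
/// workspace size); executing it many times amortizes the planning cost,
/// as in cuTENSOR's plan objects.
struct Plan {
    len: usize,
    alpha: f64,
}

fn plan(desc: &PrimDescriptor) -> Plan {
    // A real backend would choose kernels / query workspace here.
    Plan { len: desc.len, alpha: desc.alpha }
}

fn execute(plan: &Plan, input: &[f64], output: &mut [f64]) {
    assert_eq!(input.len(), plan.len);
    for (o, i) in output.iter_mut().zip(input) {
        *o = plan.alpha * i;
    }
}

fn main() {
    let p = plan(&PrimDescriptor { len: 3, alpha: 2.0 });
    let mut out = [0.0; 3];
    execute(&p, &[1.0, 2.0, 3.0], &mut out);
    println!("{:?}", out); // [2.0, 4.0, 6.0]
}
```

Separating planning from execution is what lets the same protocol sit in front of cuTENSOR, hipTensor, and a CPU backend.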
Minimal algebra foundation. HasAlgebra trait maps scalar
types to their algebra (e.g., f64 -> Standard), enabling
automatic backend inference. Semiring trait for
algebra-generic operations. Scalar trait (blanket impl for
Copy + Send + Sync + Add + Mul + Zero + One + PartialEq)
defines minimum element type requirements. Conjugate trait
for complex conjugation (identity for real types).
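A minimal sketch of what an algebra-parameterized kernel looks like, assuming a Semiring trait of roughly this shape (the actual tenferro-algebra signatures may differ):

```rust
/// A semiring supplies the two operations and identities that
/// algebra-generic kernels (like einsum) are written against.
trait Semiring {
    type Elem: Copy;
    fn zero() -> Self::Elem; // identity of `add`
    fn one() -> Self::Elem;  // identity of `mul`
    fn add(a: Self::Elem, b: Self::Elem) -> Self::Elem;
    fn mul(a: Self::Elem, b: Self::Elem) -> Self::Elem;
}

/// The ordinary (+, *) algebra over f64, playing the role of the
/// `Standard` marker type.
struct Standard;

impl Semiring for Standard {
    type Elem = f64;
    fn zero() -> f64 { 0.0 }
    fn one() -> f64 { 1.0 }
    fn add(a: f64, b: f64) -> f64 { a + b }
    fn mul(a: f64, b: f64) -> f64 { a * b }
}

/// An algebra-generic dot product: the same code runs unchanged for any
/// semiring (standard, tropical, ...), which is the point of the design.
fn dot<A: Semiring>(x: &[A::Elem], y: &[A::Elem]) -> A::Elem {
    x.iter().zip(y).fold(A::zero(), |acc, (&a, &b)| A::add(acc, A::mul(a, b)))
}

fn main() {
    println!("{}", dot::<Standard>(&[1.0, 2.0], &[3.0, 4.0])); // 11
}
```

HasAlgebra then closes the loop in the other direction: given the element type (f64), it infers the default algebra (Standard) so callers rarely name it explicitly.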
Shared infrastructure: LogicalMemorySpace (MainMemory,
GpuMemory), ComputeDevice (Cpu, Cuda, Hip), workspace-wide
Error/Result types.
Core AD trait definitions (like Julia's ChainRulesCore.jl),
independent of any tensor type. Differentiable trait
defines the tangent space; concrete types (e.g.,
Tensor<T>) implement it in their own crates. Rule
extension traits (ReverseRule<V>,
ForwardRule<V>) for per-operation AD rules.
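The division of labor can be sketched for scalars. The trait shape and the rrule signature below are assumptions in the spirit of chainrules-core, not its actual API:

```rust
/// Types that participate in AD declare their tangent type and how
/// tangents combine (needed when one value feeds several operations).
trait Differentiable {
    type Tangent;
    fn zero_tangent(&self) -> Self::Tangent;
    fn add_tangents(a: Self::Tangent, b: Self::Tangent) -> Self::Tangent;
}

impl Differentiable for f64 {
    type Tangent = f64;
    fn zero_tangent(&self) -> f64 { 0.0 }
    fn add_tangents(a: f64, b: f64) -> f64 { a + b }
}

/// A reverse rule for one operation: the forward value plus a pullback
/// closure mapping an output cotangent to input cotangents.
fn mul_rrule(x: f64, y: f64) -> (f64, impl Fn(f64) -> (f64, f64)) {
    (x * y, move |cotangent| (cotangent * y, cotangent * x))
}

fn main() {
    let (z, pullback) = mul_rrule(3.0, 4.0);
    let (dx, dy) = pullback(1.0);
    // If x also fed a second operation, its contributions would be
    // merged with add_tangents.
    let dx_total = <f64 as Differentiable>::add_tangents(dx, 0.0);
    println!("{} {} {}", z, dx_total, dy); // 12 4 3
}
```

Because only these traits live here, Tensor<T> can implement Differentiable in tenferro-tensor without chainrules-core ever depending on a tensor type.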
AD engine (like Zygote.jl in Julia's ecosystem). Provides
Tape<V> (explicit tape, TensorFlow GradientTape
style), TrackedTensor<V> (reverse-mode wrapper),
DualTensor<V> (forward-mode wrapper). Gradient
computation via tape.pullback(), HVP via
tape.hvp(). Depends only on chainrules-core.
Re-exports all of chainrules-core so downstream crates can
depend on just chainrules for both traits and engine.
Operation-specific AD rules live with their operations, not here.
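A toy scalar version of the explicit-tape idea, to make the mechanics concrete. The real Tape<V> API shape is not shown here; this is a minimal reverse-accumulation sketch under assumed names:

```rust
/// One tape node: indices of its parents plus the local partial
/// derivative d(this node)/d(parent) recorded during the forward pass.
struct Node {
    parents: Vec<(usize, f64)>,
}

#[derive(Default)]
struct Tape {
    nodes: Vec<Node>,
}

impl Tape {
    fn input(&mut self) -> usize {
        self.nodes.push(Node { parents: vec![] });
        self.nodes.len() - 1
    }
    fn record(&mut self, parents: Vec<(usize, f64)>) -> usize {
        self.nodes.push(Node { parents });
        self.nodes.len() - 1
    }
    /// Reverse sweep (cf. tape.pullback()): seed the output cotangent
    /// with 1 and accumulate gradients into every earlier node.
    fn pullback(&self, output: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[output] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            let g = grads[i];
            for &(p, local) in &self.nodes[i].parents {
                grads[p] += g * local;
            }
        }
        grads
    }
}

fn main() {
    // z = x * y + x, at x = 3, y = 4
    let mut tape = Tape::default();
    let (x, y) = (3.0, 4.0);
    let xi = tape.input();
    let yi = tape.input();
    let prod = tape.record(vec![(xi, y), (yi, x)]); // d(xy)/dx = y, d(xy)/dy = x
    let z = tape.record(vec![(prod, 1.0), (xi, 1.0)]); // z = prod + x
    let grads = tape.pullback(z);
    println!("dz/dx = {}, dz/dy = {}", grads[xi], grads[yi]); // 5, 3
}
```

In the real engine, TrackedTensor<V> records such nodes automatically as operations execute, and the per-operation locals come from the rrule implementations living next to each operation.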
Tropical semiring tensor operations. Extends the tenferro algebra-parameterized architecture with three tropical semirings: MaxPlus (⊕=max, ⊗=+), MinPlus (⊕=min, ⊗=+), and MaxMul (⊕=max, ⊗=×).
Provides scalar wrappers (MaxPlus<T>,
MinPlus<T>, MaxMul<T>), algebra
markers (MaxPlusAlgebra, etc.), TensorPrims
implementations for each algebra, and ArgmaxTracker for
recording winner indices during tropical forward passes.
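Max-plus arithmetic in miniature, to show why tropical einsum corresponds to dynamic programming. The wrapper shape is illustrative, not the actual MaxPlus<T> definition:

```rust
/// Max-plus semiring element: ⊕ = max, ⊗ = +.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MaxPlus(f64);

impl MaxPlus {
    const ZERO: MaxPlus = MaxPlus(f64::NEG_INFINITY); // identity of max
    const ONE: MaxPlus = MaxPlus(0.0);                // identity of +
    fn add(self, rhs: MaxPlus) -> MaxPlus { MaxPlus(self.0.max(rhs.0)) }
    fn mul(self, rhs: MaxPlus) -> MaxPlus { MaxPlus(self.0 + rhs.0) }
}

/// A tropical "dot product": max over j of (x_j + y_j). This is what
/// contracting a repeated index computes in a max-plus einsum, i.e. a
/// longest-path / Viterbi-style recurrence.
fn tropical_dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter()
        .zip(y)
        .fold(MaxPlus::ZERO, |acc, (&a, &b)| acc.add(MaxPlus(a).mul(MaxPlus(b))))
        .0
}

fn main() {
    // max(1 + 2, 5 + 0) = 5; ONE is unused here but shown for completeness.
    let _ = MaxPlus::ONE;
    println!("{}", tropical_dot(&[1.0, 5.0], &[2.0, 0.0])); // 5
}
```

ArgmaxTracker records *which* j attained the max, which is exactly the information the backward pass (and path reconstruction) needs.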
C-API (FFI) for tropical semiring tensor operations. Extends
tenferro-capi with tropical einsum functions
(tfe_tropical_einsum_<algebra>_f64) and their AD
rules (rrule/frule) for MaxPlus, MinPlus, and MaxMul semirings. Reuses
TfeTensorF64 handles from tenferro-capi.
Bridge between the Burn deep learning
framework and tenferro tensor network operations. Defines
TensorNetworkOps backend extension trait with
tn_einsum, implements forward pass for
NdArray<f64> and backward pass for
Autodiff<B, C>, and provides Burn tensor / tenferro
tensor conversion utilities.
Bridge between mdarray
multidimensional arrays and tenferro tensors. Provides
mdarray_to_tensor and tensor_to_mdarray
conversion functions for bidirectional data exchange between
Array<T, DynRank> and
Tensor<T>.