Choosing a Tensor API

This page helps you choose the tensor API that matches your workflow. tenferro separates three concerns: the value you pass around, when computation runs, and which backend or device executes the work.

Start Here

[Figure: decision tree for choosing a tensor model]

For most projects without autodiff, TypedTensor<T, R> or Tensor should come first. Move to EagerTensor when you want immediate execution under an EagerRuntime; make tensors tracked only when the workflow needs backward() on scalar losses. Move to TracedTensor when the workflow needs grad, vjp, or jvp on traced graphs.

Quick reference:

| If your project needs | Start with |
| --- | --- |
| No autodiff, scalar type known at compile time | TypedTensor<T, R> |
| No autodiff, dtype selected at runtime | Tensor |
| Immediate forward execution in one runtime, optionally backward() on scalar losses | EagerTensor + EagerRuntime |
| grad, vjp, jvp, HVP via composition, graph reuse | TracedTensor + GraphCompiler + GraphExecutor<B> |
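The quick reference can also be read as a small decision function. The sketch below is illustrative only: `Needs`, `Recommendation`, and `recommend` are hypothetical names that are not part of tenferro, and the order of the checks (traced first, then eager, then the non-autodiff types) is an assumption about how to break ties when a project has several needs at once.

```rust
/// Hypothetical summary of what a project needs (not a tenferro type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Needs {
    /// grad, vjp, jvp, HVP via composition, or graph reuse.
    pub graph_transforms: bool,
    /// Immediate execution in one runtime, optionally backward() on scalar losses.
    pub eager_runtime: bool,
    /// The scalar type is fixed at compile time.
    pub dtype_known_at_compile_time: bool,
}

/// Hypothetical labels for the four starting points in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recommendation {
    TypedTensor,
    Tensor,
    EagerTensor,
    TracedTensor,
}

/// Maps needs to a starting point. The precedence is an assumption of this sketch.
pub fn recommend(needs: Needs) -> Recommendation {
    if needs.graph_transforms {
        Recommendation::TracedTensor
    } else if needs.eager_runtime {
        Recommendation::EagerTensor
    } else if needs.dtype_known_at_compile_time {
        Recommendation::TypedTensor
    } else {
        Recommendation::Tensor
    }
}
```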

Tensor Types

TypedTensor<T, R = DynRank> owns runtime tensor data with a compile-time scalar type and an optional compile-time rank marker. Owned values use tenferro’s column-major layout. Strided views use TypedTensorView and TypedTensorViewMut.
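To make the column-major layout concrete, here is a minimal sketch of how strides and linear offsets work when the first axis varies fastest. The helper names `column_major_strides` and `linear_offset` are hypothetical and are not tenferro functions; the arithmetic is the standard column-major rule.

```rust
/// Strides (in elements) for a compact column-major layout:
/// the first axis has stride 1, and each later axis has a stride
/// equal to the product of all earlier extents.
pub fn column_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = Vec::with_capacity(shape.len());
    let mut step = 1;
    for &extent in shape {
        strides.push(step);
        step *= extent;
    }
    strides
}

/// Linear element offset of a multi-index under the given strides.
pub fn linear_offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For a 2 x 3 matrix the strides are [1, 2], so element (1, 2) sits at offset 1 + 2 * 2 = 5, the last slot of the six-element buffer.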

Tensor owns the same kind of dense data, but wraps supported scalar types in a runtime dtype enum and remains dynamic-rank. Use it when dtype must be selected dynamically, when you want the broad concrete tensor operation API, or when you need to pass CPU or CUDA tensors through backend dispatch.
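The relationship between a typed tensor and a runtime-dtype tensor can be pictured as an enum with one variant per supported scalar type. The sketch below uses hypothetical stand-ins (`Typed<T>`, `DynTensor`, `DType`) rather than tenferro's actual definitions, and covers only two scalar types for brevity.

```rust
/// Hypothetical stand-in for a typed dense buffer (not tenferro's TypedTensor).
#[derive(Debug, Clone, PartialEq)]
pub struct Typed<T> {
    pub shape: Vec<usize>,
    pub data: Vec<T>,
}

/// Hypothetical runtime dtype tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    F32,
    F64,
}

/// Hypothetical runtime-dtype wrapper: the scalar type is chosen at runtime.
#[derive(Debug, Clone, PartialEq)]
pub enum DynTensor {
    F32(Typed<f32>),
    F64(Typed<f64>),
}

impl DynTensor {
    /// Builds a zero-filled tensor whose dtype is selected by a runtime value.
    pub fn zeros(dtype: DType, shape: &[usize]) -> Self {
        let len: usize = shape.iter().product();
        let shape = shape.to_vec();
        match dtype {
            DType::F32 => DynTensor::F32(Typed { shape, data: vec![0.0; len] }),
            DType::F64 => DynTensor::F64(Typed { shape, data: vec![0.0; len] }),
        }
    }

    /// Reports which variant is stored.
    pub fn dtype(&self) -> DType {
        match self {
            DynTensor::F32(_) => DType::F32,
            DynTensor::F64(_) => DType::F64,
        }
    }

    /// Recovers the typed value only when the runtime dtype matches.
    pub fn as_f64(&self) -> Option<&Typed<f64>> {
        match self {
            DynTensor::F64(t) => Some(t),
            _ => None,
        }
    }
}
```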

tenferro-tensor-core is lower-level: it owns rank/layout metadata and host-only adapters such as HostTensor<T>, not the backend-capable TypedTensor<T, R>.

EagerTensor is concrete eager execution. It wraps Tensor values in an EagerRuntime, so each operation computes a concrete result immediately. Untracked eager tensors are forward-only. Tracked eager tensors additionally record reverse-mode state for backward() on scalar losses.
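The idea behind tracked eager values can be shown with a scalar toy: every operation computes its result immediately and also records the local partial derivatives needed for a later reverse pass. This is a generic reverse-mode sketch under that assumption, not tenferro's implementation; `Tape` and its methods are hypothetical names.

```rust
/// Hypothetical scalar tape: values are computed eagerly, and each entry
/// remembers its parents together with the local partial derivatives.
#[derive(Debug, Default)]
pub struct Tape {
    values: Vec<f64>,
    parents: Vec<Vec<(usize, f64)>>,
}

impl Tape {
    fn push(&mut self, value: f64, parents: Vec<(usize, f64)>) -> usize {
        self.values.push(value);
        self.parents.push(parents);
        self.values.len() - 1
    }

    /// Registers an input value with no parents.
    pub fn leaf(&mut self, value: f64) -> usize {
        self.push(value, Vec::new())
    }

    /// Computes a + b immediately; both partial derivatives are 1.
    pub fn add(&mut self, a: usize, b: usize) -> usize {
        let value = self.values[a] + self.values[b];
        self.push(value, vec![(a, 1.0), (b, 1.0)])
    }

    /// Computes a * b immediately; the partials are the opposite operands.
    pub fn mul(&mut self, a: usize, b: usize) -> usize {
        let (va, vb) = (self.values[a], self.values[b]);
        self.push(va * vb, vec![(a, vb), (b, va)])
    }

    /// Returns the already computed value of an entry.
    pub fn value(&self, id: usize) -> f64 {
        self.values[id]
    }

    /// Reverse pass from a scalar loss: returns d(loss)/d(entry) for every entry.
    pub fn backward(&self, loss: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.values.len()];
        grads[loss] = 1.0;
        // Entries are stored in creation order, so walking backwards
        // visits every entry after all of its consumers.
        for id in (0..=loss).rev() {
            let upstream = grads[id];
            for &(parent, partial) in &self.parents[id] {
                grads[parent] += upstream * partial;
            }
        }
        grads
    }
}
```

With x = 2 and y = 3, the loss (x + y) * y is available as soon as `mul` returns, and the reverse pass then yields the gradient y = 3 for x and x + 2y = 8 for y.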

TracedTensor is a graph-building handle. It is the graph and compilation API, not the default concrete tensor type.

Execution Model

| Model | Similar to | What happens on each op |
| --- | --- | --- |
| Direct tensor execution | NumPy-style explicit backend calls | The backend runs the op immediately and returns a concrete Tensor |
| Eager execution | PyTorch eager/autograd | The op runs immediately; tracked values record enough state for backward() |
| Traced execution | JAX tracing/jit/grad | The op records graph structure; compute runs after compile/execute |
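The difference between running each op immediately and recording it for later can be sketched with a toy scalar graph. Everything here (`Node`, `Graph`, `eager_example`, `traced_example`) is hypothetical and far simpler than tenferro's GraphCompiler and GraphExecutor<B>; the only point is that the traced calls do no arithmetic until `execute` runs.

```rust
/// Hypothetical node in a recorded scalar graph.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Input(usize),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Hypothetical graph: building it records structure and computes nothing.
#[derive(Debug, Default)]
pub struct Graph {
    pub nodes: Vec<Node>,
}

impl Graph {
    fn push(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    pub fn input(&mut self, slot: usize) -> usize {
        self.push(Node::Input(slot))
    }

    pub fn add(&mut self, a: usize, b: usize) -> usize {
        self.push(Node::Add(a, b))
    }

    pub fn mul(&mut self, a: usize, b: usize) -> usize {
        self.push(Node::Mul(a, b))
    }

    /// Evaluates the recorded nodes in order and returns the value of `output`.
    pub fn execute(&self, inputs: &[f64], output: usize) -> f64 {
        let mut values: Vec<f64> = Vec::with_capacity(self.nodes.len());
        for node in &self.nodes {
            let value = match *node {
                Node::Input(slot) => inputs[slot],
                Node::Add(a, b) => values[a] + values[b],
                Node::Mul(a, b) => values[a] * values[b],
            };
            values.push(value);
        }
        values[output]
    }
}

/// Eager style: each operation produces a concrete number right away.
pub fn eager_example(x: f64, y: f64) -> f64 {
    (x + y) * y
}

/// Traced style: the same expression recorded as a graph for later execution.
pub fn traced_example() -> (Graph, usize) {
    let mut graph = Graph::default();
    let x = graph.input(0);
    let y = graph.input(1);
    let sum = graph.add(x, y);
    let out = graph.mul(sum, y);
    (graph, out)
}
```

Because the graph is only a description, it can be executed again with different inputs, which is the toy analogue of graph reuse.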

See Execution Models for the time-axis diagram, including the difference between Eager CPU, Eager GPU, and Traced mode.

Device And Backend

CPU and CUDA are backend choices. They do not decide whether your program is typed, eager, or traced.

CUDA support is provided by the feature-gated CUDA backend for concrete, eager, and traced workflows. CPU/GPU transfer is explicit:

  • upload CPU tensors before CUDA backend operations,
  • keep intermediate tensors on CUDA while doing CUDA work,
  • download only when the host must inspect values,
  • do not expect an unsupported CUDA operation to silently fall back to CPU.

Operations that require compact storage may copy a view into compact storage on the same device. They do not silently upload CPU tensors or download CUDA tensors.
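The transfer rules above amount to a contract: an operation checks where its input lives and returns an error instead of moving data or switching backends. The sketch below simulates that contract with a device tag on a host buffer. It does not touch a GPU, and `Device`, `Buf`, `OpError`, `upload`, `download`, `scale_on`, and `sort_on` are hypothetical names, not tenferro's helpers. Treating sort as unsupported on the simulated CUDA device is an arbitrary choice for illustration.

```rust
/// Hypothetical device tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Cuda,
}

/// Hypothetical buffer; the data stays on the host in this simulation.
#[derive(Debug, Clone, PartialEq)]
pub struct Buf {
    pub device: Device,
    pub data: Vec<f32>,
}

/// Errors returned instead of silent transfers or silent CPU fallback.
#[derive(Debug, PartialEq, Eq)]
pub enum OpError {
    DeviceMismatch { expected: Device, found: Device },
    Unsupported { op: &'static str, device: Device },
}

/// Explicit transfer to the simulated CUDA device.
pub fn upload(t: &Buf) -> Buf {
    Buf { device: Device::Cuda, data: t.data.clone() }
}

/// Explicit transfer back to the host.
pub fn download(t: &Buf) -> Buf {
    Buf { device: Device::Cpu, data: t.data.clone() }
}

fn check_device(expected: Device, t: &Buf) -> Result<(), OpError> {
    if t.device == expected {
        Ok(())
    } else {
        Err(OpError::DeviceMismatch { expected, found: t.device })
    }
}

/// Supported on both devices; never moves the input for the caller.
pub fn scale_on(device: Device, t: &Buf, factor: f32) -> Result<Buf, OpError> {
    check_device(device, t)?;
    Ok(Buf { device, data: t.data.iter().map(|v| v * factor).collect() })
}

/// Assumed unsupported on the simulated CUDA device: it errors rather than
/// downloading the data and sorting on the CPU.
pub fn sort_on(device: Device, t: &Buf) -> Result<Buf, OpError> {
    check_device(device, t)?;
    if device == Device::Cuda {
        return Err(OpError::Unsupported { op: "sort", device });
    }
    let mut data = t.data.clone();
    data.sort_by(|a, b| a.total_cmp(b));
    Ok(Buf { device, data })
}
```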

The current CUDA operation and dtype table is in Devices and GPU.

Operation Entry Points

Choose the tensor API first, then choose the operation family. CUDA is not a separate operation entry point; it is a backend/device choice for supported operations.

| Need | Without autodiff | Eager path | Traced path |
| --- | --- | --- | --- |
| Everyday tensor ops | TensorOpsExt / TypedTensorOpsExt backend-explicit methods | EagerTensor methods / associated functions | TracedTensor methods / associated functions |
| Einsum | Internal to tenferro-einsum runtime execution | [&a, &b].einsum(...) via EagerEinsumExt | compiler.einsum(...) via GraphCompilerEinsumExt plus register_runtime |
| Tensordot sugar | Use matmul or dot_general directly | a.tensordot(&b, axes) via EagerTensorEinsumExt | a.tensordot(&b, axes) via TracedTensorEinsumExt |
| Linear algebra | tenferro_linalg::LinalgBackend methods on a backend | EagerTensorLinalgExt methods with autodiff | TracedTensorLinalgExt methods |
| Automatic differentiation | Not applicable | backward() on tracked scalar losses | grad, vjp, jvp, HVP via composition |
| External operations | Extension-defined concrete hooks | Extension-defined eager hooks and optional AD rules | Extension-defined graph hooks and optional AD rules |

Use CPU or CUDA with these paths according to backend coverage. CUDA tensors must be moved explicitly with upload/download helpers, and unsupported CUDA operations do not silently fall back to CPU.

Extension Model

Automatic differentiation is externally extensible. An extension crate can add operations, eager/traced execution hooks, and AD rules without forcing the core crate to grow application-specific APIs. FFT (extension) is the example extension package: it adds Fourier transform operations and registers AD rules for supported transforms.
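One way to picture external extensibility is a registry that the core owns and that extension code fills with operations and optional AD rules. The sketch below is a scalar toy under that assumption; `Registry`, `Forward`, `Vjp`, and `register_square` are hypothetical names and do not reflect how tenferro or its FFT extension actually register hooks.

```rust
use std::collections::HashMap;

/// Hypothetical forward hook: computes the operation on a scalar.
pub type Forward = fn(f64) -> f64;
/// Hypothetical reverse-mode rule: (input, upstream gradient) -> input gradient.
pub type Vjp = fn(f64, f64) -> f64;

/// Hypothetical core-side registry. The core knows nothing about specific ops.
#[derive(Default)]
pub struct Registry {
    ops: HashMap<&'static str, (Forward, Option<Vjp>)>,
}

impl Registry {
    /// Adds an operation; the AD rule is optional.
    pub fn register(&mut self, name: &'static str, forward: Forward, vjp: Option<Vjp>) {
        self.ops.insert(name, (forward, vjp));
    }

    /// Runs a registered operation, or returns None if the name is unknown.
    pub fn call(&self, name: &str, x: f64) -> Option<f64> {
        self.ops.get(name).map(|(forward, _)| forward(x))
    }

    /// Applies the AD rule, or returns None if the op or its rule is missing.
    pub fn vjp(&self, name: &str, x: f64, upstream: f64) -> Option<f64> {
        let (_, rule) = self.ops.get(name)?;
        rule.map(|rule| rule(x, upstream))
    }
}

// Extension side: defines an op and its rule without changing the registry code.
fn square(x: f64) -> f64 {
    x * x
}

fn square_vjp(x: f64, upstream: f64) -> f64 {
    2.0 * x * upstream
}

/// Entry point an extension crate might expose in this toy model.
pub fn register_square(registry: &mut Registry) {
    registry.register("square", square as Forward, Some(square_vjp as Vjp));
}
```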