Choosing a Tensor API
This page is about choosing the tensor API that matches your workflow. tenferro separates three concerns: the value you pass around, when computation runs, and which backend or device executes the work.
Start Here
For most projects without autodiff, TypedTensor<T, R> or Tensor should come first. Move to EagerTensor when you want immediate execution under an EagerRuntime; make tensors tracked only when the workflow needs backward() on scalar losses. Move to TracedTensor when the workflow needs grad, vjp, or jvp on traced graphs.
Quick reference:
| If your project needs | Start with |
|---|---|
| No autodiff, scalar type known at compile time | TypedTensor<T, R> |
| No autodiff, dtype selected at runtime | Tensor |
| Immediate forward execution in one runtime, optionally backward() on scalar losses | EagerTensor + EagerRuntime |
| grad, vjp, jvp, HVP via composition, graph reuse | TracedTensor + GraphCompiler + GraphExecutor<B> |
Tensor Types
TypedTensor<T, R = DynRank> owns runtime tensor data with a compile-time scalar type and an optional compile-time rank marker. Owned values use tenferro’s column-major layout. Strided views use TypedTensorView and TypedTensorViewMut.
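To make the column-major convention concrete, the following is a minimal sketch of how strides and linear offsets work in a dense column-major layout. The helper names `column_major_strides` and `linear_offset` are hypothetical and are not part of tenferro; the sketch only illustrates the general layout rule that the first axis varies fastest.

```rust
/// Strides for a dense column-major shape: the first axis has stride 1,
/// and each later axis steps over the product of the earlier extents.
/// (Hypothetical helper, not a tenferro function.)
fn column_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = Vec::with_capacity(shape.len());
    let mut step = 1;
    for &extent in shape {
        strides.push(step);
        step *= extent;
    }
    strides
}

/// Linear offset of a multi-index under the given strides.
/// (Hypothetical helper, not a tenferro function.)
fn linear_offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For a shape of `[2, 3]` the strides come out as `[1, 2]`, so element `(1, 2)` sits at offset `1 * 1 + 2 * 2 = 5`, the last slot of the six-element buffer.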
Tensor owns the same kind of dense data, but wraps supported scalar types in a runtime dtype enum and remains dynamic-rank. Use it when dtype must be selected dynamically, when you want the broad concrete tensor operation API, or when you need to pass CPU or CUDA tensors through backend dispatch.
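The difference between a compile-time scalar type and a runtime dtype can be sketched with plain Rust types. Everything below (`StaticTensor`, `DynTensor`, `from_values`, `sum_as_f64`) is a hypothetical stand-in chosen for illustration; it does not reproduce tenferro's actual definitions or method names.

```rust
/// Hypothetical stand-in for a statically typed tensor:
/// the scalar type is fixed by the type parameter at compile time.
struct StaticTensor<T> {
    data: Vec<T>,
}

/// Generic code over the static type; the compiler monomorphizes it per scalar.
fn sum_typed<T: Copy + std::iter::Sum<T>>(t: &StaticTensor<T>) -> T {
    t.data.iter().copied().sum()
}

/// Hypothetical stand-in for a runtime-typed tensor:
/// the dtype is an enum variant chosen while the program runs.
enum DynTensor {
    F32(StaticTensor<f32>),
    F64(StaticTensor<f64>),
}

impl DynTensor {
    /// Pick the dtype from a runtime string, e.g. read from a config file.
    fn from_values(dtype: &str, values: &[f64]) -> Option<Self> {
        match dtype {
            "f32" => Some(DynTensor::F32(StaticTensor {
                data: values.iter().map(|&v| v as f32).collect(),
            })),
            "f64" => Some(DynTensor::F64(StaticTensor {
                data: values.to_vec(),
            })),
            _ => None, // unsupported dtype: reported, not guessed
        }
    }

    fn dtype_name(&self) -> &'static str {
        match self {
            DynTensor::F32(_) => "f32",
            DynTensor::F64(_) => "f64",
        }
    }

    /// Dispatch on the runtime dtype, then reuse the typed implementation.
    fn sum_as_f64(&self) -> f64 {
        match self {
            DynTensor::F32(t) => sum_typed(t) as f64,
            DynTensor::F64(t) => sum_typed(t),
        }
    }
}
```

The trade-off this sketch shows is the usual one: the typed form catches scalar mismatches at compile time, while the enum form pays a dispatch per call in exchange for choosing the dtype dynamically.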
tenferro-tensor-core is lower-level: it owns rank/layout metadata and host-only adapters such as HostTensor<T>, not the backend-capable TypedTensor<T, R>.
EagerTensor is concrete eager execution. It wraps Tensor values in an EagerRuntime, so each operation computes a concrete result immediately. Untracked eager tensors are forward-only. Tracked eager tensors additionally record reverse-mode state for backward() on scalar losses.
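As a rough illustration of what "record reverse-mode state" means, here is a toy scalar tape: every operation computes its value immediately and also notes how the value was produced, so a later reverse sweep can accumulate gradients from a scalar loss. The `Tape` and `Op` types are hypothetical and far simpler than a real tensor runtime; they are not tenferro's EagerRuntime.

```rust
/// How a tape entry was produced (hypothetical, scalar-only).
#[derive(Clone, Copy)]
enum Op {
    Leaf,
    Add(usize, usize),
    Mul(usize, usize),
}

/// Toy eager tape: values are concrete as soon as an op is called,
/// and the op is recorded so `backward` can replay it in reverse.
#[derive(Default)]
struct Tape {
    values: Vec<f64>,
    ops: Vec<Op>,
}

impl Tape {
    fn record(&mut self, value: f64, op: Op) -> usize {
        self.values.push(value);
        self.ops.push(op);
        self.values.len() - 1
    }

    fn leaf(&mut self, value: f64) -> usize {
        self.record(value, Op::Leaf)
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] + self.values[b]; // computed immediately
        self.record(v, Op::Add(a, b))
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] * self.values[b]; // computed immediately
        self.record(v, Op::Mul(a, b))
    }

    /// Reverse sweep from a scalar `loss`; returns d(loss)/d(entry) for every entry.
    fn backward(&self, loss: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.values.len()];
        grads[loss] = 1.0;
        for i in (0..=loss).rev() {
            let g = grads[i];
            match self.ops[i] {
                Op::Leaf => {}
                Op::Add(a, b) => {
                    grads[a] += g;
                    grads[b] += g;
                }
                Op::Mul(a, b) => {
                    grads[a] += g * self.values[b];
                    grads[b] += g * self.values[a];
                }
            }
        }
        grads
    }
}
```

For `loss = x * y + x` with `x = 3` and `y = 4`, the forward value is available right after the last op, and the reverse sweep yields `d loss / d x = y + 1` and `d loss / d y = x`. An untracked value corresponds to skipping the recording step entirely, which is why it stays forward-only.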
TracedTensor is a graph-building handle. It is the graph and compilation API, not the default concrete tensor type.
Execution Model
| Model | Similar to | What happens on each op |
|---|---|---|
| Direct tensor execution | NumPy-style explicit backend calls | The backend runs the op immediately and returns a concrete Tensor |
| Eager execution | PyTorch eager/autograd | The op runs immediately; tracked values record enough state for backward() |
| Traced execution | JAX tracing/jit/grad | The op records graph structure; compute runs after compile/execute |
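The contrast between the eager and traced rows can be sketched with a toy scalar example. In the eager function each operation produces a number on the spot; in the traced version each operation only appends a node, and values exist only after an explicit execute step. The `Graph` and `Node` types are hypothetical illustrations, not tenferro's TracedTensor, GraphCompiler, or GraphExecutor.

```rust
/// Eager style: each operation yields a concrete value right away.
fn eager_mul_add(x: f64, y: f64, z: f64) -> f64 {
    let product = x * y; // already a number here
    product + z
}

/// Traced style: operations are recorded as nodes referring to earlier nodes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Node {
    Input(usize),
    Mul(usize, usize),
    Add(usize, usize),
}

/// Toy graph (hypothetical): building it performs no arithmetic.
#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn push(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    fn input(&mut self, slot: usize) -> usize {
        self.push(Node::Input(slot))
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.push(Node::Mul(a, b))
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        self.push(Node::Add(a, b))
    }

    /// Evaluate nodes in insertion order and return the value of `output`.
    /// The same graph can be executed again with different inputs.
    fn execute(&self, inputs: &[f64], output: usize) -> f64 {
        let mut values: Vec<f64> = Vec::with_capacity(self.nodes.len());
        for node in &self.nodes {
            let v = match *node {
                Node::Input(slot) => inputs[slot],
                Node::Mul(a, b) => values[a] * values[b],
                Node::Add(a, b) => values[a] + values[b],
            };
            values.push(v);
        }
        values[output]
    }
}
```

Because the graph is data, it can be inspected, transformed, or reused across inputs before any compute runs, which is the property the traced row relies on.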
See Execution Models for the time-axis diagram, including the difference between Eager CPU, Eager GPU, and Traced mode.
Device And Backend
CPU and CUDA are backend choices. They do not decide whether your program is typed, eager, or traced.
CUDA support is provided by the feature-gated CUDA backend for concrete, eager, and traced workflows. CPU/GPU transfer is explicit:
- upload CPU tensors before CUDA backend operations,
- keep intermediate tensors on CUDA while doing CUDA work,
- download only when the host must inspect values,
- do not expect an unsupported CUDA operation to silently fall back to CPU.
Operations that require compact storage may copy a view into compact storage on the same device. They do not silently upload CPU tensors or download CUDA tensors.
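The transfer rules above can be modeled without any GPU at all. The sketch below is a host-only simulation under stated assumptions: `Device`, `Buffer`, `OpError`, `upload`, `download`, `cuda_scale`, and `sort_values` are all hypothetical names, and the "CUDA" buffer is just a tagged `Vec`. It only demonstrates the policy of returning an error instead of moving data or falling back silently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Device {
    Cpu,
    Cuda,
}

/// Simulated buffer: the device tag stands in for where the data lives.
#[derive(Debug, Clone, PartialEq)]
struct Buffer {
    device: Device,
    data: Vec<f32>,
}

#[derive(Debug, PartialEq, Eq)]
enum OpError {
    DeviceMismatch { expected: Device, found: Device },
    Unsupported { op: &'static str, device: Device },
}

/// Explicit host-to-device copy (simulated).
fn upload(host: &Buffer) -> Buffer {
    Buffer { device: Device::Cuda, data: host.data.clone() }
}

/// Explicit device-to-host copy (simulated).
fn download(dev: &Buffer) -> Buffer {
    Buffer { device: Device::Cpu, data: dev.data.clone() }
}

/// Hypothetical CUDA-only op: a CPU input is rejected, never uploaded implicitly.
fn cuda_scale(input: &Buffer, factor: f32) -> Result<Buffer, OpError> {
    if input.device != Device::Cuda {
        return Err(OpError::DeviceMismatch {
            expected: Device::Cuda,
            found: input.device,
        });
    }
    Ok(Buffer {
        device: Device::Cuda,
        data: input.data.iter().map(|v| v * factor).collect(),
    })
}

/// Hypothetical op with no CUDA kernel: it reports that, rather than
/// downloading the data and running on the CPU behind the caller's back.
fn sort_values(input: &Buffer) -> Result<Buffer, OpError> {
    match input.device {
        Device::Cpu => {
            let mut data = input.data.clone();
            data.sort_by(|a, b| a.total_cmp(b));
            Ok(Buffer { device: Device::Cpu, data })
        }
        Device::Cuda => Err(OpError::Unsupported {
            op: "sort_values",
            device: Device::Cuda,
        }),
    }
}
```

A typical flow under this policy is upload once, chain device operations, and download only at the point where the host needs the values; any unsupported step surfaces as an error the caller must handle explicitly.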
The current CUDA operation and dtype table is in Devices and GPU.
Operation Entry Points
Choose the tensor API first, then choose the operation family. CUDA is not a separate operation entry point; it is a backend/device choice for supported operations.
| Need | Without autodiff | Eager path | Traced path |
|---|---|---|---|
| Everyday tensor ops | TensorOpsExt / TypedTensorOpsExt backend-explicit methods | EagerTensor methods / associated functions | TracedTensor methods / associated functions |
| Einsum | Internal to tenferro-einsum runtime execution | [&a, &b].einsum(...) via EagerEinsumExt | compiler.einsum(...) via GraphCompilerEinsumExt plus register_runtime |
| Tensordot sugar | Use matmul or dot_general directly | a.tensordot(&b, axes) via EagerTensorEinsumExt | a.tensordot(&b, axes) via TracedTensorEinsumExt |
| Linear algebra | tenferro_linalg::LinalgBackend methods on a backend | EagerTensorLinalgExt methods with autodiff | TracedTensorLinalgExt methods |
| Automatic differentiation | Not applicable | backward() on tracked scalar losses | grad, vjp, jvp, HVP via composition |
| External operations | Extension-defined concrete hooks | Extension-defined eager hooks and optional AD rules | Extension-defined graph hooks and optional AD rules |
Use CPU or CUDA with these paths according to backend coverage. CUDA tensors must be moved explicitly with upload/download helpers, and unsupported CUDA operations do not silently fall back to CPU.
Extension Model
Automatic differentiation is externally extensible. An extension crate can add operations, eager/traced execution hooks, and AD rules without forcing the core crate to grow application-specific APIs. FFT (extension) is the example extension package: it adds Fourier transform operations and registers AD rules for supported transforms.
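One way to picture this kind of extensibility is a registry that the core owns and that extensions populate with an operation plus an optional derivative rule. The sketch below is a scalar toy under that assumption: `OpRegistry`, `register`, `call`, `grad`, and `register_cube_extension` are hypothetical names and do not describe tenferro's actual extension hooks.

```rust
use std::collections::HashMap;

type Forward = fn(f64) -> f64;
type Derivative = fn(f64) -> f64;

/// Toy registry (hypothetical): the core knows nothing about specific ops.
#[derive(Default)]
struct OpRegistry {
    forward: HashMap<&'static str, Forward>,
    derivative: HashMap<&'static str, Derivative>,
}

impl OpRegistry {
    /// Register an op; the AD rule is optional.
    fn register(&mut self, name: &'static str, forward: Forward, derivative: Option<Derivative>) {
        self.forward.insert(name, forward);
        if let Some(rule) = derivative {
            self.derivative.insert(name, rule);
        }
    }

    /// Run a registered op, or return None if no extension provides it.
    fn call(&self, name: &str, x: f64) -> Option<f64> {
        self.forward.get(name).map(|f| f(x))
    }

    /// Evaluate the registered derivative rule, if the op has one.
    fn grad(&self, name: &str, x: f64) -> Option<f64> {
        self.derivative.get(name).map(|f| f(x))
    }
}

fn cube(x: f64) -> f64 {
    x * x * x
}

fn cube_derivative(x: f64) -> f64 {
    3.0 * x * x
}

fn negate(x: f64) -> f64 {
    -x
}

/// Stands in for an extension crate: adds ops without touching the registry's code.
fn register_cube_extension(registry: &mut OpRegistry) {
    registry.register("cube", cube, Some(cube_derivative as Derivative));
    // An op may ship without an AD rule; differentiation is then unavailable for it.
    registry.register("negate", negate, None);
}
```

The point of the design is that the set of operations and rules grows at registration time, so application-specific operations stay in the extension rather than in the core.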