Eager Operations

This guide covers immediate execution in two forms: direct tensor computation without autodiff, and EagerTensor forward execution with optional PyTorch-like reverse-mode autodiff on scalar losses. Start with TypedTensor<T, R> or Tensor for work that does not need autodiff. Use EagerTensor when you want operations to run immediately inside an EagerRuntime, and create tracked variables when the workflow needs gradient accumulation and backward().

Setup

For a published build, depend on the crates you use:

[dependencies]
tenferro-runtime = "..."
tenferro-cpu = "..."
tenferro-tensor = "..."
tenferro-ad = "..."
tenferro-linalg = "..."
tenferro-einsum = { version = "...", features = ["autodiff"] }

When working from a local checkout, replace the versions with path = "..." entries that match your project layout. For a scratch crate created directly inside the tenferro-rs checkout, include an empty [workspace] table so Cargo does not try to enroll the scratch crate in the parent workspace:

[workspace]

[dependencies]
tenferro-runtime = { path = "../crates/tenferro-runtime" }
tenferro-cpu = { path = "../crates/tenferro-cpu" }
tenferro-tensor = { path = "../crates/tenferro-tensor" }
tenferro-ad = { path = "../crates/tenferro-ad" }
tenferro-linalg = { path = "../crates/tenferro-linalg" }
tenferro-einsum = { path = "../crates/tenferro-einsum", features = ["autodiff"] }

The first local build can spend several minutes compiling the default cpu-faer stack. That is expected on a fresh machine.

Most direct tensor examples start by importing the CPU backend and concrete tensor types:

use tenferro_cpu::CpuBackend;
use tenferro_runtime::{Tensor, TypedTensor};

let mut backend = CpuBackend::new();

Every direct tensor operation requires a backend context. CpuBackend is the standard CPU backend using the faer linear algebra library. With the cuda feature, the same concrete and eager APIs can execute supported operations on the CUDA backend when tensors are explicitly placed on the GPU.

EagerRuntime owns the eager backend and the optional gradient slots for tracked eager tensors. Untracked eager tensors are forward-only. If several tracked tensors share one runtime, their gradients accumulate in that runtime's state, and you can reset them together with clear_grads().

Most non-AD operations on concrete tensors are available as TensorOpsExt / TypedTensorOpsExt methods that take an explicit backend. AD workflows use the EagerTensor method surface instead. TypedTensor<T, R> is the first layer to consider when you want compile-time dtype safety, optional rank typing, or typed data that may live on the host or in backend-owned storage. Einsum is provided by the separate tenferro-einsum standard extension.

Tracked EagerTensor values support the differentiable method surface most loss functions need:

| Category | EagerTensor methods |
| --- | --- |
| Elementwise | add, mul, neg, exp |
| Reduction | reduce_sum, reduce_prod, reduce_max, reduce_min |
| Matrix products | matmul, dot_general, dot_general_with_conj |
| Shape/layout | reshape, transpose, broadcast_in_dim, slice, pad, reverse, concatenate |
| Indexing/diagonal | gather, scatter, dynamic_slice, extract_diag, embed_diag, tril, triu |
| DType | checked convert, explicit lossy cast |

Operation-family crates add eager extension traits. For example, tenferro_linalg::EagerTensorLinalgExt owns linalg eager methods and tenferro_einsum::EagerEinsumExt owns eager einsum on input slices/arrays.

For CUDA, eager means the operation is submitted immediately. It does not mean the host waits after every GPU kernel. Host synchronization happens at download/read boundaries, through EagerRuntime::synchronize(), or inside operations that must inspect device-side status. See the "Execution Models" and "Devices and GPU" guides.

Creating tensors

use tenferro_runtime::{Tensor, TypedTensor};
use tenferro_tensor::Rank;

// Dynamic dtype (`Tensor`)
let a = Tensor::from_vec_col_major(vec![2, 3], vec![1.0_f64, 2.0, 3.0, 4.0, 5.0, 6.0]);

// Static dtype (`TypedTensor`)
let b = TypedTensor::<f64>::from_vec_col_major(vec![2, 3], vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
let ranked: TypedTensor<f64, Rank<2>> = match b.clone().try_into_rank::<2>() {
    Ok(ranked) => ranked,
    Err(err) => panic!("unexpected rank mismatch: {err}"),
};
assert_eq!(ranked.shape(), &[2, 3]);
assert!(b.clone().try_into_rank::<3>().is_err());

// Convert between layers for a specific dtype.
let c = Tensor::F64(b.clone());
assert_eq!(c.shape(), &[2, 3]);

The flat buffers above are in column-major order, so a [2, 3] tensor stores its columns as [1, 2], [3, 4], and [5, 6]. Owned tensors stay compact column-major. Metadata-only strided views live on TypedTensorView and TypedTensorViewMut; operations that require compact storage may copy a view into compact storage on the same device, but they do not silently upload CPU tensors or download CUDA tensors.

Arithmetic

use tenferro_cpu::CpuBackend;
use tenferro_runtime::{Tensor, TensorOpsExt};

let mut backend = CpuBackend::new();
let a = Tensor::from_vec_col_major(vec![3], vec![1.0_f64, 2.0, 3.0]);
let b = Tensor::from_vec_col_major(vec![3], vec![4.0_f64, 5.0, 6.0]);

let sum = a.add(&b, &mut backend).unwrap();
let product = a.mul(&b, &mut backend).unwrap();
let negated = a.neg(&mut backend).unwrap();

assert_eq!(sum.as_slice::<f64>().unwrap(), &[5.0, 7.0, 9.0]);
assert_eq!(product.as_slice::<f64>().unwrap(), &[4.0, 10.0, 18.0]);
assert_eq!(negated.as_slice::<f64>().unwrap(), &[-1.0, -2.0, -3.0]);

Linear algebra

use tenferro_linalg::LinalgBackend;
use tenferro_cpu::CpuBackend;
use tenferro_runtime::{Tensor, TensorOpsExt};

let mut backend = CpuBackend::new();
let a = Tensor::from_vec_col_major(vec![3, 3], vec![
    2.0_f64, 1.0, 0.0,
    1.0, 3.0, 1.0,
    0.0, 1.0, 2.0,
]);

// SVD
let svd = LinalgBackend::svd(&mut backend, &a).unwrap();

// QR
let qr = LinalgBackend::qr(&mut backend, &a).unwrap();

// Cholesky (for positive definite matrices)
let chol = LinalgBackend::cholesky(&mut backend, &a).unwrap();

// Eigendecomposition (symmetric)
let eigh = LinalgBackend::eigh(&mut backend, &a).unwrap();

// Solve Ax = b
let b = Tensor::from_vec_col_major(vec![3], vec![1.0_f64, 2.0, 3.0]);
let x = LinalgBackend::solve(&mut backend, &a, &b).unwrap();

let s = &svd[1];
assert_eq!(s.shape(), &[3]);
assert_eq!(qr[0].shape(), &[3, 3]);
assert_eq!(chol.shape(), &[3, 3]);
let eigenvalues = &eigh[0];
let eigenvectors = &eigh[1];
assert_eq!(eigenvalues.shape(), &[3]);
assert_eq!(eigenvectors.shape(), &[3, 3]);
assert_eq!(x.shape(), &[3]);
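
The Cholesky call above relies on the example matrix being symmetric positive definite. As a rough check that needs no tenferro APIs, the following plain-Rust sketch factors a column-major matrix by hand and reports failure when a pivot is not strictly positive. The function name `cholesky_lower` is hypothetical and this is not how the library implements the factorization.

```rust
/// Lower-triangular Cholesky factor of a symmetric n x n column-major matrix.
/// Returns `None` when a pivot is not strictly positive, which means the
/// matrix is not positive definite. Hypothetical helper, not a tenferro API.
fn cholesky_lower(a: &[f64], n: usize) -> Option<Vec<f64>> {
    assert_eq!(a.len(), n * n, "buffer length must be n * n");
    let mut l = vec![0.0_f64; n * n];
    for j in 0..n {
        // Diagonal pivot: a_jj minus the squares already placed in row j.
        let mut d = a[j + j * n];
        for k in 0..j {
            d -= l[j + k * n] * l[j + k * n];
        }
        if d <= 0.0 {
            return None;
        }
        l[j + j * n] = d.sqrt();
        // Entries below the diagonal in column j.
        for i in (j + 1)..n {
            let mut s = a[i + j * n];
            for k in 0..j {
                s -= l[i + k * n] * l[j + k * n];
            }
            l[i + j * n] = s / l[j + j * n];
        }
    }
    Some(l)
}
```

For the 3 x 3 matrix used in this section the sketch returns a factor, while a symmetric but indefinite input such as the column-major buffer `[1.0, 2.0, 2.0, 1.0]` yields `None`.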

Shape operations

use tenferro_cpu::CpuBackend;
use tenferro_runtime::{Tensor, TensorOpsExt};

let mut backend = CpuBackend::new();
let a = Tensor::from_vec_col_major(vec![2, 3], vec![1.0_f64, 2.0, 3.0, 4.0, 5.0, 6.0]);

// Transpose
let at = a.transpose(&[1, 0], &mut backend).unwrap();
assert_eq!(at.shape(), &[3, 2]);

// Reshape
let flat = a.reshape(&[6], &mut backend).unwrap();
assert_eq!(flat.shape(), &[6]);

// Reduce
let col_sum = a.reduce_sum(&[0], &mut backend).unwrap();
assert_eq!(col_sum.shape(), &[3]);

The reduce_sum(&[0]) call removes axis 0. For this [2, 3] tensor, that means summing down each column and keeping one value per column.
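
To make the axis-0 reduction concrete, here is a minimal plain-Rust sketch that performs the same sum on a raw column-major buffer. It uses only the standard library; `reduce_sum_axis0` is a hypothetical helper for illustration and says nothing about how the backend implements the reduction.

```rust
/// Sum a [rows, cols] column-major buffer over axis 0, giving one value per column.
/// Hypothetical helper, not a tenferro API.
fn reduce_sum_axis0(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert_eq!(data.len(), rows * cols, "buffer length must match shape");
    if rows == 0 {
        // Summing over an empty axis gives zero for every column.
        return vec![0.0; cols];
    }
    // In column-major order each column is a contiguous chunk of `rows` values.
    data.chunks(rows)
        .map(|column| column.iter().sum::<f64>())
        .collect()
}
```

For the `[2, 3]` tensor above, the columns are `[1, 2]`, `[3, 4]`, and `[5, 6]`, so the reduced values are `[3, 7, 11]`.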

Einsum

Use tenferro_einsum::EagerEinsumExt when working with EagerTensor. For traced graph execution, use tenferro_einsum::GraphCompilerEinsumExt and register tenferro_einsum::register_runtime on the GraphExecutor.

Extracting data

use tenferro_runtime::Tensor;

let t = Tensor::from_vec_col_major(vec![3], vec![1.0_f64, 2.0, 3.0]);
let data: &[f64] = t.as_slice::<f64>().unwrap();
assert_eq!(data, &[1.0, 2.0, 3.0]);

Column-major storage

tenferro stores tensors in column-major (Fortran) order. For a [2, 3] tensor with data [1, 2, 3, 4, 5, 6], the layout is:

Column 0: [1, 2]
Column 1: [3, 4]
Column 2: [5, 6]

This matches Fortran, Julia, and MATLAB conventions but differs from C/NumPy row-major order.
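
The layout rule can be sketched in plain Rust: in compact column-major storage the first axis varies fastest, so element `(i, j)` of a `[rows, cols]` tensor sits at flat offset `i + j * rows`. The helpers below are hypothetical and use only the standard library.

```rust
/// Flat offset of a multi-index in a compact column-major buffer.
/// The first axis varies fastest, so the strides are 1, shape[0], shape[0] * shape[1], ...
/// Hypothetical helper, not a tenferro API.
fn col_major_offset(shape: &[usize], index: &[usize]) -> usize {
    assert_eq!(shape.len(), index.len(), "index rank must match shape rank");
    let mut offset = 0;
    let mut stride = 1;
    for (&dim, &i) in shape.iter().zip(index) {
        assert!(i < dim, "index out of bounds");
        offset += i * stride;
        stride *= dim;
    }
    offset
}

/// Gather row `i` of a [rows, cols] column-major matrix into a new vector.
fn row(data: &[f64], rows: usize, cols: usize, i: usize) -> Vec<f64> {
    (0..cols)
        .map(|j| data[col_major_offset(&[rows, cols], &[i, j])])
        .collect()
}
```

With the data `[1, 2, 3, 4, 5, 6]` and shape `[2, 3]`, row 0 is `[1, 3, 5]` and row 1 is `[2, 4, 6]`. A row-major reader such as NumPy's default would instead see rows `[1, 2, 3]` and `[4, 5, 6]` in the same buffer.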

Eager Forward And Reverse-Mode Gradients

Eager tensors always compute the forward value immediately. Tracked eager tensors also support reverse-mode autodiff on scalar losses with accumulation. Repeated backward() calls add to the existing gradients, and you clear them explicitly when you want a fresh pass.

use tenferro_ad::{EagerRuntime, EagerTensor, Tensor};
use tenferro_cpu::CpuBackend;

// One runtime holds the backend and the gradient slots for both x and y.
let ctx = EagerRuntime::with_cpu_backend(CpuBackend::new());
let x = EagerTensor::requires_grad_in(
    Tensor::from_vec_col_major(vec![2], vec![1.0_f64, 2.0]).unwrap(),
    ctx.clone(),
).unwrap();
let y = EagerTensor::requires_grad_in(
    Tensor::from_vec_col_major(vec![2], vec![3.0_f64, 4.0]).unwrap(),
    ctx.clone(),
).unwrap();

// loss = sum(x * y), so d(loss)/dx = y = [3, 4].
let loss = x.mul(&y).unwrap().reduce_sum(&[0]).unwrap();
loss.backward().unwrap();
assert_eq!(x.grad().unwrap().unwrap().as_slice::<f64>().unwrap(), &[3.0, 4.0]);

// A second backward() adds to the stored gradient: [3, 4] + [3, 4].
let loss = x.mul(&y).unwrap().reduce_sum(&[0]).unwrap();
loss.backward().unwrap();
assert_eq!(x.grad().unwrap().unwrap().as_slice::<f64>().unwrap(), &[6.0, 8.0]);

// clear_grad() resets the gradient stored for x.
x.clear_grad().unwrap();
assert!(x.grad().unwrap().is_none());

// After clearing, the next pass starts fresh.
let loss = x.mul(&y).unwrap().reduce_sum(&[0]).unwrap();
loss.backward().unwrap();
assert_eq!(x.grad().unwrap().unwrap().as_slice::<f64>().unwrap(), &[3.0, 4.0]);

// clear_grads() on the runtime resets every tracked tensor that shares it.
ctx.clear_grads().unwrap();
assert!(x.grad().unwrap().is_none());
assert!(y.grad().unwrap().is_none());
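
The accumulation semantics in this example can be modelled without any tenferro types. The sketch below is a simplified stand-in, not the library's internals: `GradSlot` and `grad_wrt_x` are hypothetical names, and the gradient of `sum(x * y)` with respect to `x` is written out by hand as `y`.

```rust
/// Hypothetical stand-in for one gradient slot: `None` until the first backward pass.
struct GradSlot {
    grad: Option<Vec<f64>>,
}

impl GradSlot {
    fn new() -> Self {
        Self { grad: None }
    }

    /// Add a new contribution to the stored gradient instead of overwriting it.
    fn accumulate(&mut self, contribution: &[f64]) {
        let stored = self
            .grad
            .get_or_insert_with(|| vec![0.0; contribution.len()]);
        for (g, c) in stored.iter_mut().zip(contribution) {
            *g += *c;
        }
    }

    /// Drop the stored gradient so the next pass starts fresh.
    fn clear(&mut self) {
        self.grad = None;
    }
}

/// For loss = sum(x * y), the gradient with respect to x is y, elementwise.
fn grad_wrt_x(_x: &[f64], y: &[f64]) -> Vec<f64> {
    y.to_vec()
}
```

Accumulating `grad_wrt_x` once for `x = [1, 2]` and `y = [3, 4]` stores `[3, 4]`, a second accumulation stores `[6, 8]`, and `clear()` returns the slot to `None`, which mirrors the assertions in the eager example.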

matmul participates in the same eager reverse-mode workflow:

use tenferro_ad::{EagerRuntime, EagerTensor, Tensor};
use tenferro_cpu::CpuBackend;

let ctx = EagerRuntime::with_cpu_backend(CpuBackend::new());
let a = EagerTensor::requires_grad_in(
    Tensor::from_vec_col_major(vec![2, 2], vec![1.0_f64, 2.0, 3.0, 4.0]).unwrap(),
    ctx.clone(),
).unwrap();
let x = EagerTensor::requires_grad_in(
    Tensor::from_vec_col_major(vec![2, 1], vec![5.0_f64, 6.0]).unwrap(),
    ctx.clone(),
).unwrap();

let y = a.matmul(&x).unwrap();
assert_eq!(y.materialized().unwrap().as_slice::<f64>().unwrap(), &[23.0, 34.0]);

let loss = y.mul(&y).unwrap().reduce_sum(&[0, 1]).unwrap();
assert_eq!(loss.materialized().unwrap().as_slice::<f64>().unwrap(), &[1685.0]);

loss.backward().unwrap();
assert_eq!(x.grad().unwrap().unwrap().as_slice::<f64>().unwrap(), &[182.0, 410.0]);
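
The expected values in the matmul example can be checked by hand. With column-major data `[1, 2, 3, 4]` the matrix has rows `[1, 3]` and `[2, 4]`, and for `loss = sum(y * y)` with `y = A x` the gradient with respect to `x` is `2 * A^T y`. The following plain-Rust sketch, with hypothetical helper names and no tenferro dependencies, reproduces those numbers.

```rust
/// y = A x for a [rows, cols] column-major matrix A.
/// Hypothetical helper, not a tenferro API.
fn matvec(a: &[f64], rows: usize, cols: usize, x: &[f64]) -> Vec<f64> {
    assert_eq!(a.len(), rows * cols, "matrix buffer must match shape");
    assert_eq!(x.len(), cols, "vector length must match column count");
    let mut y = vec![0.0; rows];
    for j in 0..cols {
        for i in 0..rows {
            // Element (i, j) lives at offset i + j * rows in column-major order.
            y[i] += a[i + j * rows] * x[j];
        }
    }
    y
}

/// loss = sum(y * y) with y = A x.
fn loss(a: &[f64], rows: usize, cols: usize, x: &[f64]) -> f64 {
    matvec(a, rows, cols, x).iter().map(|v| v * v).sum()
}

/// Analytic gradient of the loss with respect to x: 2 * A^T y.
fn grad_x(a: &[f64], rows: usize, cols: usize, x: &[f64]) -> Vec<f64> {
    let y = matvec(a, rows, cols, x);
    (0..cols)
        .map(|j| (0..rows).map(|i| 2.0 * a[i + j * rows] * y[i]).sum::<f64>())
        .collect()
}
```

For `a = [1, 2, 3, 4]` and `x = [5, 6]` this gives `y = [23, 34]`, a loss of `1685`, and a gradient of `[182, 410]`, matching the assertions above.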

When To Use Each Immediate Layer

| Scenario | Recommended |
| --- | --- |
| Fixed scalar type and no autodiff | TypedTensor<T, R> |
| Dynamic dtype and no autodiff | Tensor + a backend |
| Data preprocessing | Tensor + a backend |
| Tight inner loops | Direct/eager execution |
| Exploratory computation | Direct/eager execution |
| Immediate forward execution through one runtime | EagerTensor |
| Need reverse-mode gradients on scalar losses | Tracked EagerTensor variables + backward() |
| Need grad / vjp / jvp / HVP via composition on traced graphs | Lazy traced (TracedTensor + GraphCompiler + GraphExecutor<B>) |
| CUDA execution for supported operations | Eager (Tensor / EagerTensor) or lazy traced (TracedTensor + GraphExecutor<B>) with explicit upload/download |