tenferro

tenferro is a dense tensor computation stack for Rust. It provides typed tensors, PyTorch-like immediate execution with backward(), JAX-like traced graphs, einsum, linear algebra, and explicit control over CPU, CUDA, and experimental WebGPU backends.

The project covers both ordinary tensor computation and autodiff workflows. Start with the smallest API that solves your problem, then add autodiff, graph compilation, CUDA, or experimental WebGPU only when the workflow needs them.

[Figure: tenferro-rs architecture overview]

Where To Start

| Workflow | Start with |
|---|---|
| First setup and a checked CPU program | Getting Started |
| Core terms and the three main choices | Core Concepts |
| Step-by-step runnable examples | Tutorials |
| Choosing between TypedTensor, Tensor, EagerTensor, and TracedTensor | Choosing a Tensor API |
| Understanding direct, eager, and traced execution | Execution Models |
| Column-major storage and row-major import/export | Memory Order |
| CPU, CUDA, and experimental WebGPU backend behavior | Devices and GPU |
| Static-shaped StableHLO and PJRT plugin loading | XLA and PJRT |
| Runtime-dependent dimensions in traced graphs | Dynamic and symbolic shapes |
| API documentation for every crate | API Reference |

First CPU Example

use tenferro_cpu::CpuBackend;
use tenferro_runtime::{Tensor, TensorOpsExt};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut backend = CpuBackend::new();

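    // Data is given in column-major order, so these buffers describe
    // a = [[1, 2], [3, 4]] and b = [[5, 6], [7, 8]].
    let a = Tensor::from_vec_col_major(vec![2, 2], vec![1.0_f64, 3.0, 2.0, 4.0])?;
    let b = Tensor::from_vec_col_major(vec![2, 2], vec![5.0_f64, 7.0, 6.0, 8.0])?;

    // The backend is passed explicitly to the operation that runs on it.
    let c = a.matmul(&b, &mut backend)?;

    // c = [[19, 22], [43, 50]], read back column by column.
    assert_eq!(c.shape(), &[2, 2]);
    assert_eq!(c.as_slice::<f64>().unwrap(), &[19.0, 43.0, 22.0, 50.0]);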

    Ok(())
}
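
To see where the expected values in the assertion come from, the following standalone sketch reproduces the same product with plain `Vec<f64>` buffers and no tenferro dependency. The helpers `row_major_to_col_major` and `matmul_col_major` are hypothetical names written only for this illustration; they are not part of tenferro's API. The sketch assumes the usual column-major convention, in which element (i, j) of a matrix with m rows is stored at index i + j * m.

```rust
/// Converts a row-major `rows x cols` buffer into column-major order.
/// Hypothetical helper for illustration; not a tenferro function.
fn row_major_to_col_major(rows: usize, cols: usize, data: &[f64]) -> Vec<f64> {
    assert_eq!(data.len(), rows * cols);
    let mut out = vec![0.0; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            // Row-major index is i * cols + j; column-major index is i + j * rows.
            out[i + j * rows] = data[i * cols + j];
        }
    }
    out
}

/// Multiplies column-major `a` (m x k) by column-major `b` (k x n).
/// Hypothetical helper for illustration; not a tenferro function.
fn matmul_col_major(m: usize, k: usize, n: usize, a: &[f64], b: &[f64]) -> Vec<f64> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0; m * n];
    for j in 0..n {
        for p in 0..k {
            let b_pj = b[p + j * k];
            for i in 0..m {
                c[i + j * m] += a[i + p * m] * b_pj;
            }
        }
    }
    c
}

fn main() {
    // [[1, 2], [3, 4]] and [[5, 6], [7, 8]] written row by row.
    let a = row_major_to_col_major(2, 2, &[1.0, 2.0, 3.0, 4.0]);
    let b = row_major_to_col_major(2, 2, &[5.0, 6.0, 7.0, 8.0]);

    // These match the buffers passed to from_vec_col_major in the example.
    assert_eq!(a, vec![1.0, 3.0, 2.0, 4.0]);
    assert_eq!(b, vec![5.0, 7.0, 6.0, 8.0]);

    // The product, read back column by column.
    let c = matmul_col_major(2, 2, 2, &a, &b);
    assert_eq!(c, vec![19.0, 43.0, 22.0, 50.0]);
}
```

The conversion step is the part most likely to matter in practice: data written row by row must be reordered before it matches the column-major buffers used in the example.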

Mental Model

tenferro has three independent choices. GPU providers, eager execution, and traced graphs are not competing APIs; they answer different questions.

| Choice | Question | Options |
|---|---|---|
| Tensor API | What kind of value do I pass around? | TypedTensor<T>, Tensor, EagerTensor, TracedTensor |
| Execution timing | When does computation run? | Direct backend call, immediate eager execution, traced compile/run |
| Backend/device | Where does computation run? | CPU, CUDA, or experimental WebGPU backend, with explicit transfer |

Most code without autodiff starts with TypedTensor<T> when the scalar type is known at compile time, or with Tensor when the dtype must be selected at runtime. EagerTensor adds immediate execution through an EagerRuntime and optional backward() on scalar losses. TracedTensor adds graph compilation; grad, vjp, and jvp on traced graphs; symbolic inputs; and reuse.
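
The difference between a compile-time scalar type and a runtime-selected dtype can be illustrated with a small standalone sketch. This is a general Rust pattern, not tenferro's implementation: `TypedBuf`, `DynBuf`, `dtype_name`, `zeros`, and `sum` are hypothetical names invented for this example, and nothing here is taken from the tenferro API.

```rust
/// Scalar type fixed at compile time (the role the text gives to TypedTensor<T>).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct TypedBuf<T> {
    shape: Vec<usize>,
    data: Vec<T>,
}

/// Scalar type selected at run time (the role the text gives to Tensor).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum DynBuf {
    F32(TypedBuf<f32>),
    F64(TypedBuf<f64>),
}

impl DynBuf {
    /// Reports which scalar type this value currently holds.
    fn dtype_name(&self) -> &'static str {
        match self {
            DynBuf::F32(_) => "f32",
            DynBuf::F64(_) => "f64",
        }
    }

    /// Builds a zero-filled buffer whose dtype comes from a runtime string.
    fn zeros(dtype: &str, shape: Vec<usize>) -> Option<DynBuf> {
        let len: usize = shape.iter().product();
        match dtype {
            "f32" => Some(DynBuf::F32(TypedBuf { shape, data: vec![0.0; len] })),
            "f64" => Some(DynBuf::F64(TypedBuf { shape, data: vec![0.0; len] })),
            _ => None,
        }
    }
}

/// Generic code over the typed buffer is checked per scalar type by the compiler.
fn sum<T: Copy + std::iter::Sum<T>>(t: &TypedBuf<T>) -> T {
    t.data.iter().copied().sum()
}

fn main() {
    // Compile-time dtype: the element type is part of the Rust type.
    let typed = TypedBuf { shape: vec![2, 2], data: vec![1.0_f64, 3.0, 2.0, 4.0] };
    assert_eq!(typed.shape, vec![2, 2]);
    assert_eq!(sum(&typed), 10.0);

    // Runtime dtype: the element type is chosen from a value, for example a config string.
    let dynamic = DynBuf::zeros("f32", vec![2, 3]).expect("known dtype");
    assert_eq!(dynamic.dtype_name(), "f32");
    assert!(DynBuf::zeros("i128", vec![1]).is_none());
}
```

The typed form lets the compiler reject dtype mismatches, while the dynamic form defers that check to run time in exchange for choosing the dtype from input data or configuration.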

Get In Touch

Questions, design discussions, and contributor coordination for tenferro happen in the tenferro Matrix room:

Matrix is an open, federated chat protocol. You can join from a Matrix client such as Element, or through the browser flow opened by the link above.

Use GitHub issues for bug reports, feature requests, and decisions that need tracking; use Matrix for lightweight discussion before filing or implementing changes.