Tensor primitive execution families for the tenferro workspace.
The crate is organized around focused backend contracts instead of a single monolithic descriptor surface:
- TensorSemiringCore for the minimal semiring substrate used by tenferro-einsum
- TensorSemiringFastPath for optional semiring performance paths such as contraction fast paths
- TensorScalarPrims for standard scalar pointwise and reduction families
- TensorAnalyticPrims for analytic pointwise and reduction families
- TensorComplexRealPrims for cross-dtype complex-to-real unary families
- TensorComplexScalePrims for complex payloads scaled by real-valued tensors
- TensorMetadataPrims for integer/bool metadata tensor families with overwrite-based execution and erased metadata tensor handles
- TensorMetadataCastPrims for metadata-to-scalar bridge families such as bool/int casts and where
- TensorRngPrims for dense eager RNG constructors such as rand and randn
- TensorIndexingPrims for index-based selection, gathering, and scattering
- TensorSortPrims for sort, argsort, and top-k operations
Most families follow the same plan/execute pattern:
- Create a family descriptor
- Build a backend plan for concrete tensor shapes
- Execute the plan with BLAS-style alpha/beta scaling
TensorMetadataPrims is the exception: it uses overwrite-based execution
over erased integer/bool metadata tensor handles instead of scalar-family
scaling.
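The three steps above can be sketched with a toy, standard-library-only model. Every name here (`ToyDescriptor`, `ToyPlan`, `plan`, `execute`) is hypothetical and chosen for illustration; none of them are tenferro-prims APIs. The sketch assumes the usual BLAS convention `c = alpha * op(inputs) + beta * c` for the scaling step.

```rust
// Hypothetical toy types for illustration only; these are NOT tenferro-prims APIs.

/// Step 1: a descriptor names the operation, independent of any data.
#[derive(Debug, Clone, Copy)]
enum ToyDescriptor {
    /// Elementwise addition of two inputs.
    Add,
    /// Elementwise multiplication of two inputs.
    Mul,
}

/// Step 2: a plan binds a descriptor to a concrete, validated length.
#[derive(Debug)]
struct ToyPlan {
    desc: ToyDescriptor,
    len: usize,
}

/// Builds a plan from the lengths of `[input a, input b, output c]`.
fn plan(desc: ToyDescriptor, shapes: &[usize]) -> Result<ToyPlan, String> {
    match shapes {
        // All three operands must have the same length.
        [a, b, c] if a == b && b == c => Ok(ToyPlan { desc, len: *a }),
        _ => Err(format!("incompatible shapes: {shapes:?}")),
    }
}

/// Step 3: execute with BLAS-style scaling (assumed convention):
/// `c = alpha * op(a, b) + beta * c`.
fn execute(
    plan: &ToyPlan,
    alpha: f64,
    a: &[f64],
    b: &[f64],
    beta: f64,
    c: &mut [f64],
) -> Result<(), String> {
    if a.len() != plan.len || b.len() != plan.len || c.len() != plan.len {
        return Err("operand length does not match the plan".to_string());
    }
    for i in 0..plan.len {
        let value = match plan.desc {
            ToyDescriptor::Add => a[i] + b[i],
            ToyDescriptor::Mul => a[i] * b[i],
        };
        // beta = 0.0 overwrites the output; beta = 1.0 accumulates into it.
        c[i] = alpha * value + beta * c[i];
    }
    Ok(())
}
```

With `ToyDescriptor::Mul`, `alpha = 2.0`, `beta = 0.5`, inputs `[1, 2, 3]` and `[4, 5, 6]`, and an output initialized to `[10, 10, 10]`, the output becomes `[13, 25, 41]`. Separating `plan` from `execute` lets shape validation happen once while the same plan is reused across calls.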
§CPU GEMM backend selection
BatchedGemm on CpuBackend requires exactly one CPU GEMM backend feature:
- gemm-faer (default): pure-Rust faer matmul backend
- gemm-blas: CBLAS backend (cblas-sys) with selectable symbol provider
If gemm-blas is selected, choose exactly one provider:
- provider-src: link BLAS source crates (blas-src + cblas-src)
- provider-inject: link runtime-injected symbols (cblas-inject)
With provider-src, choose exactly one src-* implementation:
src-openblas, src-netlib, src-accelerate, src-r,
src-intel-mkl-dynamic-sequential, src-intel-mkl-dynamic-parallel,
src-intel-mkl-static-sequential, src-intel-mkl-static-parallel.
Example (OpenBLAS source provider):
cargo test -p tenferro-prims --no-default-features --features "gemm-blas,provider-src,src-openblas"
Example (runtime-injected provider):
cargo test -p tenferro-prims --no-default-features --features "gemm-blas,provider-inject"
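The selection rules above form a small decision tree, restated below as a runtime check over a list of feature names. This is only an illustrative sketch: `check_gemm_features` and `count_enabled` are hypothetical helpers, not part of tenferro-prims, and Cargo resolves features at build time rather than through a function like this. Only the rules stated in this section are encoded.

```rust
// Hypothetical helpers restating the documented rules; NOT tenferro-prims APIs.

/// The `src-*` implementations listed in this section.
const SRC_IMPLS: [&str; 8] = [
    "src-openblas",
    "src-netlib",
    "src-accelerate",
    "src-r",
    "src-intel-mkl-dynamic-sequential",
    "src-intel-mkl-dynamic-parallel",
    "src-intel-mkl-static-sequential",
    "src-intel-mkl-static-parallel",
];

/// Counts how many of `candidates` appear in `enabled`.
fn count_enabled(enabled: &[&str], candidates: &[&str]) -> usize {
    candidates
        .iter()
        .filter(|c| enabled.iter().any(|e| *e == **c))
        .count()
}

/// Checks a feature set against the documented "exactly one" rules.
fn check_gemm_features(enabled: &[&str]) -> Result<(), String> {
    // Rule 1: exactly one CPU GEMM backend.
    if count_enabled(enabled, &["gemm-faer", "gemm-blas"]) != 1 {
        return Err("enable exactly one of gemm-faer or gemm-blas".to_string());
    }
    // The provider rules only apply to the CBLAS backend.
    if !enabled.contains(&"gemm-blas") {
        return Ok(());
    }
    // Rule 2: exactly one symbol provider for gemm-blas.
    if count_enabled(enabled, &["provider-src", "provider-inject"]) != 1 {
        return Err("enable exactly one of provider-src or provider-inject".to_string());
    }
    // Rule 3: exactly one src-* implementation for provider-src.
    if enabled.contains(&"provider-src") && count_enabled(enabled, &SRC_IMPLS) != 1 {
        return Err("enable exactly one src-* implementation".to_string());
    }
    Ok(())
}
```

Both feature sets from the `cargo test` examples above pass this check, while `["gemm-blas"]` alone or `["gemm-faer", "gemm-blas"]` together are rejected.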
On CpuBackend, semiring-core BatchedGemm supports f32, f64,
Complex32, and Complex64.
§Examples
§Semiring core planning
use tenferro_algebra::Standard;
use tenferro_device::LogicalMemorySpace;
use tenferro_prims::{CpuBackend, CpuContext, SemiringCoreDescriptor, TensorSemiringCore};
use tenferro_tensor::{MemoryOrder, Tensor};
let mut ctx = CpuContext::new(4);
let col = MemoryOrder::ColumnMajor;
let mem = LogicalMemorySpace::MainMemory;
let a = Tensor::<f64>::zeros(&[3, 4], mem, col).unwrap();
let b = Tensor::<f64>::zeros(&[4, 5], mem, col).unwrap();
let mut c = Tensor::<f64>::zeros(&[3, 5], mem, col).unwrap();
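// Describe a single (unbatched) 3x4 by 4x5 matrix product.
let desc = SemiringCoreDescriptor::BatchedGemm {
    batch_dims: vec![],
    m: 3,
    n: 5,
    k: 4,
};
// Build a plan for the concrete shapes of a, b, and c.
let plan = <CpuBackend as TensorSemiringCore<Standard<f64>>>::plan(
    &mut ctx,
    &desc,
    &[&[3, 4], &[4, 5], &[3, 5]],
)
.unwrap();
// Execute with BLAS-style scaling factors around the inputs and output.
<CpuBackend as TensorSemiringCore<Standard<f64>>>::execute(
    &mut ctx,
    &plan,
    1.0, // alpha
    &[&a, &b],
    0.0, // beta
    &mut c,
)
.unwrap();
§Scalar family planning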
use tenferro_algebra::Standard;
use tenferro_device::LogicalMemorySpace;
use tenferro_prims::{
    CpuBackend, CpuContext, ScalarPrimsDescriptor, ScalarReductionOp, TensorScalarPrims,
};
use tenferro_tensor::{MemoryOrder, Tensor};
let mut ctx = CpuContext::new(4);
let col = MemoryOrder::ColumnMajor;
let mem = LogicalMemorySpace::MainMemory;
let a = Tensor::<f64>::zeros(&[3, 4], mem, col).unwrap();
let mut c = Tensor::<f64>::zeros(&[3], mem, col).unwrap();
let desc = ScalarPrimsDescriptor::Reduction {
    modes_a: vec![0, 1],
    modes_c: vec![0],
    op: ScalarReductionOp::Sum,
};
let plan = <CpuBackend as TensorScalarPrims<Standard<f64>>>::plan(
    &mut ctx,
    &desc,
    &[&[3, 4], &[3]],
)
.unwrap();
Modules§
- tensor_ops - GPU-generic free functions for tensor data operations.
Structs§
- BackendRegistry - Registry of available compute backends.
- CpuBackend - CPU backend using strided-kernel and GEMM.
- CpuContext - CPU execution context.
- CudaBackend - CUDA backend (stub): placeholder when the cuda feature is not enabled.
- CudaContext - CUDA execution context (stub).
- CudaPlan - CUDA plan (stub): placeholder when the cuda feature is not enabled.
- PlanCache - Cache for pre-computed execution plans.
- RocmBackend - ROCm backend using hipTENSOR via runtime dlopen.
- RocmContext - ROCm execution context.
- RocmPlan - ROCm plan; wraps a hipTENSOR plan handle.
Enums§
- AnalyticBinaryOp - Analytic binary operations.
- AnalyticPrimsDescriptor - Descriptor for analytic-pointwise and analytic-reduction planning.
- AnalyticReductionOp - Analytic reduction operations.
- AnalyticUnaryOp - Analytic unary operations.
- ComplexRealPrimsDescriptor - Descriptor for complex-to-real planning.
- ComplexRealUnaryOp - Cross-dtype complex-to-real unary operations.
- ComplexScalePrimsDescriptor - Cross-dtype complex-by-real pointwise operations.
- CpuPlan - CPU plan: concrete enum, no type erasure.
- IndexingPrimsDescriptor - Descriptor for indexing-family planning.
- MetadataBinaryOp - Integer/bool metadata binary operations.
- MetadataCastPrimsDescriptor - Metadata-to-scalar bridge planning operations.
- MetadataConstantValue - Constant payload for metadata tensor generation.
- MetadataDType - Metadata tensor dtypes.
- MetadataGenerateOp - Metadata tensor generation operations.
- MetadataPrimsDescriptor - Descriptor for metadata tensor planning.
- MetadataReductionOp - Integer/bool metadata reduction operations.
- MetadataScalarTensorRef - Erased inputs for metadata-to-scalar bridge execution.
- MetadataTensorMut - Erased mutable metadata tensor reference.
- MetadataTensorRef - Erased immutable metadata tensor reference.
- MetadataTernaryOp - Integer/bool metadata ternary operations.
- RngPrimsDescriptor - Random-number generation descriptors for dense eager tensor construction.
- ScalarBinaryOp - Pointwise scalar binary operations.
- ScalarPrimsDescriptor - Descriptor for scalar-pointwise and scalar-reduction planning.
- ScalarReductionOp - Scalar reduction operations.
- ScalarTernaryOp - Pointwise scalar ternary operations.
- ScalarUnaryOp - Pointwise scalar unary operations.
- ScatterReduction - Reduction mode for scatter operations.
- SemiringBinaryOp - Semiring-valid optional binary fast-path operations.
- SemiringCoreDescriptor - Descriptor for semiring-core execution operations.
- SemiringFastPathDescriptor - Descriptor for optional semiring fast paths.
- SortPrimsDescriptor - Descriptor for sort-family planning.
Traits§
- TensorAnalyticPrims - Analytic pointwise and reduction protocol family.
- TensorComplexRealContextFor - Bridge trait that binds a complex-to-real execution context to its backend.
- TensorComplexRealPrims - Cross-dtype complex-to-real unary protocol family.
- TensorComplexScaleContextFor - Bridge trait that binds a complex-by-real execution context to its backend.
- TensorComplexScalePrims - Cross-dtype complex-by-real pointwise family.
- TensorIndexingContextFor - Bridge trait that binds an indexing-family execution context to its backend.
- TensorIndexingPrims - Indexing execution protocol family.
- TensorMetadataCastPrims - Metadata-to-scalar bridge protocol.
- TensorMetadataContextFor - Bridge trait that binds a metadata execution context to its backend.
- TensorMetadataPrims - Metadata tensor planning and execution protocol.
- TensorResolveConjContextFor - Bridge trait for backend-specific lazy-conjugation resolution.
- TensorRngPrims - Tensor RNG execution family.
- TensorScalarContextFor - Bridge trait that binds a scalar-family execution context to its backend.
- TensorScalarPrims - Scalar pointwise and reduction protocol family.
- TensorSemiringContextFor - Bridge trait that binds a semiring execution context to its backend.
- TensorSemiringCore - Minimal semiring execution protocol.
- TensorSemiringFastPath - Optional semiring performance paths.
- TensorSortContextFor - Bridge trait that binds a sort-family execution context to its backend.
- TensorSortPrims - Sort execution protocol family.
Type Aliases§
- CpuRngPlan - CPU execution plan for the RNG family.