Crate tenferro_prims

Tensor primitive execution families for the tenferro workspace.

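The crate is organized around focused backend contracts instead of a single monolithic descriptor surface. Each operation family has its own trait and descriptor type; see the Traits and Enums sections below.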

Most families follow the same plan/execute pattern:

  1. Create a family descriptor
  2. Build a backend plan for concrete tensor shapes
  3. Execute the plan with BLAS-style alpha/beta scaling

TensorMetadataPrims is the exception: its execution overwrites the output and operates on type-erased integer/bool metadata tensor handles, so it does not take the alpha/beta scaling factors used by the scalar families.
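The three-step pattern can be illustrated with a small standalone sketch. It does not use the tenferro API: GemmDescriptor, GemmPlan, plan, and execute here are hypothetical stand-ins, and the code assumes dense column-major f64 buffers. It only shows what "plan once for concrete shapes, then execute with alpha/beta scaling" means, with the update written as C = alpha * (A * B) + beta * C.

```rust
/// Hypothetical descriptor (not the tenferro type): one unbatched matrix product.
#[derive(Debug, Clone, Copy)]
struct GemmDescriptor {
    m: usize,
    n: usize,
    k: usize,
}

/// Hypothetical plan: the descriptor after it has been checked against shapes.
#[derive(Debug)]
struct GemmPlan {
    m: usize,
    n: usize,
    k: usize,
}

/// Step 2: validate the operand shapes [A, B, C] once, up front.
fn plan(desc: &GemmDescriptor, shapes: &[[usize; 2]; 3]) -> Result<GemmPlan, String> {
    let expected = [[desc.m, desc.k], [desc.k, desc.n], [desc.m, desc.n]];
    for (i, (got, want)) in shapes.iter().zip(expected.iter()).enumerate() {
        if got != want {
            return Err(format!(
                "operand {}: expected shape {:?}, got {:?}",
                i, want, got
            ));
        }
    }
    Ok(GemmPlan { m: desc.m, n: desc.n, k: desc.k })
}

/// Step 3: C = alpha * (A * B) + beta * C on column-major buffers.
fn execute(plan: &GemmPlan, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64]) {
    for j in 0..plan.n {
        for i in 0..plan.m {
            let mut acc = 0.0;
            for p in 0..plan.k {
                // A(i, p) lives at i + m * p, B(p, j) at p + k * j.
                acc += a[i + plan.m * p] * b[p + plan.k * j];
            }
            let idx = i + plan.m * j;
            c[idx] = alpha * acc + beta * c[idx];
        }
    }
}
```

With alpha = 1.0 and beta = 0.0 the result is the plain product; a nonzero beta accumulates into whatever c already holds. Validating shapes in plan keeps execute free of per-call checks, which is the usual motivation for splitting the two steps.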

§CPU GEMM backend selection

BatchedGemm on CpuBackend requires exactly one CPU GEMM backend feature:

  • gemm-faer (default): pure-Rust faer matmul backend
  • gemm-blas: CBLAS backend (cblas-sys) with selectable symbol provider

If gemm-blas is selected, choose exactly one provider:

  • provider-src: link BLAS source crates (blas-src + cblas-src)
  • provider-inject: link runtime-injected symbols (cblas-inject)

With provider-src, choose exactly one src-* implementation: src-openblas, src-netlib, src-accelerate, src-r, src-intel-mkl-dynamic-sequential, src-intel-mkl-dynamic-parallel, src-intel-mkl-static-sequential, src-intel-mkl-static-parallel.

Example (OpenBLAS source provider):

cargo test -p tenferro-prims --no-default-features --features "gemm-blas,provider-src,src-openblas"

Example (runtime-injected provider):

cargo test -p tenferro-prims --no-default-features --features "gemm-blas,provider-inject"
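The selection rules above form a small decision tree, which the following sketch encodes as a plain function over feature names. It is an illustration of the rules as stated on this page, not the crate's actual build-time check; validate_gemm_features and count_enabled are hypothetical helpers. Combinations the page does not mention (for example, provider flags alongside gemm-faer) are deliberately left unchecked.

```rust
const BACKENDS: [&str; 2] = ["gemm-faer", "gemm-blas"];
const PROVIDERS: [&str; 2] = ["provider-src", "provider-inject"];
const SOURCES: [&str; 8] = [
    "src-openblas",
    "src-netlib",
    "src-accelerate",
    "src-r",
    "src-intel-mkl-dynamic-sequential",
    "src-intel-mkl-dynamic-parallel",
    "src-intel-mkl-static-sequential",
    "src-intel-mkl-static-parallel",
];

/// Count how many of `names` appear in the enabled feature list.
fn count_enabled(features: &[&str], names: &[&str]) -> usize {
    names.iter().filter(|n| features.contains(*n)).count()
}

/// Hypothetical checker for the documented rules (not part of tenferro-prims).
fn validate_gemm_features(features: &[&str]) -> Result<(), String> {
    // Exactly one CPU GEMM backend.
    if count_enabled(features, &BACKENDS) != 1 {
        return Err("enable exactly one of gemm-faer, gemm-blas".to_string());
    }
    if features.contains(&"gemm-blas") {
        // gemm-blas needs exactly one symbol provider.
        if count_enabled(features, &PROVIDERS) != 1 {
            return Err("enable exactly one of provider-src, provider-inject".to_string());
        }
        // provider-src needs exactly one src-* implementation.
        if features.contains(&"provider-src") && count_enabled(features, &SOURCES) != 1 {
            return Err("enable exactly one src-* implementation".to_string());
        }
    }
    Ok(())
}
```

Because gemm-faer is a default feature, the two cargo commands above pass --no-default-features before enabling gemm-blas; otherwise both backends would be active at once.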

On CpuBackend, semiring-core BatchedGemm supports f32, f64, Complex32, and Complex64.

§Examples

§Semiring core planning

use tenferro_algebra::Standard;
use tenferro_device::LogicalMemorySpace;
use tenferro_prims::{CpuBackend, CpuContext, SemiringCoreDescriptor, TensorSemiringCore};
use tenferro_tensor::{MemoryOrder, Tensor};

let mut ctx = CpuContext::new(4);
let col = MemoryOrder::ColumnMajor;
let mem = LogicalMemorySpace::MainMemory;
let a = Tensor::<f64>::zeros(&[3, 4], mem, col).unwrap();
let b = Tensor::<f64>::zeros(&[4, 5], mem, col).unwrap();
let mut c = Tensor::<f64>::zeros(&[3, 5], mem, col).unwrap();

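// Describe a single (unbatched) product: C[3x5] = A[3x4] * B[4x5].
let desc = SemiringCoreDescriptor::BatchedGemm {
    batch_dims: vec![],
    m: 3,
    n: 5,
    k: 4,
};
// Build a plan for the concrete operand shapes, in the order A, B, C.
let plan = <CpuBackend as TensorSemiringCore<Standard<f64>>>::plan(
    &mut ctx,
    &desc,
    &[&[3, 4], &[4, 5], &[3, 5]],
)
.unwrap();
// Execute with BLAS-style scaling: alpha = 1.0, beta = 0.0.
<CpuBackend as TensorSemiringCore<Standard<f64>>>::execute(
    &mut ctx,
    &plan,
    1.0,
    &[&a, &b],
    0.0,
    &mut c,
)
.unwrap();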

§Scalar family planning

use tenferro_algebra::Standard;
use tenferro_device::LogicalMemorySpace;
use tenferro_prims::{
    CpuBackend, CpuContext, ScalarPrimsDescriptor, ScalarReductionOp, TensorScalarPrims,
};
use tenferro_tensor::{MemoryOrder, Tensor};

let mut ctx = CpuContext::new(4);
let col = MemoryOrder::ColumnMajor;
let mem = LogicalMemorySpace::MainMemory;
let a = Tensor::<f64>::zeros(&[3, 4], mem, col).unwrap();
let mut c = Tensor::<f64>::zeros(&[3], mem, col).unwrap();

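// Sum-reduce a [3, 4] input to a [3] output, keeping mode 0.
let desc = ScalarPrimsDescriptor::Reduction {
    modes_a: vec![0, 1],
    modes_c: vec![0],
    op: ScalarReductionOp::Sum,
};
// Build a plan for the concrete shapes, in the order input, output.
let plan = <CpuBackend as TensorScalarPrims<Standard<f64>>>::plan(
    &mut ctx,
    &desc,
    &[&[3, 4], &[3]],
)
.unwrap();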
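
The reduction planned above can be read as follows, assuming modes_a and modes_c are mode labels: label 1 appears in the input but not in the output, so it is the mode that gets summed away. The standalone sketch below (no tenferro types; sum_over_mode_1 is a hypothetical helper) computes that result on a column-major buffer, to make the expected [3, 4] -> [3] shape change concrete.

```rust
/// Sum a column-major `rows x cols` matrix over its second mode,
/// keeping the first (modes_a = [0, 1], modes_c = [0] under the
/// label interpretation assumed here).
fn sum_over_mode_1(a: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert_eq!(a.len(), rows * cols, "buffer length must match the shape");
    let mut c = vec![0.0; rows];
    for j in 0..cols {
        for i in 0..rows {
            // Column-major layout: element (i, j) lives at i + rows * j.
            c[i] += a[i + rows * j];
        }
    }
    c
}
```

Iterating with the column index in the outer loop walks the buffer contiguously, which matches the ColumnMajor order used by the tensors in the example.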

Modules§

tensor_ops
GPU-generic free functions for tensor data operations.

Structs§

BackendRegistry
Registry of available compute backends.
CpuBackend
CPU backend using strided-kernel and GEMM.
CpuContext
CPU execution context.
CudaBackend
CUDA backend (stub) — placeholder when cuda feature is not enabled.
CudaContext
CUDA execution context (stub).
CudaPlan
CUDA plan (stub) — placeholder when cuda feature is not enabled.
PlanCache
Cache for pre-computed execution plans.
RocmBackend
ROCm backend using hipTENSOR via runtime dlopen.
RocmContext
ROCm execution context.
RocmPlan
ROCm plan — wraps a hipTENSOR plan handle.

Enums§

AnalyticBinaryOp
Analytic binary operations.
AnalyticPrimsDescriptor
Descriptor for analytic-pointwise and analytic-reduction planning.
AnalyticReductionOp
Analytic reduction operations.
AnalyticUnaryOp
Analytic unary operations.
ComplexRealPrimsDescriptor
Descriptor for complex-to-real planning.
ComplexRealUnaryOp
Cross-dtype complex-to-real unary operations.
ComplexScalePrimsDescriptor
Cross-dtype complex-by-real pointwise operations.
CpuPlan
CPU plan — concrete enum, no type erasure.
IndexingPrimsDescriptor
Descriptor for indexing-family planning.
MetadataBinaryOp
Integer/bool metadata binary operations.
MetadataCastPrimsDescriptor
Metadata-to-scalar bridge planning operations.
MetadataConstantValue
Constant payload for metadata tensor generation.
MetadataDType
Metadata tensor dtypes.
MetadataGenerateOp
Metadata tensor generation operations.
MetadataPrimsDescriptor
Descriptor for metadata tensor planning.
MetadataReductionOp
Integer/bool metadata reduction operations.
MetadataScalarTensorRef
Erased inputs for metadata-to-scalar bridge execution.
MetadataTensorMut
Erased mutable metadata tensor reference.
MetadataTensorRef
Erased immutable metadata tensor reference.
MetadataTernaryOp
Integer/bool metadata ternary operations.
RngPrimsDescriptor
Random-number generation descriptors for dense eager tensor construction.
ScalarBinaryOp
Pointwise scalar binary operations.
ScalarPrimsDescriptor
Descriptor for scalar-pointwise and scalar-reduction planning.
ScalarReductionOp
Scalar reduction operations.
ScalarTernaryOp
Pointwise scalar ternary operations.
ScalarUnaryOp
Pointwise scalar unary operations.
ScatterReduction
Reduction mode for scatter operations.
SemiringBinaryOp
Semiring-valid optional binary fast-path operations.
SemiringCoreDescriptor
Descriptor for semiring-core execution operations.
SemiringFastPathDescriptor
Descriptor for optional semiring fast paths.
SortPrimsDescriptor
Descriptor for sort-family planning.

Traits§

TensorAnalyticPrims
Analytic pointwise and reduction protocol family.
TensorComplexRealContextFor
Bridge trait that binds a complex-to-real execution context to its backend.
TensorComplexRealPrims
Cross-dtype complex-to-real unary protocol family.
TensorComplexScaleContextFor
Bridge trait that binds a complex-by-real execution context to its backend.
TensorComplexScalePrims
Cross-dtype complex-by-real pointwise family.
TensorIndexingContextFor
Bridge trait that binds an indexing-family execution context to its backend.
TensorIndexingPrims
Indexing execution protocol family.
TensorMetadataCastPrims
Metadata-to-scalar bridge protocol.
TensorMetadataContextFor
Bridge trait that binds a metadata execution context to its backend.
TensorMetadataPrims
Metadata tensor planning and execution protocol.
TensorResolveConjContextFor
Bridge trait for backend-specific lazy-conjugation resolution.
TensorRngPrims
Tensor RNG execution family.
TensorScalarContextFor
Bridge trait that binds a scalar-family execution context to its backend.
TensorScalarPrims
Scalar pointwise and reduction protocol family.
TensorSemiringContextFor
Bridge trait that binds a semiring execution context to its backend.
TensorSemiringCore
Minimal semiring execution protocol.
TensorSemiringFastPath
Optional semiring performance paths.
TensorSortContextFor
Bridge trait that binds a sort-family execution context to its backend.
TensorSortPrims
Sort execution protocol family.

Type Aliases§

CpuRngPlan
CPU execution plan for the RNG family.