Tensor Representation

tenferro-tensor-core and tenferro-tensor split the dense tensor contract along a backend boundary.

Crate Split

tenferro-tensor-core owns backend-independent metadata and host-only adapters:

  • TensorLayout<R>, Rank<N>, DynRank, and TensorRank,
  • ShapeVec, StrideVec, and checked layout validation,
  • DType, TensorScalar, HostTensor<T>, dynamic host Tensor, HostTensorView, TensorView, and TensorRef.

It must not expose a public TypedTensor alias and must not depend on backend buffers, CUDA, BLAS/LAPACK providers, execution traits, runtime caches, or automatic differentiation (AD).

tenferro-tensor owns runtime/backend-capable tensor values:

  • TypedTensor<T, R = DynRank> for fixed scalar type and optional static rank,
  • dtype-erased dynamic-rank Tensor,
  • TypedTensorView<'a, T, R> and TypedTensorViewMut<'a, T, R> for borrowed strided views,
  • placement metadata, backend buffer handles, TensorBackend, and backend session traits.
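The split between a static rank and DynRank can be pictured with a small sketch. The names TensorRank, Rank<N>, DynRank and TypedTensor mirror the items listed above, but the definitions here, the STATIC_RANK constant and the from_vec constructor are hypothetical illustrations, not the tenferro API. In this sketch a static rank is simply checked when the tensor is built, while the default DynRank accepts any rank.

```rust
use std::marker::PhantomData;

/// Hypothetical rank marker trait: a static rank or a runtime-checked one.
trait TensorRank {
    /// `Some(n)` for a compile-time rank, `None` for a dynamic rank.
    const STATIC_RANK: Option<usize>;
}

/// Illustrative static rank marker (stand-in for `Rank<N>`).
struct Rank<const N: usize>;
/// Illustrative dynamic rank marker (stand-in for `DynRank`).
struct DynRank;

impl<const N: usize> TensorRank for Rank<N> {
    const STATIC_RANK: Option<usize> = Some(N);
}

impl TensorRank for DynRank {
    const STATIC_RANK: Option<usize> = None;
}

/// Toy owned tensor: scalar type `T`, rank marker `R` defaulting to `DynRank`.
#[allow(dead_code)]
struct TypedTensor<T, R: TensorRank = DynRank> {
    shape: Vec<usize>,
    data: Vec<T>,
    _rank: PhantomData<R>,
}

impl<T, R: TensorRank> TypedTensor<T, R> {
    /// Validates the element count and, for static ranks, the rank itself.
    fn from_vec(shape: Vec<usize>, data: Vec<T>) -> Result<Self, String> {
        if let Some(n) = R::STATIC_RANK {
            if shape.len() != n {
                return Err(format!("expected rank {}, got {}", n, shape.len()));
            }
        }
        // An empty shape (a scalar) has product 1, i.e. one element.
        let len: usize = shape.iter().product();
        if data.len() != len {
            return Err(format!("shape needs {} elements, got {}", len, data.len()));
        }
        Ok(TypedTensor { shape, data, _rank: PhantomData })
    }
}
```

With the default parameter, TypedTensor<f64> means a dynamic-rank tensor, while TypedTensor<f64, Rank<2>> rejects a shape such as [6] at construction.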

CPU execution, CPU kernels, provider selection, and CPU resource pools belong to tenferro-cpu. GPU execution and explicit device transfer helpers belong to tenferro-gpu.

Layout

Owned runtime tensors are compact column-major. The leftmost dimension has stride 1 and each subsequent stride is the product of all dimensions to its left, so compact strides for shape [d0, d1, d2] are [1, d0, d0 * d1].
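A minimal sketch of the compact column-major stride rule. The helper names compact_strides and linear_offset are hypothetical and are not tenferro functions.

```rust
/// Compact column-major strides: the leftmost dimension has stride 1 and each
/// following stride is the product of all dimensions to its left.
fn compact_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = Vec::with_capacity(shape.len());
    let mut acc = 1;
    for &d in shape {
        strides.push(acc);
        acc *= d;
    }
    strides
}

/// Linear element offset of a multi-index under the given strides.
fn linear_offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For shape [2, 3, 4] this gives strides [1, 2, 6], and the last element [1, 2, 3] lands at offset 1 + 4 + 18 = 23, the end of a 24-element buffer.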

Arbitrary strides, non-zero offsets, transposes, slices, and reverse layouts belong to views or TensorLayout metadata. Metadata-only transformations use the _view suffix, such as transpose_view and slice_view.
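To illustrate why these transformations are metadata-only, here is a sketch built on a hypothetical Layout struct holding a shape, signed strides and an offset. It is not TensorLayout; reverse_view and position are illustrative names that only follow the _view convention. Each _view method returns new metadata over the same buffer and never reads or writes elements.

```rust
/// Hypothetical strided layout: shape, signed strides (in elements) and an
/// offset into a shared buffer. Not the tenferro `TensorLayout` type.
#[derive(Clone, Debug, PartialEq)]
struct Layout {
    shape: Vec<usize>,
    strides: Vec<isize>,
    offset: usize,
}

impl Layout {
    /// Compact column-major layout for `shape`.
    fn compact(shape: &[usize]) -> Self {
        let mut strides = Vec::with_capacity(shape.len());
        let mut acc: isize = 1;
        for &d in shape {
            strides.push(acc);
            acc *= d as isize;
        }
        Layout { shape: shape.to_vec(), strides, offset: 0 }
    }

    /// Swap two axes: only shape and strides change, no data moves.
    fn transpose_view(&self, a: usize, b: usize) -> Self {
        let mut out = self.clone();
        out.shape.swap(a, b);
        out.strides.swap(a, b);
        out
    }

    /// Keep `start..end` along `axis`: shift the offset and shrink the shape.
    fn slice_view(&self, axis: usize, start: usize, end: usize) -> Self {
        assert!(start <= end && end <= self.shape[axis], "slice out of bounds");
        let mut out = self.clone();
        out.offset = (self.offset as isize + start as isize * self.strides[axis]) as usize;
        out.shape[axis] = end - start;
        out
    }

    /// Reverse `axis`: start at its last element and use a negated stride.
    fn reverse_view(&self, axis: usize) -> Self {
        let mut out = self.clone();
        let n = self.shape[axis];
        if n > 0 {
            out.offset = (self.offset as isize + (n as isize - 1) * self.strides[axis]) as usize;
        }
        out.strides[axis] = -self.strides[axis];
        out
    }

    /// Buffer position of a multi-index under this layout.
    fn position(&self, index: &[usize]) -> usize {
        let off: isize = index.iter().zip(&self.strides).map(|(&i, &s)| i as isize * s).sum();
        (self.offset as isize + off) as usize
    }
}
```

For a compact [2, 3] layout, transpose_view(0, 1) maps view index [2, 1] to buffer position 5, the same element as [1, 2] in the original layout.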

When a compact-only operation receives a view, it may canonicalize that view within the same placement: a host view is copied into a compact host tensor, and a CUDA view into a compact CUDA tensor. Canonicalization is not a CPU/GPU transfer mechanism.
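On the host, canonicalization amounts to a gather from the strided view into a fresh compact column-major buffer. A minimal sketch, assuming plain slices for host storage and a hypothetical canonicalize_host helper; the CUDA case needs an equivalent device-side copy, which this sketch does not model.

```rust
/// Copy a strided host view into a new compact column-major host buffer.
/// The copy stays on the host: nothing here crosses a device boundary.
fn canonicalize_host<T: Copy>(
    buffer: &[T],
    shape: &[usize],
    strides: &[isize],
    offset: usize,
) -> Vec<T> {
    // An empty shape (a scalar) has product 1, i.e. one element.
    let len: usize = shape.iter().product();
    let mut out = Vec::with_capacity(len);
    if len == 0 {
        return out;
    }
    let mut index = vec![0usize; shape.len()];
    for _ in 0..len {
        // Gather the element at the current multi-index.
        let pos: isize = offset as isize
            + index.iter().zip(strides).map(|(&i, &s)| i as isize * s).sum::<isize>();
        out.push(buffer[pos as usize]);
        // Advance the multi-index with the leftmost axis fastest (column-major).
        for (axis, i) in index.iter_mut().enumerate() {
            *i += 1;
            if *i < shape[axis] {
                break;
            }
            *i = 0;
        }
    }
    out
}
```

For the buffer [0, 1, 2, 3, 4, 5] read as a transposed [3, 2] view with strides [2, 1], the compact result is [0, 2, 4, 1, 3, 5].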

Operation Vocabulary

Unsuffixed operation names take owned compact tensor inputs. APIs that accept borrowed view inputs or TensorRead inputs use a _read suffix. Examples include add_read, reduce_sum_read, and dot_general_read.

Metadata-only APIs that produce views use _view. APIs that allocate, execute kernels, canonicalize buffers, or move data must not use _view.
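The naming rules can be sketched with toy types. Compact, TransposeView and the TensorRead trait below are hypothetical stand-ins (the real TensorRead may look quite different), and add, add_read and transpose_view only illustrate the split: unsuffixed functions take owned compact inputs, _read functions accept anything readable and canonicalize it first, and _view methods only build metadata.

```rust
/// Hypothetical owned compact column-major tensor (host, f64 only).
#[derive(Clone, Debug, PartialEq)]
struct Compact {
    shape: Vec<usize>,
    data: Vec<f64>,
}

/// Hypothetical stand-in for a `TensorRead`-style trait: anything that can
/// be materialized as a compact tensor in its own placement.
trait TensorRead {
    fn to_compact(&self) -> Compact;
}

impl TensorRead for Compact {
    fn to_compact(&self) -> Compact {
        self.clone()
    }
}

/// Borrowed transpose of a 2-D compact tensor: creating it copies no data.
struct TransposeView<'a> {
    base: &'a Compact,
}

impl Compact {
    /// Metadata-only, so it carries the `_view` suffix.
    fn transpose_view(&self) -> TransposeView<'_> {
        assert_eq!(self.shape.len(), 2, "sketch handles 2-D tensors only");
        TransposeView { base: self }
    }
}

impl TensorRead for TransposeView<'_> {
    fn to_compact(&self) -> Compact {
        let (r, c) = (self.base.shape[0], self.base.shape[1]);
        // View shape is [c, r]; view(i, j) = base(j, i), stored at j + i * r.
        let mut data = Vec::with_capacity(r * c);
        for j in 0..r {
            for i in 0..c {
                // Leftmost view axis (i) varies fastest: column-major output.
                data.push(self.base.data[j + i * r]);
            }
        }
        Compact { shape: vec![c, r], data }
    }
}

/// Unsuffixed: owned compact inputs, so buffers can be zipped directly.
fn add(a: &Compact, b: &Compact) -> Compact {
    assert_eq!(a.shape, b.shape, "shape mismatch");
    let data: Vec<f64> = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
    Compact { shape: a.shape.clone(), data }
}

/// `_read` suffix: accepts any readable input and canonicalizes it first.
fn add_read(a: &impl TensorRead, b: &impl TensorRead) -> Compact {
    add(&a.to_compact(), &b.to_compact())
}
```

In this sketch add_read clones inputs that are already compact; a production version would presumably skip that copy.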

Device Transfer

tenferro never silently transfers tensor payloads between CPU and GPU. Callers upload CPU tensors before CUDA backend execution and download CUDA tensors before CPU execution or host value inspection.

Result-returning backend APIs report placement mismatches with BackendFailure diagnostics. Direct host-inspection methods that return slices, such as TypedTensor::host_data(), may panic on backend buffers because they cannot return a Result.
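A toy model of these transfer rules, assuming a simulated device buffer (a Vec tagged with a placement). Placement, upload, download and cpu_sum are hypothetical, and BackendFailure here is only a stand-in for the real diagnostic type. The sketch mirrors the behaviour described above: transfers are explicit, a Result-returning operation reports a placement mismatch instead of copying, and the slice-returning host_data() panics off-host.

```rust
/// Hypothetical placement tags (not the tenferro placement API).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Placement {
    Host,
    Cuda(u32),
}

/// Stand-in for a `BackendFailure` diagnostic.
#[derive(Debug)]
struct BackendFailure(String);

/// Toy tensor: a "device" buffer is simulated by a Vec tagged with a placement.
struct Tensor {
    placement: Placement,
    payload: Vec<f32>,
}

impl Tensor {
    fn host(payload: Vec<f32>) -> Self {
        Tensor { placement: Placement::Host, payload }
    }

    /// Explicit host-to-device transfer; the only way to reach `Cuda`.
    fn upload(&self, device: u32) -> Result<Tensor, BackendFailure> {
        if self.placement != Placement::Host {
            return Err(BackendFailure(format!("upload from {:?}", self.placement)));
        }
        Ok(Tensor { placement: Placement::Cuda(device), payload: self.payload.clone() })
    }

    /// Explicit device-to-host transfer.
    fn download(&self) -> Result<Tensor, BackendFailure> {
        if self.placement == Placement::Host {
            return Err(BackendFailure("download from Host".to_string()));
        }
        Ok(Tensor::host(self.payload.clone()))
    }

    /// Returns a slice, so there is no error channel: panics off-host.
    fn host_data(&self) -> &[f32] {
        match self.placement {
            Placement::Host => self.payload.as_slice(),
            other => panic!("host_data() on a tensor placed on {:?}", other),
        }
    }
}

/// Result-returning CPU operation: reports the mismatch, never auto-downloads.
fn cpu_sum(t: &Tensor) -> Result<f32, BackendFailure> {
    if t.placement != Placement::Host {
        return Err(BackendFailure(format!("cpu_sum expects Host, found {:?}", t.placement)));
    }
    Ok(t.payload.iter().sum())
}
```

Calling cpu_sum on an uploaded tensor returns an error rather than silently downloading it; the caller must call download() first.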