Tensor Representation
tenferro-tensor-core and tenferro-tensor split the dense tensor contract along a backend boundary.
Crate Split
tenferro-tensor-core owns backend-independent metadata and host-only adapters:
- TensorLayout<R>, Rank<N>, DynRank, and TensorRank,
- ShapeVec, StrideVec, and checked layout validation,
- DType, TensorScalar, HostTensor<T>, dynamic host Tensor, HostTensorView, TensorView, and TensorRef.
It must not expose a public TypedTensor alias and must not depend on backend buffers, CUDA, BLAS/LAPACK providers, execution traits, runtime caches, or AD.
tenferro-tensor owns runtime/backend-capable tensor values:
- TypedTensor<T, R = DynRank> for fixed scalar type and optional static rank,
- dtype-erased dynamic-rank Tensor,
- TypedTensorView<'a, T, R> and TypedTensorViewMut<'a, T, R> for borrowed strided views,
- placement metadata, backend buffer handles, TensorBackend, and backend session traits.
CPU execution, CPU kernels, provider selection, and CPU resource pools belong to tenferro-cpu. GPU execution and explicit device transfer helpers belong to tenferro-gpu.
Layout
Owned runtime tensors are compact column-major. The leftmost dimension has stride 1, so compact strides for shape [d0, d1, d2] are [1, d0, d0 * d1].
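The stride rule above can be sketched in a few lines of Rust. The helper names compact_col_major_strides and linear_offset are hypothetical and exist only for illustration; they are not part of any tenferro crate.

```rust
/// Compact column-major strides: the leftmost dimension has stride 1 and
/// each later stride is the product of all earlier dimensions.
/// Hypothetical helper, not a tenferro API.
fn compact_col_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = Vec::with_capacity(shape.len());
    let mut acc = 1usize;
    for &dim in shape {
        strides.push(acc);
        acc *= dim;
    }
    strides
}

/// Linear element offset of a multi-index under the given strides.
/// Hypothetical helper, not a tenferro API.
fn linear_offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For shape [2, 3, 4] this yields strides [1, 2, 6], so element (1, 2, 3) lives at offset 1 + 4 + 18 = 23.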
Arbitrary strides, non-zero offsets, transposes, slices, and reverse layouts belong to views or TensorLayout metadata. Metadata-only transformations use the _view suffix, such as transpose_view and slice_view.
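As a rough illustration of what "metadata-only" means here, the sketch below models a layout as shape, strides, and offset, and implements transpose_view and slice_view by rewriting those fields without touching any element data. SketchLayout is a hypothetical stand-in, not the real TensorLayout<R>, and the method signatures are assumptions.

```rust
/// Hypothetical stand-in for layout metadata; not the real TensorLayout<R>.
#[derive(Debug, Clone, PartialEq)]
struct SketchLayout {
    shape: Vec<usize>,
    strides: Vec<isize>, // signed so reverse layouts are representable
    offset: isize,
}

impl SketchLayout {
    /// Metadata-only transpose: permutes shape and strides, copies no data.
    /// This sketch does not check that `perm` is a valid permutation.
    fn transpose_view(&self, perm: &[usize]) -> SketchLayout {
        assert_eq!(perm.len(), self.shape.len(), "permutation rank mismatch");
        SketchLayout {
            shape: perm.iter().map(|&p| self.shape[p]).collect(),
            strides: perm.iter().map(|&p| self.strides[p]).collect(),
            offset: self.offset,
        }
    }

    /// Metadata-only slice of `start..end` along `axis`: shrinks one
    /// dimension and shifts the offset, copies no data.
    fn slice_view(&self, axis: usize, start: usize, end: usize) -> SketchLayout {
        assert!(start <= end && end <= self.shape[axis], "slice out of bounds");
        let mut out = self.clone();
        out.shape[axis] = end - start;
        out.offset += start as isize * self.strides[axis];
        out
    }
}
```

Both results are no longer compact in general, which is why they are views rather than owned runtime tensors.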
When a compact-only operation receives a view, it may canonicalize that view inside the same placement. Host views can copy to host compact tensors; CUDA views can copy to CUDA compact tensors. Canonicalization is not a CPU/GPU transfer mechanism.
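A minimal sketch of host-side canonicalization, assuming a view described by a shape, non-negative strides, and an offset into a host slice. The function canonicalize_host is hypothetical; it only shows the copy-to-compact step staying on the host and says nothing about how tenferro implements it.

```rust
/// Copies a strided host view into a freshly allocated compact
/// column-major buffer. Hypothetical helper; assumes non-negative strides
/// and that every addressed element is in bounds of `data`.
fn canonicalize_host<T: Copy>(
    data: &[T],
    shape: &[usize],
    strides: &[usize],
    offset: usize,
) -> Vec<T> {
    let total: usize = shape.iter().product();
    let mut out = Vec::with_capacity(total);
    let mut index = vec![0usize; shape.len()];
    for _ in 0..total {
        let src = offset + index.iter().zip(strides).map(|(i, s)| i * s).sum::<usize>();
        out.push(data[src]);
        // Advance the multi-index with the leftmost dimension fastest,
        // which is exactly column-major output order.
        for (i, &dim) in index.iter_mut().zip(shape) {
            *i += 1;
            if *i < dim {
                break;
            }
            *i = 0;
        }
    }
    out
}
```

The input and output both live in host memory; a CUDA counterpart would read and write device buffers instead, never crossing the placement boundary.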
Operation Vocabulary
Unsuffixed operation names take owned compact tensor inputs. APIs that accept borrowed view inputs or TensorRead inputs use a _read suffix. Examples include add_read, reduce_sum_read, and dot_general_read.
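The naming split might look like the following one-dimensional sketch. Owned1 and View1 are hypothetical types invented for this example, and the signatures (including passing owned tensors by reference and returning an owned result from add_read) are assumptions rather than the crate's actual API.

```rust
/// Hypothetical owned compact 1-D tensor.
struct Owned1 {
    data: Vec<f64>,
}

/// Hypothetical borrowed strided 1-D view.
struct View1<'a> {
    data: &'a [f64],
    offset: usize,
    stride: usize,
    len: usize,
}

impl View1<'_> {
    fn get(&self, i: usize) -> f64 {
        self.data[self.offset + i * self.stride]
    }
}

/// Unsuffixed name: owned compact inputs only.
fn add(a: &Owned1, b: &Owned1) -> Owned1 {
    assert_eq!(a.data.len(), b.data.len(), "shape mismatch");
    Owned1 { data: a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect() }
}

/// `_read` suffix: accepts borrowed strided views as inputs.
fn add_read(a: &View1<'_>, b: &View1<'_>) -> Owned1 {
    assert_eq!(a.len, b.len, "shape mismatch");
    Owned1 { data: (0..a.len).map(|i| a.get(i) + b.get(i)).collect() }
}
```

The suffix tells the caller at a glance whether strided inputs are accepted, without inspecting the parameter types.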
Metadata-only APIs that produce views use _view. APIs that allocate, execute kernels, canonicalize buffers, or move data must not use _view.
Device Transfer
tenferro never silently transfers tensor payloads between CPU and GPU. Callers upload CPU tensors before CUDA backend execution and download CUDA tensors before CPU execution or host value inspection.
Result-returning backend APIs report placement mismatches with BackendFailure diagnostics. Direct host-inspection methods that return slices, such as TypedTensor::host_data(), may panic on backend buffers because they cannot return a Result.
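The difference between the two failure modes can be sketched as follows. Placement, SketchFailure, SketchTensor, and cpu_sum are hypothetical names for illustration; the real BackendFailure type and TypedTensor::host_data() may differ in shape and behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Placement {
    Host,
    Cuda,
}

/// Hypothetical diagnostic; the real BackendFailure may look different.
#[derive(Debug, PartialEq)]
enum SketchFailure {
    PlacementMismatch { expected: Placement, found: Placement },
}

/// Hypothetical tensor: `host` is only meaningful when placement is Host.
struct SketchTensor {
    placement: Placement,
    host: Vec<f32>,
}

impl SketchTensor {
    /// Slice-returning inspection: there is no Result to carry an error,
    /// so a non-host placement panics.
    fn host_data(&self) -> &[f32] {
        match self.placement {
            Placement::Host => &self.host,
            Placement::Cuda => panic!("host_data() called on a CUDA tensor"),
        }
    }
}

/// Result-returning CPU operation: a placement mismatch becomes an error
/// value, and no implicit download is attempted.
fn cpu_sum(t: &SketchTensor) -> Result<f32, SketchFailure> {
    if t.placement != Placement::Host {
        return Err(SketchFailure::PlacementMismatch {
            expected: Placement::Host,
            found: t.placement,
        });
    }
    Ok(t.host.iter().sum())
}
```

Callers that cannot rule out a backend buffer would, under this sketch, prefer the Result-returning path or download explicitly before inspecting host data.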