Crate strided_kernel

Cache-optimized kernels for strided multidimensional array operations.

This crate is a Rust port of Julia’s Strided.jl and StridedViews.jl libraries, providing efficient operations on strided multidimensional array views.

§Core Types

StridedView / StridedViewMut: Dynamic-rank strided views over existing data
StridedArray: Owned strided multidimensional array
ElementOp trait and implementations (Identity, Conj, Transpose, Adjoint): Type-level element operations applied lazily on access
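To make the idea of a strided view concrete, here is a minimal sketch that does not use this crate at all. `MiniView` and its fields are hypothetical names chosen for illustration; the crate's own `StridedView` may store and validate its layout differently. The sketch only shows the usual addressing rule (offset plus the sum of index times stride) and why permuting dimensions needs no data movement.

```rust
/// Hypothetical stand-in for a strided view; NOT the crate's `StridedView`.
struct MiniView<'a> {
    data: &'a [f64],
    dims: Vec<usize>,
    strides: Vec<isize>,
    offset: isize,
}

impl<'a> MiniView<'a> {
    /// Linear position of a multi-index: offset + sum(idx[k] * strides[k]).
    fn linear_index(&self, idx: &[usize]) -> usize {
        assert_eq!(idx.len(), self.dims.len(), "rank mismatch");
        let mut pos = self.offset;
        for k in 0..idx.len() {
            assert!(idx[k] < self.dims[k], "index out of bounds");
            pos += idx[k] as isize * self.strides[k];
        }
        pos as usize
    }

    /// Read one element through the strided layout.
    fn get(&self, idx: &[usize]) -> f64 {
        self.data[self.linear_index(idx)]
    }

    /// Permuting a view reorders dims and strides only; the buffer is untouched.
    fn permuted(&self, perm: &[usize]) -> MiniView<'a> {
        MiniView {
            data: self.data,
            dims: perm.iter().map(|&p| self.dims[p]).collect(),
            strides: perm.iter().map(|&p| self.strides[p]).collect(),
            offset: self.offset,
        }
    }
}
```

With a six-element buffer viewed as a 2x3 column-major matrix (strides `[1, 2]`), `permuted(&[1, 0])` yields a 3x2 view over the same memory with strides `[2, 1]`.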

§Primary API (view-based, Julia-compatible)

§Map Operations

map_into: Apply a function element-wise from source to destination
zip_map2_into, zip_map3_into, zip_map4_into: Multi-array element-wise operations

§Reduce Operations

reduce: Full reduction with map function
reduce_axis: Reduce along a single axis
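The semantics of the two reductions can be sketched on plain slices. The functions below are hypothetical helpers, not the crate's `reduce` or `reduce_axis`; their parameter order and types are assumptions, and the axis version is restricted to summing a column-major matrix to keep the example short.

```rust
/// Full reduction: fold `op` over `map(x)` for every element, starting at `init`.
fn reduce_sketch<T: Copy, U>(
    src: &[T],
    map: impl Fn(T) -> U,
    op: impl Fn(U, U) -> U,
    init: U,
) -> U {
    src.iter().fold(init, |acc, &x| op(acc, map(x)))
}

/// Sum a column-major `rows x cols` matrix along one axis.
/// axis 0 collapses rows (one result per column);
/// axis 1 collapses columns (one result per row).
fn sum_axis_sketch(src: &[f64], rows: usize, cols: usize, axis: usize) -> Vec<f64> {
    assert_eq!(src.len(), rows * cols);
    assert!(axis < 2, "a matrix has only axes 0 and 1");
    let mut out = vec![0.0; if axis == 0 { cols } else { rows }];
    for j in 0..cols {
        for i in 0..rows {
            let k = if axis == 0 { j } else { i };
            out[k] += src[i + j * rows];
        }
    }
    out
}
```

For example, a sum of squares is `reduce_sketch(&data, |x: f64| x * x, |a: f64, b: f64| a + b, 0.0)`.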

§Basic Operations

copy_into: Copy array contents
add, mul: Element-wise arithmetic
axpy: y = alpha*x + y (array version)
sum, dot: Reductions
symmetrize_into, symmetrize_conj_into: Matrix symmetrization
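As a reference for what these operations compute, the following sketch spells out `axpy` and symmetrization on contiguous `f64` buffers. The helper names are hypothetical and the code ignores strides, element operations, and error handling, all of which the crate's view-based functions presumably handle.

```rust
/// y[i] = alpha * x[i] + y[i], updating `y` in place.
fn axpy_sketch(alpha: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len());
    for (yi, &xi) in y.iter_mut().zip(x) {
        *yi = alpha * xi + *yi;
    }
}

/// dest = (src + src^T) / 2 for an `n x n` column-major matrix.
fn symmetrize_sketch(src: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(src.len(), n * n);
    let mut dest = vec![0.0; n * n];
    for j in 0..n {
        for i in 0..n {
            // Element (i, j) lives at i + j * n; its mirror (j, i) at j + i * n.
            dest[i + j * n] = (src[i + j * n] + src[j + i * n]) / 2.0;
        }
    }
    dest
}
```

The conjugate variant would replace `src[j + i * n]` with its complex conjugate, which is a no-op for real element types.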

§Example

use strided_kernel::{StridedArray, map_into};

// Create a column-major array (Julia default)
let src = StridedArray::<f64>::from_fn_col_major(&[2, 3], |idx| {
    (idx[0] * 10 + idx[1]) as f64
});
let mut dest = StridedArray::<f64>::col_major(&[2, 3]);

// Map with view-based API
map_into(&mut dest.view_mut(), &src.view(), |x| x * 2.0).unwrap();
assert_eq!(dest.get(&[1, 2]), 24.0); // (1*10 + 2) * 2

§Cache Optimization

The library uses Julia’s blocking strategy for cache efficiency:

Dimensions are sorted by stride magnitude for optimal memory access
Operations are blocked into tiles that fit in L1 cache (BLOCK_MEMORY_SIZE = 32 KB)
Contiguous arrays use fast paths bypassing the blocking machinery
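The two ideas in this list, ordering dimensions by stride and tiling to a byte budget, can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the helper names are hypothetical, the 32 KB figure is copied from the text above, and the tile-size rule (one source tile plus one destination tile must fit the budget) is a simplification, not the crate's or Strided.jl's actual block-size computation.

```rust
use std::mem::size_of;

/// Assumed byte budget, mirroring the documented 32 KB figure.
const BLOCK_BYTES_SKETCH: usize = 32 * 1024;

/// Dimension indices ordered so the smallest |stride| comes first,
/// i.e. the dimension that should become the innermost loop.
fn order_by_stride(strides: &[isize]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..strides.len()).collect();
    order.sort_by_key(|&k| strides[k].unsigned_abs());
    order
}

/// Largest square tile edge such that a source tile and a destination
/// tile of `f64` together stay within the byte budget.
fn tile_edge_f64() -> usize {
    let elems_per_tile = BLOCK_BYTES_SKETCH / (2 * size_of::<f64>());
    (elems_per_tile as f64).sqrt() as usize
}

/// Tiled transpose of a column-major `rows x cols` matrix into a
/// column-major `cols x rows` matrix, visiting one tile at a time.
fn blocked_transpose(src: &[f64], rows: usize, cols: usize, tile: usize) -> Vec<f64> {
    assert_eq!(src.len(), rows * cols);
    assert!(tile > 0);
    let mut dest = vec![0.0; rows * cols];
    for j0 in (0..cols).step_by(tile) {
        for i0 in (0..rows).step_by(tile) {
            for j in j0..(j0 + tile).min(cols) {
                for i in i0..(i0 + tile).min(rows) {
                    dest[j + i * cols] = src[i + j * rows];
                }
            }
        }
    }
    dest
}
```

A transpose is used because it is the classic case where source and destination cannot both be traversed contiguously, so tiling is what keeps both access patterns inside the cache.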

Modules§

view: Julia-like dynamic-rank strided view types.

Structs§

Adjoint: Adjoint operation: f(x) = adjoint(x) = conj(transpose(x)). For scalar numbers, this is conj.
Conj: Complex conjugate operation: f(x) = conj(x)
CopyPlan: A compiled copy traversal for one (dims, dst_strides, src_strides) layout pair.
FusedInst: One SSA instruction in a FusedPlan.
FusedPlan: Topologically ordered fused elementwise SSA DAG.
Identity: Identity operation: f(x) = x
RawStridedMut: Borrowed raw strided output layout.
RawStridedRef: Borrowed raw strided input layout.
StridedArray: Owned strided multidimensional array.
StridedView: Dynamic-rank immutable strided view with lazy element operations.
StridedViewMut: Dynamic-rank mutable strided view.
Transpose: Transpose operation: f(x) = transpose(x). For scalar numbers, this is identity. For matrix elements, this would transpose each element.
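The "type-level, lazily applied" element operations (Identity, Conj, Transpose, Adjoint) follow a common Rust pattern that can be sketched without the crate. All names below (`C64`, `ElemOpSketch`, `IdentitySketch`, `ConjSketch`, `LazyView`) are hypothetical; the crate's `ElementOp` trait and view types may have different signatures.

```rust
use std::marker::PhantomData;

/// Minimal complex number so the sketch has no dependencies.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C64 {
    re: f64,
    im: f64,
}

/// Hypothetical analogue of an element-operation trait.
trait ElemOpSketch {
    fn apply(x: C64) -> C64;
}

struct IdentitySketch;
struct ConjSketch;

impl ElemOpSketch for IdentitySketch {
    fn apply(x: C64) -> C64 {
        x
    }
}

impl ElemOpSketch for ConjSketch {
    fn apply(x: C64) -> C64 {
        C64 { re: x.re, im: -x.im }
    }
}

/// A view that records, in its type, which operation to apply on read.
struct LazyView<'a, Op: ElemOpSketch> {
    data: &'a [C64],
    _op: PhantomData<Op>,
}

impl<'a> LazyView<'a, IdentitySketch> {
    fn new(data: &'a [C64]) -> Self {
        LazyView { data, _op: PhantomData }
    }

    /// Switching to the conjugated view changes only the type; no data is touched.
    fn conj(self) -> LazyView<'a, ConjSketch> {
        LazyView { data: self.data, _op: PhantomData }
    }
}

impl<'a, Op: ElemOpSketch> LazyView<'a, Op> {
    /// The operation runs here, at access time.
    fn get(&self, i: usize) -> C64 {
        Op::apply(self.data[i])
    }
}
```

Because the operation is a zero-sized type parameter, the compiler can inline `Op::apply` into each kernel, so the identity case costs nothing at run time.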

Enums§

FusedOp: Runtime scalar operation for a fused elementwise plan.
StridedError: Errors that can occur during strided array operations.
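The fused-plan items (FusedOp, FusedInst, FusedPlan) describe a topologically ordered SSA DAG evaluated element by element. The sketch below shows that general technique only; `InstSketch` and `eval_fused_sketch` are hypothetical, and the set of operations, the operand encoding, and the single-destination restriction are assumptions rather than the crate's design.

```rust
/// One SSA instruction. Operand indices refer to EARLIER instructions,
/// which is what "topologically ordered" guarantees.
#[derive(Clone, Copy)]
enum InstSketch {
    /// Load element `i` of input array `k`.
    Input(usize),
    Add(usize, usize),
    Mul(usize, usize),
    /// Multiply an earlier result by a constant.
    Scale(usize, f64),
}

/// Evaluate the whole plan once per element, writing the value of the
/// last instruction to `dest`. No intermediate arrays are allocated.
fn eval_fused_sketch(plan: &[InstSketch], inputs: &[&[f64]], dest: &mut [f64]) {
    let mut regs = vec![0.0f64; plan.len()];
    for i in 0..dest.len() {
        for (r, inst) in plan.iter().enumerate() {
            regs[r] = match *inst {
                InstSketch::Input(k) => inputs[k][i],
                InstSketch::Add(a, b) => regs[a] + regs[b],
                InstSketch::Mul(a, b) => regs[a] * regs[b],
                InstSketch::Scale(a, s) => regs[a] * s,
            };
        }
        dest[i] = *regs.last().expect("plan must not be empty");
    }
}
```

A plan such as `[Input(0), Input(1), Mul(0, 1), Scale(2, 2.0), Add(3, 0)]` computes `2 * a[i] * b[i] + a[i]` in a single pass over the data.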

Constants§

BLOCK_MEMORY_SIZE: Block memory size for cache-optimized iteration (L1 cache target).
CACHE_LINE_SIZE: Cache line size in bytes.
RAW_FUSED_RANK_LIMIT: Maximum rank fused on the stack before falling back to the view kernels.

Traits§

ComposableElementOp: Trait for element operations that support type-level composition.
Compose: Helper trait for composing two ElementOp types.
ElementOp: Trait for element-wise operations applied to strided views.
ElementOpApply: Trait for types that support element operations (conj, transpose, adjoint).
FusedScalar: Scalar types supported by fused_elementwise_into.
MaybeSend: Equivalent to Send when the parallel feature is enabled; blanket-implemented for all types otherwise.
MaybeSendSync: Equivalent to Send + Sync when the parallel feature is enabled; blanket-implemented for all types otherwise.
MaybeSimdOps: Trait for types that may have SIMD-accelerated sum/dot operations.
MaybeSync: Equivalent to Sync when the parallel feature is enabled; blanket-implemented for all types otherwise.
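The MaybeSend family follows a familiar feature-gating pattern, sketched below. The trait name `MaybeSendSketch` is hypothetical, and the feature name `parallel` is taken from the descriptions above; how the crate actually declares these traits is not shown on this page.

```rust
// With the feature on, the trait is a thin alias for `Send`.
#[cfg(feature = "parallel")]
pub trait MaybeSendSketch: Send {}
#[cfg(feature = "parallel")]
impl<T: Send> MaybeSendSketch for T {}

// With the feature off, every type satisfies the bound.
#[cfg(not(feature = "parallel"))]
pub trait MaybeSendSketch {}
#[cfg(not(feature = "parallel"))]
impl<T> MaybeSendSketch for T {}

/// A kernel can state its bound once and get the right behavior in both builds.
fn accepts<T: MaybeSendSketch>(_value: &T) -> bool {
    true
}
```

When compiled without the feature, `accepts` takes even non-`Send` values such as `std::rc::Rc<i32>`; with the feature enabled, the same call would be rejected at compile time.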

Functions§

add: Element-wise addition: dest[i] += src[i].
axpy: AXPY: dest[i] = alpha * src[i] + dest[i].
axpy_conj_raw: dest = alpha * conj(src) + dest over borrowed raw strided layouts.
axpy_raw: dest = alpha * src + dest over borrowed raw strided layouts.
batched_outer_product_into: Compute dest[lhs_free..., rhs_free..., batch...] = lhs[lhs_free..., batch...] * rhs[rhs_free..., batch...].
broadcast_mul_into: Broadcasted element-wise multiplication: dest[i] = a[i] * b[i].
col_major_strides: Compute column-major strides (Julia default: first index varies fastest).
copy_conj: Copy with complex conjugation: dest[i] = conj(src[i]).
copy_into: Copy elements from source to destination: dest[i] = src[i].
copy_into_col_major: Copy elements from src to dst, optimized for col-major destination.
copy_scale: Copy with scaling: dest[i] = scale * src[i].
copy_scale_conj_raw: dest = scale * conj(src) over borrowed raw strided layouts.
copy_scale_raw: dest = scale * src over borrowed raw strided layouts.
copy_transpose_scale_into: Copy with transpose and scaling: dest[j,i] = scale * src[i,j].
dot: Dot product: sum(OpA::apply(a[i]) * OpB::apply(b[i])).
fma: Fused multiply-add: dest[i] += OpA::apply(a[i]) * OpB::apply(b[i]).
fused_elementwise_into: Evaluate a runtime-DAG elementwise plan into one or more destinations.
map_into: Apply a function element-wise from source to destination.
mul: Element-wise multiplication: dest[i] *= src[i].
mul_into: Element-wise multiplication: dest[i] = a[i] * b[i].
reduce: Full reduction with map function: reduce(init, op, map.(src)).
reduce_axis: Reduce along a single axis, returning a new StridedArray.
row_major_strides: Compute row-major strides (C default: last index varies fastest).
sum: Sum all elements: sum(src).
symmetrize_conj_into: Conjugate-symmetrize a square matrix: dest = (src + conj(src^T)) / 2.
symmetrize_into: Symmetrize a square matrix: dest = (src + src^T) / 2.
zip_map2_into: Binary element-wise operation: dest[i] = f(a[i], b[i]).
zip_map3_into: Ternary element-wise operation: dest[i] = f(a[i], b[i], c[i]).
zip_map4_into: Quaternary element-wise operation: dest[i] = f(a[i], b[i], c[i], e[i]).
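The two stride conventions named in `col_major_strides` and `row_major_strides` can be written out directly. The functions below are standalone sketches with hypothetical names and an assumed `Vec<isize>` return type; the crate's own functions may differ in signature.

```rust
/// Column-major: the first index varies fastest, so strides grow left to right.
fn col_major_strides_sketch(dims: &[usize]) -> Vec<isize> {
    let mut strides = Vec::with_capacity(dims.len());
    let mut step = 1isize;
    for &d in dims {
        strides.push(step);
        step *= d as isize;
    }
    strides
}

/// Row-major: the last index varies fastest, so strides grow right to left.
fn row_major_strides_sketch(dims: &[usize]) -> Vec<isize> {
    let mut strides = vec![0isize; dims.len()];
    let mut step = 1isize;
    for k in (0..dims.len()).rev() {
        strides[k] = step;
        step *= dims[k] as isize;
    }
    strides
}
```

For dimensions `[2, 3, 4]` the column-major strides are `[1, 2, 6]` and the row-major strides are `[12, 4, 1]`.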

Type Aliases§

Result: Result type for strided array operations.
