Crate strided_perm


Cache-efficient tensor permutation / transpose.

This crate provides optimized copy and permutation operations for strided multidimensional arrays. It is designed as a single-responsibility crate sitting between strided-view (data structures) and strided-kernel (general map/reduce/broadcast operations).
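To make "cache-efficient permutation" concrete, here is a minimal sketch of a tiled copy between two strided 2-D views. It does not use this crate's API: `blocked_copy_2d`, `transpose_example`, and the `block` parameter are hypothetical names chosen for illustration, and the tile size is picked arbitrarily rather than derived from cache parameters.

```rust
/// Copy a strided 2-D source into a strided 2-D destination, one tile at a time.
/// Strides are in elements, given as (row stride, column stride).
/// Hypothetical helper for illustration only; not part of strided_perm.
fn blocked_copy_2d(
    src: &[f64],
    src_strides: [usize; 2],
    dst: &mut [f64],
    dst_strides: [usize; 2],
    shape: [usize; 2],
    block: usize,
) {
    assert!(block > 0, "tile size must be positive");
    let [rows, cols] = shape;
    // Walk the index space in block x block tiles so that both the reads
    // and the writes of one tile stay within a small memory footprint.
    for i0 in (0..rows).step_by(block) {
        let i1 = (i0 + block).min(rows);
        for j0 in (0..cols).step_by(block) {
            let j1 = (j0 + block).min(cols);
            for i in i0..i1 {
                for j in j0..j1 {
                    dst[i * dst_strides[0] + j * dst_strides[1]] =
                        src[i * src_strides[0] + j * src_strides[1]];
                }
            }
        }
    }
}

/// Transpose a row-major 2x3 matrix into a row-major 3x2 matrix.
fn transpose_example() -> [f64; 6] {
    // Row-major 2x3 matrix [[1, 2, 3], [4, 5, 6]].
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let mut t = [0.0; 6];
    // The transpose is a 3x2 view of `a` with its strides swapped: (1, 3).
    // The destination is a contiguous row-major 3x2 buffer: strides (2, 1).
    blocked_copy_2d(&a, [1, 3], &mut t, [2, 1], [3, 2], 2);
    t
}
```

A permutation is expressed here purely by reordering strides; the tiling only changes the order in which elements are visited, not the result.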

Dependency graph

strided-view -> strided-perm -> strided-kernel -> strided-einsum2

Re-exports

pub use copy::copy_into;
pub use copy::copy_into_col_major;
pub use copy::try_fuse_group;
pub use fuse::compress_dims;
pub use fuse::compute_costs;
pub use fuse::compute_importance;
pub use fuse::fuse_dims;
pub use fuse::sort_by_importance;
pub use kernel::build_plan_fused;
pub use kernel::build_plan_fused_small;
pub use kernel::for_each_inner_block_preordered;
pub use kernel::total_len;
pub use kernel::KernelPlan;
pub use kernel::SMALL_TENSOR_THRESHOLD;
pub use order::compute_order;

Modules

block
Block size computation ported from Strided.jl.
copy
Copy/permutation operations on strided views.
fuse
Dimension fusion logic ported from Strided.jl/src/mapreduce.jl.
hptt
HPTT-faithful cache-efficient tensor permutation.
kernel
Block-based iteration engine for strided permutation operations.
order
Loop ordering algorithm ported from Strided.jl.
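
As a rough illustration of what dimension fusion means, the sketch below merges adjacent dimensions of a single array when the outer stride equals the inner stride times the inner extent, so that the pair can be traversed as one longer dimension. This is an assumption about the general technique, not a description of the `fuse` module: `fuse_contiguous_dims` is a hypothetical name, and a real permutation would need the condition to hold for the source and destination at the same time.

```rust
/// Merge adjacent dimensions that are contiguous relative to each other.
/// Dimensions are listed fastest-varying first; strides are in elements.
/// Hypothetical helper for illustration only; not part of strided_perm.
fn fuse_contiguous_dims(shape: &[usize], strides: &[isize]) -> (Vec<usize>, Vec<isize>) {
    assert_eq!(shape.len(), strides.len());
    let mut out_shape: Vec<usize> = Vec::new();
    let mut out_strides: Vec<isize> = Vec::new();
    for (&n, &s) in shape.iter().zip(strides) {
        if let (Some(last_n), Some(&last_s)) = (out_shape.last_mut(), out_strides.last()) {
            // Stepping once in this dimension lands exactly where the
            // previous dimension ends, so the two behave like one.
            if s == last_s * (*last_n as isize) {
                *last_n *= n;
                continue;
            }
        }
        out_shape.push(n);
        out_strides.push(s);
    }
    (out_shape, out_strides)
}
```

Under these assumptions, a fully contiguous shape `[2, 3, 4]` with strides `[1, 2, 6]` collapses to a single dimension of length 24, while a padded layout with strides `[1, 2, 12]` collapses only to `[6, 4]` with strides `[1, 12]`.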

Constants

BLOCK_MEMORY_SIZE
CACHE_LINE_SIZE