Cache-efficient tensor permutation / transpose.
This crate provides optimized copy and permutation operations for strided
multidimensional arrays. It is designed as a single-responsibility crate
sitting between strided-view (data structures) and strided-kernel
(general map/reduce/broadcast operations).
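To make "cache-efficient" concrete, the following is a minimal sketch of a tiled 2-D transpose. It is not this crate's API: the function `blocked_transpose` and its `block` parameter are hypothetical names chosen for illustration, and the sketch assumes dense row-major buffers rather than general strided views.

```rust
/// Hypothetical helper (not part of this crate's API): writes the transpose
/// of a dense row-major `rows x cols` matrix `src` into `dst`, which is
/// treated as a dense row-major `cols x rows` matrix.
pub fn blocked_transpose(
    src: &[f64],
    rows: usize,
    cols: usize,
    dst: &mut [f64],
    block: usize,
) {
    assert!(block > 0, "block size must be positive");
    assert_eq!(src.len(), rows * cols);
    assert_eq!(dst.len(), rows * cols);

    // The outer two loops step from tile to tile.
    for i0 in (0..rows).step_by(block) {
        let i1 = (i0 + block).min(rows);
        for j0 in (0..cols).step_by(block) {
            let j1 = (j0 + block).min(cols);
            // The inner two loops stay inside one tile, so the reads from
            // `src` and the writes to `dst` both touch a small region.
            for i in i0..i1 {
                for j in j0..j1 {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

A naive row-by-row transpose reads one buffer contiguously but writes the other with a large stride. Tiling bounds the working set of both buffers at once, which is the general idea behind block-based permutation kernels.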
§Dependency graph
strided-view -> strided-perm -> strided-kernel -> strided-einsum2
Re-exports§
pub use copy::copy_into;
pub use copy::copy_into_col_major;
pub use copy::try_fuse_group;
pub use fuse::compress_dims;
pub use fuse::compute_costs;
pub use fuse::compute_importance;
pub use fuse::fuse_dims;
pub use fuse::sort_by_importance;
pub use kernel::build_plan_fused;
pub use kernel::build_plan_fused_small;
pub use kernel::for_each_inner_block_preordered;
pub use kernel::total_len;
pub use kernel::KernelPlan;
pub use kernel::SMALL_TENSOR_THRESHOLD;
pub use order::compute_order;
Modules§
- block
- Block size computation ported from Strided.jl
- copy
- Copy/permutation operations on strided views.
- fuse
- Dimension fusion logic ported from Strided.jl/src/mapreduce.jl
- hptt
- HPTT-faithful cache-efficient tensor permutation.
- kernel
- Block-based iteration engine for strided permutation operations.
- order
- Loop ordering algorithm ported from Strided.jl
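As a rough illustration of what dimension fusion means, the sketch below merges adjacent dimensions of a single strided view when the outer dimension begins exactly where the inner one ends. The function `fuse_contiguous` is a hypothetical name, and its signature is an assumption made for this example; it is not the signature of the re-exported `fuse_dims`, which is not shown on this page. Dimensions are assumed to be listed fastest-varying first.

```rust
/// Hypothetical sketch (not this crate's `fuse_dims`): merges neighbouring
/// dimensions of one strided view whenever
/// `stride[k + 1] == dims[k] * stride[k]`.
/// Dimensions are listed fastest-varying first (an assumption of this sketch).
pub fn fuse_contiguous(dims: &[usize], strides: &[isize]) -> (Vec<usize>, Vec<isize>) {
    assert_eq!(dims.len(), strides.len());
    let mut out_dims: Vec<usize> = Vec::new();
    let mut out_strides: Vec<isize> = Vec::new();

    for (&d, &s) in dims.iter().zip(strides) {
        if let (Some(last_d), Some(last_s)) = (out_dims.last_mut(), out_strides.last()) {
            // The previous (possibly already fused) dimension spans
            // `last_d * last_s` elements; if this dimension starts exactly
            // there, the two behave like one longer dimension.
            if s == *last_s * (*last_d as isize) {
                *last_d *= d;
                continue;
            }
        }
        out_dims.push(d);
        out_strides.push(s);
    }
    (out_dims, out_strides)
}
```

For example, a fully contiguous view with `dims = [2, 3, 4]` and `strides = [1, 2, 6]` collapses to one dimension of length 24, while a padded view with `strides = [1, 2, 12]` collapses only its first two dimensions. When a copy involves several arrays, two dimensions can be merged only if the condition holds for every array involved. Fewer dimensions mean fewer nested loops and longer contiguous inner runs.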