pub fn build_plan_fused_small(dims: &[usize], strides_list: &[&[isize]]) -> (Vec<usize>, Vec<Vec<isize>>, KernelPlan)
Builds a simplified plan for small tensors that fit in the L1 cache.
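
The summary does not say how "fits in L1 cache" is decided. The following is a minimal sketch of one way a caller might test that condition before choosing this path. The helper `fits_in_l1`, its parameters, and the dense-footprint estimate are all assumptions for illustration and are not part of the documented API; the actual threshold used by the library may differ.

```rust
/// Hypothetical helper (not part of the documented API).
/// Estimates the combined footprint of `num_operands` dense tensors of shape
/// `dims` with `elem_size`-byte elements and compares it to `l1_bytes`.
/// Assumption: every operand is dense; broadcast (stride 0) operands would
/// occupy less memory than this estimate.
pub fn fits_in_l1(
    dims: &[usize],
    num_operands: usize,
    elem_size: usize,
    l1_bytes: usize,
) -> bool {
    // Product of all extents; an empty shape is treated as a single element.
    // Checked arithmetic makes an overflowing shape report "does not fit".
    let elems = dims
        .iter()
        .try_fold(1usize, |acc, &d| acc.checked_mul(d));

    let total_bytes = elems
        .and_then(|n| n.checked_mul(elem_size))
        .and_then(|b| b.checked_mul(num_operands));

    match total_bytes {
        Some(bytes) => bytes <= l1_bytes,
        None => false,
    }
}
```

For example, with an assumed 32 KiB L1 data cache, three 16x16 `f32` operands (3072 bytes) pass the check, while three 256x256 `f32` operands do not. The L1 size is hardware-dependent, so it is passed in rather than hard-coded.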