Batched GEMM backend using the [faer] library.
A faer-backed batched GEMM kernel operating on strided views.
It uses faer::linalg::matmul::matmul for SIMD-optimized matrix multiplication.
When the dimension groups cannot be fused into 2D matrices (because their strides are non-contiguous),
the operands are first copied into contiguous column-major buffers, and faer is then called on those buffers.
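The fuse-or-copy decision can be illustrated with a small std-only sketch. This is not this crate's code: `try_fuse` and `pack_col_major` are hypothetical helpers, and measuring strides in elements (as `isize`) with dimensions listed fastest-varying first is an assumption. A group of dimensions collapses into one strided dimension when each dimension's stride equals the previous stride times the previous size; when it does not, the operand is packed into a column-major buffer.

```rust
/// Hypothetical helper (not part of this crate's API): try to fuse a group of
/// dimensions, listed fastest-varying first, into one strided dimension.
/// Strides are measured in elements. Returns `Some((len, stride))` on success.
fn try_fuse(dims: &[usize], strides: &[isize]) -> Option<(usize, isize)> {
    // An empty extent addresses no memory, so any layout trivially fuses.
    if dims.iter().any(|&d| d == 0) {
        return Some((0, 1));
    }
    let mut len = 1usize;
    let mut first_stride: Option<isize> = None;
    // Stride the next non-trivial dimension must have to continue the run.
    let mut expected: isize = 0;
    for (&d, &s) in dims.iter().zip(strides) {
        if d == 1 {
            continue; // size-1 dimensions never change an address
        }
        if first_stride.is_none() {
            first_stride = Some(s);
        } else if s != expected {
            return None; // gap or reordering: not expressible as one stride
        }
        len *= d;
        expected = s * d as isize;
    }
    Some((len, first_stride.unwrap_or(1)))
}

/// Hypothetical helper: copy an m x n strided view starting at `offset`
/// into a new contiguous column-major buffer, where (i, j) lands at i + j * m.
fn pack_col_major(
    src: &[f64],
    offset: usize,
    m: usize,
    n: usize,
    row_stride: isize,
    col_stride: isize,
) -> Vec<f64> {
    let mut out = Vec::with_capacity(m * n);
    for j in 0..n {
        for i in 0..m {
            let idx = offset as isize + i as isize * row_stride + j as isize * col_stride;
            out.push(src[idx as usize]);
        }
    }
    out
}
```

For example, a row-major 2 x 3 matrix stored as [1, 2, 3, 4, 5, 6] (row stride 3, column stride 1) packs to [1, 4, 2, 5, 3, 6], while try_fuse(&[2, 3], &[3, 1]) returns None because that pair of dimensions is not one contiguous run.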
Functions
- bgemm_contiguous_into - Batched GEMM on pre-contiguous operands.
- bgemm_strided_into - Batched strided GEMM using faer: C = alpha * A * B + beta * C.
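The contract C = alpha * A * B + beta * C, applied independently to each batch entry, can be pinned down with a naive std-only reference that is handy as a test oracle for a SIMD backend. This is a sketch under stated assumptions, not the signature of bgemm_strided_into: `StridedView` and `bgemm_reference` are hypothetical names, the (offset, batch/row/column stride) layout in elements is assumed, and treating beta == 0 as "ignore the old contents of C" follows the BLAS convention rather than anything stated above.

```rust
/// Hypothetical strided 3-D view: element (b, i, j) lives at
/// offset + b * batch_stride + i * row_stride + j * col_stride (in elements).
#[derive(Clone, Copy)]
struct StridedView {
    offset: usize,
    batch_stride: isize,
    row_stride: isize,
    col_stride: isize,
}

impl StridedView {
    /// Linear index of element (b, i, j) in the backing slice.
    fn at(&self, b: usize, i: usize, j: usize) -> usize {
        (self.offset as isize
            + b as isize * self.batch_stride
            + i as isize * self.row_stride
            + j as isize * self.col_stride) as usize
    }
}

/// Hypothetical naive reference: for every batch index p,
/// C[p] = alpha * A[p] * B[p] + beta * C[p], with A[p]: m x k, B[p]: k x n.
#[allow(clippy::too_many_arguments)]
fn bgemm_reference(
    batch: usize,
    m: usize,
    n: usize,
    k: usize,
    alpha: f64,
    a: &[f64],
    av: StridedView,
    b: &[f64],
    bv: StridedView,
    beta: f64,
    c: &mut [f64],
    cv: StridedView,
) {
    for p in 0..batch {
        for i in 0..m {
            for j in 0..n {
                // Dot product of row i of A[p] with column j of B[p].
                let mut acc = 0.0;
                for l in 0..k {
                    acc += a[av.at(p, i, l)] * b[bv.at(p, l, j)];
                }
                let idx = cv.at(p, i, j);
                // Assumption (BLAS convention): beta == 0 never reads C, so NaNs there are overwritten.
                c[idx] = if beta == 0.0 { alpha * acc } else { alpha * acc + beta * c[idx] };
            }
        }
    }
}
```

Because each operand carries its own strides, the same oracle can check row-major inputs against a column-major output, which is exactly the mixed-layout case where a fused fast path and a packed fallback must agree.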