Module bgemm_faer

Batched GEMM backend using the [faer] library, operating on strided views.

Uses faer::linalg::matmul::matmul for SIMD-optimized matrix multiplication. When a group of dimensions cannot be fused into a single matrix dimension because its strides are non-contiguous, the operands are first copied into contiguous column-major buffers, which are then passed to faer.
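
The fuse-or-copy decision described above can be sketched with the standard library alone. This is a minimal illustration, not this module's actual code: the helper names `fused_stride` and `pack_contiguous`, the `f64` element type, and the exact fusing rule (each stride must equal the previous stride times the previous extent, with the first dimension varying fastest) are all assumptions made for the example. The call into faer is deliberately left out.

```rust
/// Hypothetical helper: returns the stride of the fused dimension if the group
/// `dims`/`strides` (in elements, fastest-varying dimension first) can be
/// addressed as one dimension, or `None` if it cannot (or the group is empty).
fn fused_stride(dims: &[usize], strides: &[isize]) -> Option<isize> {
    assert_eq!(dims.len(), strides.len());
    let first = *strides.first()?;
    let mut expected = first;
    for (&d, &s) in dims.iter().zip(strides) {
        // Assumed rule: each stride must continue where the previous dimension ended.
        if s != expected {
            return None;
        }
        expected *= d as isize;
    }
    Some(first)
}

/// Hypothetical fallback: gathers a strided view into a new contiguous buffer,
/// with the first dimension varying fastest (column-major order).
/// Assumes every computed offset is non-negative and in bounds.
fn pack_contiguous(data: &[f64], dims: &[usize], strides: &[isize]) -> Vec<f64> {
    assert_eq!(dims.len(), strides.len());
    let total: usize = dims.iter().product();
    let mut out = Vec::with_capacity(total);
    let mut idx = vec![0usize; dims.len()];
    for _ in 0..total {
        let off: isize = idx.iter().zip(strides).map(|(&i, &s)| i as isize * s).sum();
        out.push(data[off as usize]);
        // Advance the multi-index like an odometer, first dimension fastest.
        for (i, &d) in idx.iter_mut().zip(dims) {
            *i += 1;
            if *i < d {
                break;
            }
            *i = 0;
        }
    }
    out
}
```

For instance, a 2x3 matrix stored row-major as `[1, 2, 3, 4, 5, 6]` has dims `[2, 3]` and strides `[3, 1]` in this convention. Under the assumed rule those two dimensions cannot be fused, so `pack_contiguous` would be used to produce the column-major buffer `[1, 4, 2, 5, 3, 6]` before the multiplication.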