CPU Benchmarks
The CPU benchmark suite is a normal reporting benchmark, not a publication gate. Use it to collect comparable timings before deciding which cases should become release or publication thresholds.
Recent result snapshots:
Tenferro Benchmarks
Run the Rust Criterion benchmarks through the wrapper script:
bash scripts/bench-cpu.sh --kind tenferro
The script runs thread counts 1,2,4 by default. Override them with:
bash scripts/bench-cpu.sh --kind tenferro --threads 1,4
The wrapper pins common CPU thread controls for each run:
- TENFERRO_BENCH_THREADS
- RAYON_NUM_THREADS
- OMP_NUM_THREADS
- OPENBLAS_NUM_THREADS
- MKL_NUM_THREADS
- VECLIB_MAXIMUM_THREADS
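The wrapper's own implementation is not reproduced here. As a minimal sketch of what pinning the thread-control variables listed above could look like, assuming each variable receives the same count, the hypothetical helper `pin_threads` below exports all six and a loop walks the default list 1,2,4:

```shell
# Hypothetical helper (not taken from scripts/bench-cpu.sh): export every
# thread-control variable with the same value. The real wrapper may differ.
pin_threads() {
  for var in TENFERRO_BENCH_THREADS RAYON_NUM_THREADS OMP_NUM_THREADS \
             OPENBLAS_NUM_THREADS MKL_NUM_THREADS VECLIB_MAXIMUM_THREADS; do
    export "$var=$1"
  done
}

# Walk a comma-separated list such as the default 1,2,4.
for n in $(echo "1,2,4" | tr ',' ' '); do
  # Subshell so the pinned values do not leak into the next iteration.
  (
    pin_threads "$n"
    echo "threads=$n RAYON_NUM_THREADS=$RAYON_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
  )
done
```

Pinning every variable to the same value keeps Rayon, OpenMP, and the BLAS backends from oversubscribing cores, which would otherwise make timings at different thread counts hard to compare.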
Criterion setup creates input tensors outside the measured closure. AD cases use iter_batched so tensor data preparation happens in setup while the measured closure covers forward graph construction and backward execution.
LibTorch C++ Baseline
The canonical Torch baseline is the LibTorch C++ benchmark:
bash scripts/bench-cpu.sh --kind torch-cpp
This avoids Python dispatch overhead in small-matrix latency cases. The script uses the official CPU-only LibTorch ZIP distribution and builds only the small benchmark binary locally. It does not build PyTorch from source.
By default, the script stores downloaded LibTorch files under the main worktree, not under linked git worktrees:
third_party/libtorch/
This avoids one LibTorch download per temporary worktree. Set TENFERRO_BENCH_DEPS_DIR to choose a different cache directory, or set LIBTORCH_DIR to point at an existing LibTorch installation.
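The precedence between these settings is not spelled out above. The sketch below assumes that LIBTORCH_DIR wins, then TENFERRO_BENCH_DEPS_DIR (taken here to name the cache directory itself), and finally third_party/libtorch in the main worktree. The helper name `resolve_libtorch_dir` is hypothetical and is not part of the script:

```shell
# Hypothetical helper: print the directory LibTorch would be read from.
# Assumed precedence (not confirmed by scripts/bench-cpu.sh): LIBTORCH_DIR,
# then TENFERRO_BENCH_DEPS_DIR, then third_party/libtorch in the main worktree.
resolve_libtorch_dir() {
  if [ -n "${LIBTORCH_DIR:-}" ]; then
    printf '%s\n' "$LIBTORCH_DIR"
  elif [ -n "${TENFERRO_BENCH_DEPS_DIR:-}" ]; then
    printf '%s\n' "$TENFERRO_BENCH_DEPS_DIR"
  else
    # `git worktree list` prints the main worktree first, so every linked
    # worktree resolves to the same shared directory.
    main_worktree=$(git worktree list --porcelain | sed -n '1s/^worktree //p')
    printf '%s\n' "$main_worktree/third_party/libtorch"
  fi
}
```

Resolving the default through the main worktree is what lets temporary linked worktrees reuse a single download.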
The default LibTorch URL can be overridden:
LIBTORCH_URL=https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcpu.zip \
bash scripts/bench-cpu.sh --kind torch-cpp
Scope
The initial CPU suite covers:
- matmul and matmul-like einsum for small and medium/large sizes
- svd, qr, eigh, and solve
- batched small einsum with the batch index on the right
- representative N-ary einsum patterns
- AD for sum(matmul), sum(svd(A).S), and sum(solve(A, b))
- f64 as the primary dtype, with selected c64 cases
The sum(svd(A).S) AD benchmark uses the same square sizes as the primal SVD benchmark: 4x4, 8x8, 16x16, 32x32, 64x64, and 128x128.
GPU benchmarks and hard threshold comparisons are intentionally deferred.