Planned C-API (FFI)
Design notes for a future C-API for Julia, Python (JAX, PyTorch), and other languages. The current workspace does not contain a tenferro-capi crate, so function names in this document are proposed ABI names rather than current public API.
Position in Workspace Architecture
The planned tenferro-capi crate would sit at Layer 5, the topmost layer:
- Layer 5: tenferro-capi ← this crate
- Layer 4: tenferro-einsum, tenferro-linalg
- Layer 3: tenferro-internal-ops
- Layer 2: tenferro-tensor
Dependencies: tenferro-tensor, tenferro-einsum, tenferro-linalg, and the runtime/backend crates required by those APIs.
Tropical extension: tenferro-ext-tropical-capi (separate shared library) reuses TfeTensorF64 handles and adds tropical einsum functions.
Error mapping for unsupported AD modes: AutodiffError::ModeNotSupported maps to TFE_INVALID_ARGUMENT in the C-API. No _frule_<algebra>_f64 functions are provided for tropical einsum (consistent with issue #66); if such a path were ever reached internally, it would surface as TFE_INVALID_ARGUMENT with a descriptive message, not as a panic or TFE_INTERNAL_ERROR.
Design Principles
- Opaque pointers. TfeTensorF64 is an opaque #[repr(C)] handle. Host languages never see Rust internals.
- Status codes. Every fallible function takes *mut tfe_status_t as its last argument; tfe_tensor_f64_release and the infallible query functions (_ndim, _shape, _len, _data) take none. Rust panics are caught with catch_unwind and converted to TFE_INTERNAL_ERROR.
- Stateless AD only. Only rrule (VJP) and frule (JVP) are exposed. TrackedValue/DualValue/Tape are Rust-internal. Host languages manage their own AD tapes (ChainRules.jl, PyTorch autograd, JAX custom_vjp).
- f64 only in POC phase. All functions carry the _f64 suffix.
- DLPack v1.0. Zero-copy tensor exchange across language boundaries (NumPy, PyTorch, JAX, DLPack.jl). Export preserves CPU/GPU logical memory spaces; import currently accepts only KDLCPU with device_id = 0.
- Copy semantics for convenience. tfe_tensor_f64_from_data copies caller data. For zero-copy, use DLPack.
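The opaque-handle, status out-parameter, and copy-on-create conventions can be sketched as below. This is a minimal illustration under stated assumptions, not the planned implementation: the handle internals (a boxed column-major Vec<f64> plus a shape Vec<usize>), the use of usize for shape entries, and the choice of TFE_SHAPE_MISMATCH when len disagrees with the shape are all hypothetical.

```rust
#[allow(non_camel_case_types)]
pub type tfe_status_t = i32;
pub const TFE_SUCCESS: tfe_status_t = 0;
pub const TFE_INVALID_ARGUMENT: tfe_status_t = -1;
pub const TFE_SHAPE_MISMATCH: tfe_status_t = -2;

/// Opaque to C: host code only ever holds a `*mut TfeTensorF64`.
/// Hypothetical internals: an owned column-major buffer plus its shape.
pub struct TfeTensorF64 {
    data: Vec<f64>,
    shape: Vec<usize>,
}

/// Write `code` through the status out-parameter, tolerating a null pointer.
unsafe fn set_status(status: *mut tfe_status_t, code: tfe_status_t) {
    if !status.is_null() {
        unsafe { *status = code };
    }
}

/// Copies `len` values from `data`; the caller keeps ownership of its buffer.
/// A real export would also carry #[no_mangle] (#[unsafe(no_mangle)] in edition 2024).
pub unsafe extern "C" fn tfe_tensor_f64_from_data(
    data: *const f64,
    len: usize,
    shape: *const usize,
    ndim: usize,
    status: *mut tfe_status_t,
) -> *mut TfeTensorF64 {
    unsafe {
        if (data.is_null() && len > 0) || (shape.is_null() && ndim > 0) {
            set_status(status, TFE_INVALID_ARGUMENT);
            return std::ptr::null_mut();
        }
        let shape = if ndim == 0 { Vec::new() } else { std::slice::from_raw_parts(shape, ndim).to_vec() };
        if shape.iter().product::<usize>() != len {
            set_status(status, TFE_SHAPE_MISMATCH);
            return std::ptr::null_mut();
        }
        let data = if len == 0 { Vec::new() } else { std::slice::from_raw_parts(data, len).to_vec() };
        set_status(status, TFE_SUCCESS);
        Box::into_raw(Box::new(TfeTensorF64 { data, shape }))
    }
}

/// Frees a handle; passing null is a no-op.
pub unsafe extern "C" fn tfe_tensor_f64_release(tensor: *mut TfeTensorF64) {
    if !tensor.is_null() {
        drop(unsafe { Box::from_raw(tensor) });
    }
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let shape = [2usize, 3];
    let mut status: tfe_status_t = TFE_INVALID_ARGUMENT;
    unsafe {
        let t = tfe_tensor_f64_from_data(data.as_ptr(), data.len(), shape.as_ptr(), shape.len(), &mut status);
        assert_eq!(status, TFE_SUCCESS);
        tfe_tensor_f64_release(t);
        tfe_tensor_f64_release(std::ptr::null_mut()); // null-safe
    }
}
```

Because the caller's buffer is copied, the host is free to reuse or free it immediately after the call returns.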
Status Codes
```rust
pub type tfe_status_t = i32;
pub const TFE_SUCCESS: tfe_status_t = 0;
pub const TFE_INVALID_ARGUMENT: tfe_status_t = -1;
pub const TFE_SHAPE_MISMATCH: tfe_status_t = -2;
pub const TFE_INTERNAL_ERROR: tfe_status_t = -3;
```

API Surface
Tensor Lifecycle
| Function | Description |
|---|---|
| tfe_tensor_f64_from_data(data, len, shape, ndim, status) | Create tensor from data (copy) |
| tfe_tensor_f64_zeros(shape, ndim, status) | Create zero-filled tensor |
| tfe_tensor_f64_clone(tensor, status) | Deep-copy (not Arc clone) |
| tfe_tensor_f64_release(tensor) | Free tensor (null-safe) |
| tfe_tensor_f64_ndim(tensor) | Query rank |
| tfe_tensor_f64_shape(tensor, out_shape) | Query shape into buffer |
| tfe_tensor_f64_len(tensor) | Query total elements |
| tfe_tensor_f64_data(tensor) | Raw data pointer (read-only) |
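Since the query functions take no status argument, a null handle has to be handled by returning a neutral value. The sketch below assumes a null handle reports rank 0, length 0, and a null data pointer, and that shapes are usize; both are assumptions, as are the handle internals. It also shows the typical host pattern of querying the rank, sizing a buffer, and then filling it.

```rust
use std::ptr;

/// Hypothetical handle internals: owned column-major data plus shape.
pub struct TfeTensorF64 {
    data: Vec<f64>,
    shape: Vec<usize>,
}

/// Rank; a null handle reports 0 (assumed convention, since queries take no status).
pub unsafe extern "C" fn tfe_tensor_f64_ndim(tensor: *const TfeTensorF64) -> usize {
    let t = unsafe { tensor.as_ref() };
    t.map_or(0, |t| t.shape.len())
}

/// Copies the shape into `out_shape`, which must have room for `ndim` entries.
pub unsafe extern "C" fn tfe_tensor_f64_shape(tensor: *const TfeTensorF64, out_shape: *mut usize) {
    if out_shape.is_null() {
        return;
    }
    if let Some(t) = unsafe { tensor.as_ref() } {
        unsafe { ptr::copy_nonoverlapping(t.shape.as_ptr(), out_shape, t.shape.len()) };
    }
}

/// Total number of elements; 0 for a null handle.
pub unsafe extern "C" fn tfe_tensor_f64_len(tensor: *const TfeTensorF64) -> usize {
    let t = unsafe { tensor.as_ref() };
    t.map_or(0, |t| t.data.len())
}

/// Read-only data pointer, valid until the handle is released; null for a null handle.
pub unsafe extern "C" fn tfe_tensor_f64_data(tensor: *const TfeTensorF64) -> *const f64 {
    let t = unsafe { tensor.as_ref() };
    t.map_or(ptr::null(), |t| t.data.as_ptr())
}

fn main() {
    let t = Box::into_raw(Box::new(TfeTensorF64 { data: vec![0.0; 6], shape: vec![2, 3] }));
    unsafe {
        // Host pattern: ask for the rank, size a buffer, then fill it.
        let mut shape = vec![0usize; tfe_tensor_f64_ndim(t)];
        tfe_tensor_f64_shape(t, shape.as_mut_ptr());
        assert_eq!(shape, [2, 3]);
        assert_eq!(tfe_tensor_f64_len(t), 6);
        drop(Box::from_raw(t));
    }
}
```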
DLPack Interop
| Function | Description |
|---|---|
| tfe_tensor_f64_to_dlpack(tensor, status) | Export (consumes handle) |
| tfe_tensor_f64_from_dlpack(managed, status) | Import (takes ownership) |
Export: the tensor handle is consumed, so do not call release on it. The returned DLManagedTensorVersioned must be consumed by the host language, which calls its deleter callback when done.
Import: takes ownership of DLManagedTensorVersioned. The deleter is called exactly once, either when the returned tensor is released or immediately if import validation rejects the input.
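The "deleter runs exactly once" rule can be illustrated with a small producer/consumer pair. In this sketch, which is an assumption-laden illustration rather than the planned code, export boxes the data, shape, and column-major strides into a manager context freed by the deleter, and import either wraps the managed tensor in a handle whose Drop runs the deleter or, when validation fails, runs it immediately. The names export, import, Imported, ExportCtx, and DELETER_CALLS are hypothetical.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

// DLPack v1.0 declarations, same layout as in the DLPack v1.0 Types section.
#[repr(C)]
pub struct DLPackVersion { pub major: u32, pub minor: u32 }
#[repr(C)]
pub struct DLDevice { pub device_type: i32, pub device_id: i32 }
#[repr(C)]
pub struct DLDataType { pub code: u8, pub bits: u8, pub lanes: u16 }
#[repr(C)]
pub struct DLTensor {
    pub data: *mut c_void, pub device: DLDevice, pub ndim: i32, pub dtype: DLDataType,
    pub shape: *mut i64, pub strides: *mut i64, pub byte_offset: u64,
}
#[repr(C)]
pub struct DLManagedTensorVersioned {
    pub version: DLPackVersion,
    pub manager_ctx: *mut c_void,
    pub deleter: Option<unsafe extern "C" fn(*mut DLManagedTensorVersioned)>,
    pub flags: u64,
    pub dl_tensor: DLTensor,
}

const KDLCPU: i32 = 1;
const KDLFLOAT: u8 = 2; // DLPack dtype code for IEEE floating point
static DELETER_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Producer-side storage the DLTensor points into; freed by the deleter.
struct ExportCtx { data: Vec<f64>, shape: Vec<i64>, strides: Vec<i64> }

unsafe extern "C" fn export_deleter(m: *mut DLManagedTensorVersioned) {
    DELETER_CALLS.fetch_add(1, Ordering::SeqCst);
    let m = unsafe { Box::from_raw(m) };
    drop(unsafe { Box::from_raw(m.manager_ctx as *mut ExportCtx) });
}

/// Export a column-major f64 tensor; ownership moves into the managed tensor.
fn export(data: Vec<f64>, shape: Vec<i64>) -> *mut DLManagedTensorVersioned {
    let mut strides = Vec::with_capacity(shape.len());
    let mut acc = 1i64;
    for &d in &shape {
        strides.push(acc); // column-major: stride[k] = shape[0] * ... * shape[k - 1]
        acc *= d;
    }
    let mut ctx = Box::new(ExportCtx { data, shape, strides });
    let dl_tensor = DLTensor {
        data: ctx.data.as_mut_ptr() as *mut c_void,
        device: DLDevice { device_type: KDLCPU, device_id: 0 },
        ndim: ctx.shape.len() as i32,
        dtype: DLDataType { code: KDLFLOAT, bits: 64, lanes: 1 },
        shape: ctx.shape.as_mut_ptr(),
        strides: ctx.strides.as_mut_ptr(),
        byte_offset: 0,
    };
    Box::into_raw(Box::new(DLManagedTensorVersioned {
        version: DLPackVersion { major: 1, minor: 0 },
        manager_ctx: Box::into_raw(ctx) as *mut c_void,
        deleter: Some(export_deleter),
        flags: 0,
        dl_tensor,
    }))
}

unsafe fn call_deleter(m: *mut DLManagedTensorVersioned) {
    if let Some(deleter) = unsafe { (*m).deleter } {
        unsafe { deleter(m) };
    }
}

/// Hypothetical imported handle: releasing (dropping) it runs the deleter.
struct Imported { managed: *mut DLManagedTensorVersioned }

impl Drop for Imported {
    fn drop(&mut self) {
        unsafe { call_deleter(self.managed) };
    }
}

/// Validate and take ownership; if validation fails, the deleter runs immediately.
unsafe fn import(m: *mut DLManagedTensorVersioned) -> Option<Imported> {
    let t = unsafe { &(*m).dl_tensor };
    let ok = t.device.device_type == KDLCPU && t.device.device_id == 0
        && t.dtype.code == KDLFLOAT && t.dtype.bits == 64 && t.dtype.lanes == 1;
    if !ok {
        unsafe { call_deleter(m) };
        return None;
    }
    Some(Imported { managed: m })
}

fn main() {
    let imported = unsafe { import(export(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], vec![2, 3])) };
    assert!(imported.is_some());
    drop(imported); // release: the deleter runs here, exactly once
    assert_eq!(DELETER_CALLS.load(Ordering::SeqCst), 1);
}
```

Counting deleter calls with an atomic is also a convenient way to implement the "deleter called exactly once" case in the test matrix.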
Einsum
| Function | Description |
|---|---|
| tfe_einsum_f64(subscripts, operands, n, status) | Einsum (returns new tensor) |
| tfe_einsum_vjp_f64(subscripts, operands, n, cotangent, grads_out, status) | Proposed reverse-mode AD |
| tfe_einsum_jvp_f64(subscripts, primals, n, tangents, status) | Proposed forward-mode AD |
Reverse-mode AD: grads_out is a caller-provided array of n pointers. Each returned gradient tensor must be released by the caller.
Forward-mode AD: individual entries of tangents may be null, meaning a zero tangent for that operand.
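For the two-operand contraction "ij,jk->ik" the VJP and JVP reduce to familiar matrix formulas: the VJP returns one gradient per operand (the entries a caller would receive in grads_out), and the JVP treats a missing tangent as zero. The sketch below uses a hypothetical dense column-major Mat type and hypothetical einsum_vjp/einsum_jvp helpers; it is not the tenferro-einsum API and handles only this one subscript pattern.

```rust
/// Dense column-major matrix: element (r, c) is stored at data[r + c * rows].
#[derive(Clone, Debug, PartialEq)]
struct Mat {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Mat {
    fn new(rows: usize, cols: usize, data: Vec<f64>) -> Mat {
        assert_eq!(data.len(), rows * cols);
        Mat { rows, cols, data }
    }
    fn zeros(rows: usize, cols: usize) -> Mat {
        Mat::new(rows, cols, vec![0.0; rows * cols])
    }
    fn at(&self, r: usize, c: usize) -> f64 {
        self.data[r + c * self.rows]
    }
    fn transpose(&self) -> Mat {
        let mut out = Mat::zeros(self.cols, self.rows);
        for c in 0..self.cols {
            for r in 0..self.rows {
                out.data[c + r * self.cols] = self.at(r, c);
            }
        }
        out
    }
}

/// einsum("ij,jk->ik", a, b)
fn matmul(a: &Mat, b: &Mat) -> Mat {
    assert_eq!(a.cols, b.rows, "shape mismatch on contracted index j");
    let mut c = Mat::zeros(a.rows, b.cols);
    for k in 0..b.cols {
        for j in 0..a.cols {
            for i in 0..a.rows {
                c.data[i + k * a.rows] += a.at(i, j) * b.at(j, k);
            }
        }
    }
    c
}

/// VJP: one gradient per operand, [cot * B^T, A^T * cot].
fn einsum_vjp(a: &Mat, b: &Mat, cot: &Mat) -> [Mat; 2] {
    [matmul(cot, &b.transpose()), matmul(&a.transpose(), cot)]
}

/// JVP: dC = dA * B + A * dB, where a `None` tangent means zero.
fn einsum_jvp(a: &Mat, b: &Mat, da: Option<&Mat>, db: Option<&Mat>) -> Mat {
    let mut out = Mat::zeros(a.rows, b.cols);
    for term in [da.map(|da| matmul(da, b)), db.map(|db| matmul(a, db))].iter().flatten() {
        for (o, t) in out.data.iter_mut().zip(&term.data) {
            *o += t;
        }
    }
    out
}

fn main() {
    let a = Mat::new(2, 2, vec![1.0, 3.0, 2.0, 4.0]); // [[1, 2], [3, 4]]
    let b = Mat::new(2, 2, vec![1.0, 0.0, 0.0, 1.0]); // identity
    let cot = Mat::new(2, 2, vec![1.0; 4]);
    let [ga, gb] = einsum_vjp(&a, &b, &cot);
    println!("grad_a = {:?}\ngrad_b = {:?}", ga.data, gb.data);
    let dc = einsum_jvp(&a, &b, Some(&cot), None); // tangent for b is "null"
    println!("tangent_out = {:?}", dc.data);
}
```

A useful invariant for testing both directions together is the adjoint identity <cot, JVP(dA, dB)> = <grad_A, dA> + <grad_B, dB>.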
SVD
| Function | Description |
|---|---|
| tfe_svd_f64(tensor, left, left_len, right, right_len, max_rank, cutoff, u_out, s_out, vt_out, status) | SVD with dimension indices |
| tfe_svd_rrule_f64(tensor, left, left_len, right, right_len, max_rank, cutoff, cot_u, cot_s, cot_vt, status) | Reverse-mode AD |
| tfe_svd_frule_f64(tensor, left, left_len, right, right_len, max_rank, cutoff, tangent, u_out, s_out, vt_out, status) | Forward-mode AD |
SVD takes left/right dimension indices (not a pre-matricized matrix). The C-API performs the permute, reshape, and contiguous copy internally. Set max_rank=0 for no rank limit and cutoff<0 for no cutoff.
In tfe_svd_rrule_f64, any of cot_u, cot_s, and cot_vt may be null (zero cotangent for that output). In tfe_svd_frule_f64, tangent may be null (zero tangent).
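The matricization step implied by the left/right convention can be sketched as a validation-and-planning helper: left followed by right must be a permutation of the tensor's axes, and the matrix has m = product of the left extents and n = product of the right extents. The helper names svd_matricize_plan and kept_rank are hypothetical, and treating cutoff as an absolute threshold on singular values (keep s > cutoff) is an assumption; the notes above only fix the max_rank=0 and cutoff<0 sentinels.

```rust
/// Hypothetical helper: validate `left`/`right` against `shape` and return the
/// axis permutation (left axes first, then right axes) plus the matrix size (m, n).
fn svd_matricize_plan(
    shape: &[usize],
    left: &[usize],
    right: &[usize],
) -> Result<(Vec<usize>, usize, usize), String> {
    let ndim = shape.len();
    let perm: Vec<usize> = left.iter().chain(right).copied().collect();
    if perm.len() != ndim {
        return Err(format!("left/right cover {} axes but the tensor has {}", perm.len(), ndim));
    }
    let mut seen = vec![false; ndim];
    for &axis in &perm {
        if axis >= ndim {
            return Err(format!("axis {} out of bounds", axis));
        }
        if seen[axis] {
            return Err(format!("duplicate axis {}", axis));
        }
        seen[axis] = true;
    }
    let m: usize = left.iter().map(|&ax| shape[ax]).product();
    let n: usize = right.iter().map(|&ax| shape[ax]).product();
    Ok((perm, m, n))
}

/// Number of singular values kept (input sorted in descending order).
/// max_rank = 0 means no rank limit and cutoff < 0 means no cutoff; treating
/// cutoff as an absolute threshold (keep s > cutoff) is an assumption.
fn kept_rank(singular_values: &[f64], max_rank: usize, cutoff: f64) -> usize {
    let mut k = singular_values.len();
    if cutoff >= 0.0 {
        k = singular_values.iter().take_while(|&&s| s > cutoff).count();
    }
    if max_rank > 0 {
        k = k.min(max_rank);
    }
    k
}

fn main() {
    // A rank-3 tensor of shape (2, 3, 4) split as left = [0, 2], right = [1].
    let (perm, m, n) = svd_matricize_plan(&[2, 3, 4], &[0, 2], &[1]).unwrap();
    println!("permute to {:?}, then reshape to {} x {}", perm, m, n);
    println!("kept rank: {}", kept_rank(&[3.0, 1.0, 1e-9], 0, 1e-6));
}
```

Out-of-bounds and duplicate axes correspond to the AxisOutOfBounds and DuplicateAxis rows of the error table, both of which map to TFE_INVALID_ARGUMENT.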
Memory Ownership
| Allocation | Freed by |
|---|---|
| Tensor from _from_data / _zeros / _clone | tfe_tensor_f64_release |
| Tensor from _from_dlpack | tfe_tensor_f64_release (calls DLPack deleter) |
| Output tensor (via **_out) | tfe_tensor_f64_release |
| Gradient tensor (rrule output) | tfe_tensor_f64_release |
| grads_out array (einsum rrule) | Caller provides buffer |
| Input data pointer | Caller (data is copied) |
| DLManagedTensorVersioned from _to_dlpack | Consumer calls deleter |
DLPack v1.0 Types
```rust
use std::ffi::c_void;

#[repr(C)]
pub struct DLPackVersion { pub major: u32, pub minor: u32 }

#[repr(C)]
pub struct DLDevice { pub device_type: i32, pub device_id: i32 }

#[repr(C)]
pub struct DLDataType { pub code: u8, pub bits: u8, pub lanes: u16 }

#[repr(C)]
pub struct DLTensor {
    pub data: *mut c_void,
    pub device: DLDevice,
    pub ndim: i32,
    pub dtype: DLDataType,
    pub shape: *mut i64,
    pub strides: *mut i64, // tenferro exports explicit column-major strides
    pub byte_offset: u64,
}

#[repr(C)]
pub struct DLManagedTensorVersioned {
    pub version: DLPackVersion,
    pub manager_ctx: *mut c_void,
    pub deleter: Option<unsafe extern "C" fn(*mut DLManagedTensorVersioned)>,
    pub flags: u64,
    pub dl_tensor: DLTensor,
}
```

Device type constants:
| Constant | Value | Description |
|---|---|---|
| KDLCPU | 1 | CPU |
| KDLCUDA | 2 | CUDA GPU memory |
| KDLCUDA_HOST | 3 | Pinned CUDA CPU memory |
| KDLROCM | 10 | ROCm GPU memory |
| KDLROCM_HOST | 11 | Pinned ROCm CPU memory |
| KDLCUDA_MANAGED | 13 | CUDA managed/unified memory |
DLPack device types map to LogicalMemorySpace:
| DLPack | LogicalMemorySpace |
|---|---|
| KDLCPU | MainMemory |
| KDLCUDA_HOST, KDLROCM_HOST | PinnedMemory |
| KDLCUDA, KDLROCM | GpuMemory { device_id } |
| KDLCUDA_MANAGED | ManagedMemory |
Current import support is narrower than the full mapping above: tfe_tensor_f64_from_dlpack only accepts KDLCPU with device_id = 0 and float64 dtype today.
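The two tables and the narrower import gate can be expressed as a pair of functions: a total mapping from DLPack device types to logical memory spaces, and a predicate for what tfe_tensor_f64_from_dlpack accepts today. LogicalMemorySpace here is a local stand-in that mirrors the variant names above, not the real type, and the function names memory_space and import_supported are hypothetical. The dtype check uses DLPack's float type code (2) with 64 bits and 1 lane.

```rust
// DLPack device type codes from the table above.
const KDLCPU: i32 = 1;
const KDLCUDA: i32 = 2;
const KDLCUDA_HOST: i32 = 3;
const KDLROCM: i32 = 10;
const KDLROCM_HOST: i32 = 11;
const KDLCUDA_MANAGED: i32 = 13;
const KDLFLOAT: u8 = 2; // DLPack dtype code for IEEE floating point

/// Local stand-in mirroring the LogicalMemorySpace variants named above.
#[derive(Debug, PartialEq)]
enum LogicalMemorySpace {
    MainMemory,
    PinnedMemory,
    GpuMemory { device_id: i32 },
    ManagedMemory,
}

/// Full mapping from the table; unknown device types yield None.
fn memory_space(device_type: i32, device_id: i32) -> Option<LogicalMemorySpace> {
    match device_type {
        KDLCPU => Some(LogicalMemorySpace::MainMemory),
        KDLCUDA_HOST | KDLROCM_HOST => Some(LogicalMemorySpace::PinnedMemory),
        KDLCUDA | KDLROCM => Some(LogicalMemorySpace::GpuMemory { device_id }),
        KDLCUDA_MANAGED => Some(LogicalMemorySpace::ManagedMemory),
        _ => None,
    }
}

/// The narrower gate applied on import today:
/// CPU main memory, device_id 0, and float64 (float code, 64 bits, 1 lane).
fn import_supported(device_type: i32, device_id: i32, code: u8, bits: u8, lanes: u16) -> bool {
    memory_space(device_type, device_id) == Some(LogicalMemorySpace::MainMemory)
        && device_id == 0
        && code == KDLFLOAT
        && bits == 64
        && lanes == 1
}

fn main() {
    println!("{:?}", memory_space(KDLCUDA, 1));
    println!("CPU f64 importable: {}", import_supported(KDLCPU, 0, KDLFLOAT, 64, 1));
    println!("CUDA f64 importable: {}", import_supported(KDLCUDA, 0, KDLFLOAT, 64, 1));
}
```

Keeping the mapping total and the gate separate means widening import support later only touches import_supported.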
Tropical Extension (tenferro-ext-tropical-capi)
Separate shared library reusing TfeTensorF64 handles. Algebra is selected by function name, not tensor type (MaxPlus<f64> is #[repr(transparent)] over f64).
| Function | Algebra |
|---|---|
| tfe_tropical_einsum_maxplus_f64 | MaxPlus (⊕=max, ⊗=+) |
| tfe_tropical_einsum_minplus_f64 | MinPlus (⊕=min, ⊗=+) |
| tfe_tropical_einsum_maxmul_f64 | MaxMul (⊕=max, ⊗=×) |
Each has a corresponding _rrule_<algebra>_f64 function (same signature as standard einsum rrule). No _frule_<algebra>_f64 is provided for tropical einsum: tropical semirings (max/min) are not smooth, so JVP is not well-defined. Only VJP (rrule via argmax) is supported.
The tie-break rule "smallest linear index wins" applies to all tropical rrule functions: when multiple elements share the maximum (or minimum) value, the gradient is routed to the element with the smallest linear index. This is consistent across CPU and GPU backends.
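For the max-plus product "ij,jk->ik", C[i,k] = max_j (A[i,j] + B[j,k]); since ⊗ is +, the VJP sends each cotangent entry, with weight 1, to the single winning pair A[i,j*] and B[j*,k]. The sketch below records the winning j during the forward pass and applies the tie-break by keeping the first (smallest) j, which with column-major storage is also the smallest linear index in each operand. The function names, and saving the argmax from the forward pass, are assumptions for illustration.

```rust
/// MaxPlus einsum "ij,jk->ik" on column-major data: C[i,k] = max_j (A[i,j] + B[j,k]).
/// Also returns, per output element, the winning j (smallest j on ties).
fn maxplus_matmul(a: &[f64], b: &[f64], m: usize, n: usize, p: usize) -> (Vec<f64>, Vec<usize>) {
    let mut c = vec![f64::NEG_INFINITY; m * p];
    let mut arg = vec![0usize; m * p];
    for k in 0..p {
        for i in 0..m {
            for j in 0..n {
                let v = a[i + j * m] + b[j + k * n];
                // Strict `>` keeps the first (smallest) j when values tie.
                if v > c[i + k * m] {
                    c[i + k * m] = v;
                    arg[i + k * m] = j;
                }
            }
        }
    }
    (c, arg)
}

/// VJP: each cotangent entry flows only into the winning A[i, j*] and B[j*, k].
fn maxplus_matmul_rrule(arg: &[usize], cot: &[f64], m: usize, n: usize, p: usize) -> (Vec<f64>, Vec<f64>) {
    let mut grad_a = vec![0.0; m * n];
    let mut grad_b = vec![0.0; n * p];
    for k in 0..p {
        for i in 0..m {
            let j = arg[i + k * m];
            grad_a[i + j * m] += cot[i + k * m];
            grad_b[j + k * n] += cot[i + k * m];
        }
    }
    (grad_a, grad_b)
}

fn main() {
    // A (1x2) = [1, 2], B (2x1) = [2, 1]: j = 0 and j = 1 both give 3, a tie.
    let (c, arg) = maxplus_matmul(&[1.0, 2.0], &[2.0, 1.0], 1, 2, 1);
    let (grad_a, grad_b) = maxplus_matmul_rrule(&arg, &[1.0], 1, 2, 1);
    println!("c = {:?}, grad_a = {:?}, grad_b = {:?}", c, grad_a, grad_b);
}
```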
Design Decisions
- SVD takes dimension indices, not matrices. Unlike tenferro-linalg (which takes pre-matricized (m, n, *) tensors), the C-API takes left/right dimension index arrays and handles matricization internally. This is more ergonomic for host languages that work with high-rank tensors.
- No QR/LU/eigen/solve in the C-API. Only SVD is exposed in the POC because it is the primary operation for tensor network applications. Other decompositions can be added as needed.
- No tape/tracked types exposed. Host languages already have their own AD systems. Exposing stateless VJP/JVP functions lets each host integrate naturally (Julia ChainRules, PyTorch torch.autograd.Function, JAX custom_vjp).
- Separate tropical shared library. Avoids bloating the core tenferro-capi with tropical code. Users who don't need tropical algebra don't link it.
- catch_unwind for panic safety. All extern "C" functions wrap their body in catch_unwind so that Rust panics never unwind through C frames. Such unwinding was undefined behavior in older Rust versions; since Rust 1.81 a panic escaping an extern "C" function aborts the process instead, which is still not an acceptable way to report an error to the host.
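A shared guard keeps the panic-to-status conversion in one place. In the sketch below, the helper name ffi_guard and the demo export tfe_demo_sqrt_f64 are hypothetical; the body returns Ok(value) on success or Err(code) for a reportable error, a caught panic becomes TFE_INTERNAL_ERROR, and a caller-supplied fallback (for example a null pointer) is returned on any failure. Note that catch_unwind only intercepts panics when the crate is built with panic = "unwind" (the default), and the default panic hook still prints the panic message to stderr.

```rust
use std::panic::{self, AssertUnwindSafe};

#[allow(non_camel_case_types)]
pub type tfe_status_t = i32;
pub const TFE_SUCCESS: tfe_status_t = 0;
pub const TFE_INVALID_ARGUMENT: tfe_status_t = -1;
pub const TFE_INTERNAL_ERROR: tfe_status_t = -3;

/// Hypothetical guard shared by extern "C" bodies: a panic becomes
/// TFE_INTERNAL_ERROR, Err(code) is reported as-is, and `fallback`
/// is returned whenever the body did not succeed.
///
/// # Safety
/// `status` must be null or point to writable memory.
unsafe fn ffi_guard<T>(
    status: *mut tfe_status_t,
    fallback: T,
    body: impl FnOnce() -> Result<T, tfe_status_t>,
) -> T {
    let (code, value) = match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(Ok(value)) => (TFE_SUCCESS, value),
        Ok(Err(code)) => (code, fallback),
        Err(_panic_payload) => (TFE_INTERNAL_ERROR, fallback),
    };
    if !status.is_null() {
        unsafe { *status = code };
    }
    value
}

/// Hypothetical export used only to exercise the guard.
pub unsafe extern "C" fn tfe_demo_sqrt_f64(x: f64, status: *mut tfe_status_t) -> f64 {
    unsafe {
        ffi_guard(status, f64::NAN, || {
            if x < 0.0 {
                return Err(TFE_INVALID_ARGUMENT);
            }
            if x.is_nan() {
                panic!("simulated internal bug");
            }
            Ok(x.sqrt())
        })
    }
}

fn main() {
    let mut status = TFE_SUCCESS;
    let y = unsafe { tfe_demo_sqrt_f64(f64::NAN, &mut status) };
    assert!(y.is_nan());
    assert_eq!(status, TFE_INTERNAL_ERROR); // the panic did not cross the C boundary
}
```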
Implementation Phases
Phase 1: Tensor Lifecycle + Status Plumbing
- Implement: tfe_tensor_f64_from_data, _zeros, _clone, _release, _ndim, _shape, _len, _data
- Status plumbing: catch_unwind wrapper, tfe_status_t out-parameter convention
- Exit criteria: All tensor lifecycle functions pass null-safety and round-trip tests
Phase 2: DLPack Import/Export
- Implement: tfe_tensor_f64_to_dlpack, _from_dlpack
- Exit criteria: Round-trip test (Rust -> DLPack -> Rust) preserves data, shape, and strides; the deleter is called exactly once
Phase 3: Einsum + SVD + AD
- Implement: tfe_einsum_f64 and its proposed VJP/JVP variants; tfe_svd_f64 and its proposed VJP/JVP variants
- Exit criteria: Gradient check passes for einsum VJP/JVP; SVD reconstruction U·S·Vt ≈ A
Phase 4: Tropical C-API
- Implement: tfe_tropical_einsum_maxplus_f64 etc., plus the proposed reverse-mode variants
- No forward-mode AD for tropical
- Exit criteria: Tropical einsum matches CPU reference; reverse-mode gradient check passes
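The gradient checks in the Phase 3 and Phase 4 exit criteria can be driven by a central-difference comparison. The sketch below checks an analytic VJP for the contraction "ij,j->i" by turning it into a scalar loss L(a) = <cot, einsum(a, b)>; grad_check, matvec, and the tolerances are hypothetical choices for illustration, not part of the planned test suite. For tropical einsum the same check only makes sense at points without ties, where the function is locally linear.

```rust
/// Central-difference check of an analytic gradient (hypothetical test helper).
/// Passes if every component agrees within `tol * (1 + |analytic|)`.
fn grad_check(f: impl Fn(&[f64]) -> f64, x: &[f64], analytic: &[f64], eps: f64, tol: f64) -> bool {
    let mut xp = x.to_vec();
    for i in 0..x.len() {
        xp[i] = x[i] + eps;
        let f_plus = f(xp.as_slice());
        xp[i] = x[i] - eps;
        let f_minus = f(xp.as_slice());
        xp[i] = x[i];
        let numeric = (f_plus - f_minus) / (2.0 * eps);
        if (numeric - analytic[i]).abs() > tol * (1.0 + analytic[i].abs()) {
            return false;
        }
    }
    true
}

/// einsum("ij,j->i", a, b) with `a` stored column-major as an m x n matrix.
fn matvec(a: &[f64], b: &[f64], m: usize) -> Vec<f64> {
    (0..m).map(|i| (0..b.len()).map(|j| a[i + j * m] * b[j]).sum::<f64>()).collect()
}

fn main() {
    let m = 2;
    let a = [1.0, -2.0, 0.5, 3.0, 4.0, -1.0]; // 2 x 3, column-major
    let b = [0.3, -0.7, 2.0];
    let cot = [1.5, -0.5];
    // Scalar loss L(a) = <cot, einsum("ij,j->i", a, b)>.
    let loss = |a: &[f64]| matvec(a, &b, m).iter().zip(&cot).map(|(y, c)| y * c).sum::<f64>();
    // Analytic VJP with respect to a: grad_a[i, j] = cot[i] * b[j].
    let mut grad_a = vec![0.0; a.len()];
    for j in 0..b.len() {
        for i in 0..m {
            grad_a[i + j * m] = cot[i] * b[j];
        }
    }
    println!("gradient check passed: {}", grad_check(&loss, &a, &grad_a, 1e-6, 1e-6));
}
```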
Error Mapping
| Rust Error | tfe_status_t | Notes |
|---|---|---|
| tenferro_tensor::Error::InvalidConfig | TFE_INVALID_ARGUMENT (-1) | |
| tenferro_tensor::Error::ShapeMismatch | TFE_SHAPE_MISMATCH (-2) | |
| tenferro_tensor::Error::RankMismatch | TFE_SHAPE_MISMATCH (-2) | Shape/rank grouped |
| tenferro_tensor::Error::DTypeMismatch | TFE_INVALID_ARGUMENT (-1) | |
| tenferro_tensor::Error::AxisOutOfBounds | TFE_INVALID_ARGUMENT (-1) | |
| tenferro_tensor::Error::DuplicateAxis | TFE_INVALID_ARGUMENT (-1) | |
| tenferro_tensor::Error::AxisRoleConflict | TFE_INVALID_ARGUMENT (-1) | |
| tenferro_tensor::Error::BackendFailure | TFE_INTERNAL_ERROR (-3) | |
| tenferro_tensor::Error::MissingValue | TFE_INTERNAL_ERROR (-3) | |
| tenferro_einsum::Error::InvalidArgument | TFE_INVALID_ARGUMENT (-1) | |
| tenferro_einsum::Error::ShapeMismatch | TFE_SHAPE_MISMATCH (-2) | |
| chainrules_core::AutodiffError::ModeNotSupported | TFE_INVALID_ARGUMENT (-1) | Tropical frule/hvp |
| chainrules_core::AutodiffError::* (other) | TFE_INTERNAL_ERROR (-3) | |
| Rust panic (caught by catch_unwind) | TFE_INTERNAL_ERROR (-3) | |
Status message: every non-success status carries a human-readable, non-empty message. It is written to an internal thread-local buffer that a future tfe_last_error_message() API would expose.
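The error table and the thread-local message buffer can be combined in a single reporting path. ErrorKind below is a hypothetical flattened mirror of the Rust error variants in the table (the real types live in tenferro-tensor, tenferro-einsum, and chainrules-core), and report/last_error_message are illustrative names; a future tfe_last_error_message() would copy the stored string out to the host.

```rust
use std::cell::RefCell;

#[allow(non_camel_case_types)]
pub type tfe_status_t = i32;
pub const TFE_INVALID_ARGUMENT: tfe_status_t = -1;
pub const TFE_SHAPE_MISMATCH: tfe_status_t = -2;
pub const TFE_INTERNAL_ERROR: tfe_status_t = -3;

/// Hypothetical flattened mirror of the error variants in the table above.
#[derive(Debug, Clone, Copy)]
enum ErrorKind {
    InvalidConfig, ShapeMismatch, RankMismatch, DTypeMismatch, AxisOutOfBounds,
    DuplicateAxis, AxisRoleConflict, BackendFailure, MissingValue,
    EinsumInvalidArgument, EinsumShapeMismatch,
    AutodiffModeNotSupported, AutodiffOther, Panic,
}

/// One row of the table per arm.
fn status_for(kind: ErrorKind) -> tfe_status_t {
    use ErrorKind::*;
    match kind {
        ShapeMismatch | RankMismatch | EinsumShapeMismatch => TFE_SHAPE_MISMATCH,
        InvalidConfig | DTypeMismatch | AxisOutOfBounds | DuplicateAxis | AxisRoleConflict
        | EinsumInvalidArgument | AutodiffModeNotSupported => TFE_INVALID_ARGUMENT,
        BackendFailure | MissingValue | AutodiffOther | Panic => TFE_INTERNAL_ERROR,
    }
}

thread_local! {
    // Per-thread buffer behind a future tfe_last_error_message().
    static LAST_ERROR: RefCell<String> = RefCell::new(String::new());
}

/// Record a human-readable, non-empty message for this thread and return the status.
fn report(kind: ErrorKind, detail: &str) -> tfe_status_t {
    let message = if detail.is_empty() {
        format!("{:?}", kind)
    } else {
        format!("{:?}: {}", kind, detail)
    };
    LAST_ERROR.with(|slot| *slot.borrow_mut() = message);
    status_for(kind)
}

fn last_error_message() -> String {
    LAST_ERROR.with(|slot| slot.borrow().clone())
}

fn main() {
    let status = report(ErrorKind::RankMismatch, "expected rank 2, got 3");
    println!("{} -> {}", status, last_error_message());
}
```

Using a thread-local buffer means concurrent calls from different host threads never overwrite each other's messages.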
C-API Test Matrix
| Category | Test Cases |
|---|---|
| NULL safety | NULL tensor pointer to all query functions; NULL status pointer |
| Invalid shape | Zero-dim shape, empty data with non-zero shape, mismatched len |
| Einsum errors | Invalid subscript string, shape mismatch between operands |
| DLPack | Unsupported dtype, device mismatch, deleter called exactly once |
| Ownership | Double release (must not crash), release after to_dlpack (must not crash) |
| Panic safety | Internal panic caught and converted to TFE_INTERNAL_ERROR |
| AD error paths | Tropical forward-mode AD returns TFE_INVALID_ARGUMENT; NULL cotangent in reverse mode |
See testing.md for the workspace-level testing strategy.
ABI Policy
Header Generation
C headers would be generated via cbindgen from the two planned FFI crates:
```shell
cbindgen --config cbindgen.toml --crate tenferro-capi --output tenferro.h
cbindgen \
  --config extension/tenferro-ext-tropical-capi/cbindgen.toml \
  --crate tenferro-ext-tropical-capi \
  --output tenferro_ext_tropical.h
```

tenferro_ext_tropical.h is the extension header and includes tenferro.h.
Symbol Naming
All public symbols use the tfe_ prefix. The tropical extension uses the tfe_tropical_ prefix.
Versioning
- Shared library versioned as libtenferro.so.{major}.{minor}.{patch}
- ABI compatibility: patch releases are backward-compatible; minor releases may add symbols; major releases may break ABI
- Version query: tfe_version() reports the (major, minor, patch) triple
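Under this policy a host binding can decide at load time whether the library it found is usable. The sketch assumes a hypothetical #[repr(C)] TfeVersion struct as the C representation of the triple (the notes do not fix the layout) and a hypothetical is_abi_compatible helper: the major versions must match, and the library's minor version must be at least the one the binding was built against, since minor releases only add symbols and patch releases are backward-compatible.

```rust
/// Hypothetical C representation of the (major, minor, patch) triple.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TfeVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Load-time check: same major, and the library's minor is at least the one the
/// binding was built against. Patch is ignored because patch releases are
/// backward-compatible.
fn is_abi_compatible(library: TfeVersion, built_against: TfeVersion) -> bool {
    library.major == built_against.major && library.minor >= built_against.minor
}

fn main() {
    // Illustrative version numbers only.
    let built_against = TfeVersion { major: 1, minor: 2, patch: 0 };
    let loaded = TfeVersion { major: 1, minor: 4, patch: 3 };
    println!("compatible: {}", is_abi_compatible(loaded, built_against));
}
```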
Backward Compatibility
- Existing function signatures are frozen after 1.0
- New functions added with new names (no overloading)
- Deprecated functions kept for one major version cycle