The structured-ML workloads — policy execution, forecasting, vision-language encoding, associative retrieval — get a native compile path on the 5.0.x train. No Python sidecar, no separate model server. The same single daemon serves them via a compiled graph. Faster cold-start, less memory pressure, fewer moving parts to monitor.
Today
5.0 already ships the structured-ML workloads, but today's implementation routes through a Python runtime for portability. That works, but the overhead is real: a slower cold start, a larger memory footprint, and an extra process to manage and keep under a watchdog.
An upcoming 5.0.x patch ships a native execution path. The same model graph compiles directly into the platform's inference daemon — no Python interpreter on the hot path, no separate model server, no inter-process round-trip. Same API surface, same model files, same workload semantics; what changes is what happens behind the curtain.
On Apple Silicon, the native path runs on the unified-memory architecture without the host-to-GPU transfer cost that the Python runtime imposes. The platform's macOS GUI, and standalone deployments on a developer's laptop, get a noticeably faster cold start and a smaller resident set.
On Linux servers, the win is operational rather than visible: one fewer process to monitor, one fewer surface for sysadmins to debug, and a lower memory ceiling for the same throughput. Customers running the structured-ML workloads at scale get more headroom on the same hardware.
This page is updated as each piece lands. The release notes remain the formal record of what has shipped.
For the structured-ML stack today, see xLSTM for IoT. For the inference daemon overall, see smart memory inference. For the full 5.0.x roadmap, see what's next in 5.0.x.