Two preview features ship in 5.0 as opt-in extensions to the memory subsystem. They cover the case where the standard knowledge-base path is already working well and you want to push further on latency at scale, on storage footprint, or on routing the right query to the right model with less LLM time.
The matrix-memory layer that backs Eldric's knowledge bases is a single-step associative recall mechanism. It is a known-good design: every retrieval is one matrix-vector product against the stored pattern set. The preview adds a compressed variant of the same mechanism. It stores the patterns in a smaller form and retrieves faster, at the cost of a little accuracy on the hardest queries.
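As a rough illustration of the idea, the sketch below scores a query against a small pattern matrix with one matrix-vector product, then repeats the lookup against a compressed copy. The compression shown (per-row symmetric 8-bit quantization) is an assumption chosen for illustration; the actual scheme Eldric uses is not described here, and every name in the block is hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall(patterns, query):
    """Single-step recall: one matrix-vector product, then pick the best row."""
    scores = [dot(row, query) for row in patterns]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def quantize_rows(patterns, levels=127):
    """Assumed compression: per-row symmetric quantization to small integers."""
    quantized, scales = [], []
    for row in patterns:
        peak = max(abs(x) for x in row) or 1.0  # avoid a zero scale
        scale = peak / levels
        quantized.append([round(x / scale) for x in row])
        scales.append(scale)
    return quantized, scales


def recall_compressed(quantized, scales, query):
    """Same single-step recall, but against the quantized rows."""
    scores = [s * dot(row, query) for row, s in zip(quantized, scales)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: three stored patterns and one query (illustrative values only).
patterns = [
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.3, 0.0],
    [0.0, 0.2, 0.9, 0.4],
]
query = [0.1, 0.7, 0.4, 0.0]

full_best, full_scores = recall(patterns, query)
quantized, scales = quantize_rows(patterns)
comp_best, comp_scores = recall_compressed(quantized, scales, query)
max_error = max(abs(f - c) for f, c in zip(full_scores, comp_scores))
```

On this toy data both paths select the same pattern and the scores differ only slightly. The accuracy loss described above shows up when two stored patterns score so closely that the quantization error is enough to swap their order.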
The trade-off is a small accuracy hit on the hardest queries, typically 1–3% on the benchmarks the research community uses for these techniques. For most customer workloads (search through a curated knowledge base, retrieve a few good candidates for the LLM to read) the hit is invisible. For workloads that depend on exact matching of rare patterns, run a verification pass against the full-precision matrix.
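One way to picture such a verification pass is as a two-stage lookup: the compressed patterns produce a short candidate list, and only those candidates are rescored against the full-precision rows. The sketch below assumes that shape. The function names, the rounding used as a stand-in for compression, and the toy data are all hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def shortlist(compressed, query, k):
    """Stage 1: top-k candidate indices from the compressed (lossy) patterns."""
    scores = [dot(row, query) for row in compressed]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def verify(full, query, candidates):
    """Stage 2: re-rank only the shortlist using exact full-precision scores."""
    return sorted(candidates, key=lambda i: dot(full[i], query), reverse=True)


# Toy data: rows 0 and 1 are near-duplicates that a lossy copy cannot separate.
full = [
    [0.50, 0.52, 0.10],
    [0.52, 0.50, 0.10],
    [0.10, 0.10, 0.90],
]
# Stand-in for compression: rounding to one decimal place (an assumption).
compressed = [[round(x, 1) for x in row] for row in full]
query = [1.0, 0.0, 0.0]

candidates = shortlist(compressed, query, k=2)
verified = verify(full, query, candidates)
```

Here the compressed scores for rows 0 and 1 tie, so the compressed ranking alone cannot tell them apart. The verification step reads just two full-precision rows and puts row 1 first. The extra cost grows with k rather than with the size of the knowledge base.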
To enable it, go to Admin Console → Knowledge Bases → pick a KB → Advanced retrieval → enable compressed memory. The conversion runs in the background; the original full-precision file is preserved until you confirm the compressed version is working for you. The feature is available on the Standard tier and above. It is opt-in per knowledge base and never on by default.
The router currently uses a small LLM to classify intent, theme and target backend on every request. The distilled router (preview) replaces that with a single-pass neural classifier trained from the LLM's own past decisions on your cluster. The result: lower latency on the routing decision, less GPU time burned on the choice rather than the answer, and a routing model you can train on your own traffic patterns.
A small classifier doesn't reason about ambiguous edge cases the way a small LLM does. The router therefore falls back to the LLM path for queries where the classifier reports low confidence, so you pay the LLM latency only on the genuinely ambiguous fraction of traffic.
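The fallback can be sketched as a confidence threshold on the classifier's output. This is a minimal sketch under stated assumptions: the 0.8 threshold, the label set, and both stub functions are hypothetical placeholders, not Eldric's router API or its tuned values.

```python
import math

LABELS = ["code", "search", "chat"]  # hypothetical routing targets


def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(query, classifier, llm_router, threshold=0.8):
    """Use the fast classifier when it is confident, else fall back to the LLM."""
    probs = softmax(classifier(query))
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return LABELS[best], "classifier"
    return llm_router(query), "llm"


def stub_classifier(query):
    """Hypothetical stand-in: confident on code-like input, unsure otherwise."""
    return [4.0, 0.5, 0.2] if "def " in query else [1.0, 0.9, 0.8]


def stub_llm_router(query):
    """Hypothetical stand-in for the slower LLM routing path."""
    return "chat"


confident = route("def parse(x): ...", stub_classifier, stub_llm_router)
ambiguous = route("what about that thing?", stub_classifier, stub_llm_router)
```

The first query is routed by the classifier alone, and the second falls through to the LLM path. Raising the threshold sends more traffic to the LLM (slower, more careful); lowering it does the opposite.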
The distilled router is a preview, available as an opt-in for the Professional and Enterprise tiers. Bring-up is admin-driven: the distillation step uses your accumulated routing data and runs as a one-time training job before the classifier replaces the LLM-routing path.
For the platform's overall memory model, read "how it works". For the data-residency posture, read "your data". For the 5.0 GA scope, see the release notes. For specific questions, write to office@eldric.ai.