Advanced retrieval (preview)

Faster recall.
Smaller memory.

Two preview features ship in 5.0 as opt-in extensions to the memory subsystem. They cover the case where the standard knowledge-base path is already working well and you want to push further on latency at scale, on storage footprint, or on routing the right query to the right model with less LLM time.

Compressed memory

The compressed-memory path.

The matrix-memory layer that backs Eldric's knowledge bases is a single-step associative recall mechanism. It's a known-good design: every retrieval is one direct lookup against the stored pattern set. The preview adds a compressed variant of the same mechanism that stores the patterns in a smaller form and retrieves faster, at the cost of a little accuracy on the hardest queries.
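As a rough illustration of the idea, the sketch below scores a query against every stored pattern in one pass and returns the best match, first at full precision and then against a compressed copy. The compression scheme shown (rounding each component to a small integer) is an assumption chosen for brevity; this page does not say which scheme Eldric uses, and the names `recall` and `quantize` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products behave like cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def recall(query, patterns):
    """One-step associative recall: score the query against every stored
    pattern and return the index of the best match."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, p)) for p in patterns]
    return max(range(len(scores)), key=scores.__getitem__)

def quantize(v, levels=127):
    """Hypothetical compression: round each component of a unit vector
    to a small signed integer (an int8-style code)."""
    return [round(x * levels) for x in v]

# Toy pattern set: full-precision copies and their compressed counterparts.
full = [normalize(p) for p in [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.3, 0.0, 1.0]]]
compressed = [quantize(p) for p in full]

query = [0.1, 0.9, 0.3]
print(recall(query, full), recall(query, compressed))
```

On an easy query like this one both variants pick the same pattern; the compressed copy only diverges when two stored patterns are close enough that rounding blurs the difference.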

What you get

Smaller .emm files on disk: meaningful for tenants with millions of stored patterns.
Faster recall at high concurrency: sub-millisecond on the hot tier, and up to several times faster when many parallel queries hit the same knowledge base.
Two-tier retrieval: keep the full-precision matrix for archival accuracy and run the compressed variant as the hot tier in front of it.

The trade-off

A small accuracy hit on the hardest queries: typically 1–3% on the benchmarks the research community uses for these techniques. For most customer workloads (searching a curated knowledge base and retrieving a few good candidates for the LLM to read) the hit is invisible. For workloads that depend on exact matches for rare patterns, run a verification pass against the full-precision matrix.
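A verification pass can be pictured as a shortlist-then-rerank step: the compressed hot tier proposes a few candidates, and the full-precision matrix picks the final answer among them. The sketch below is a minimal illustration under that assumption; `two_tier_recall`, the shortlist size `k`, and the coarse rounding used as a stand-in for compression are all hypothetical, not Eldric's API.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_tier_recall(query, hot, archive, k=2):
    """Shortlist k candidates on the compressed hot tier, then verify the
    shortlist against the full-precision archive and return the winner."""
    shortlist = sorted(range(len(hot)),
                       key=lambda i: dot(query, hot[i]),
                       reverse=True)[:k]
    return max(shortlist, key=lambda i: dot(query, archive[i]))

# Two nearly identical patterns plus one distractor.
archive = [[0.70, 0.71], [0.72, 0.69], [-1.0, 0.0]]
# Deliberately coarse stand-in for compression: the first two patterns
# collapse to the same code, so the hot tier alone cannot tell them apart.
hot = [[round(x * 4) for x in p] for p in archive]

query = [1.0, 0.0]
hot_only = max(range(len(hot)), key=lambda i: dot(query, hot[i]))
verified = two_tier_recall(query, hot, archive)
print(hot_only, verified)
```

In this toy case the hot tier on its own returns the first of the two look-alike patterns, while the verified result is the one the full-precision scores actually favor. The extra cost is `k` full-precision comparisons per query rather than a scan of the whole archive.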

How to enable

Admin Console → Knowledge Bases → pick a KB → Advanced retrieval → enable compressed memory. The conversion runs in the background; the original full-precision file is preserved until you confirm the compressed version is working for you. Standard tier and above. Per-knowledge-base opt-in, never on by default.

Distilled router

A smaller classifier in front of the LLM.

The router currently uses a small LLM to classify intent, theme and target backend on every request. The distilled router (preview) replaces that with a single-pass neural classifier trained from the LLM's own past decisions on your cluster. The result: lower latency on the routing decision, less GPU time burned on the choice rather than the answer, and a routing model you can train on your own traffic patterns.

What you get

Lower router latency: typically a few milliseconds, versus tens to hundreds of milliseconds for the LLM.
Frees up the LLM for actual work rather than routing classification.
Trained on your traffic: the distillation step uses your cluster's actual routing decisions as training data, so the classifier matches your patterns, not a generic baseline.
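To make the distillation step concrete, here is a minimal sketch of training a small classifier on logged routing decisions. It assumes the log is a list of (query text, backend chosen by the LLM) pairs, and it uses a bag-of-words perceptron as a simple stand-in for the neural classifier; the log format, the backend labels, and the functions `train` and `predict` are hypothetical.

```python
from collections import defaultdict

def featurize(text):
    """Lowercased whitespace tokens; a real router would use richer features."""
    return text.lower().split()

def predict(weights, labels, tokens):
    """Pick the label whose token weights sum highest for this query."""
    return max(labels, key=lambda label: sum(weights[label][t] for t in tokens))

def train(log, epochs=5):
    """Fit a multiclass perceptron to the LLM's past routing decisions.
    log: list of (query_text, label_chosen_by_llm) pairs."""
    weights = defaultdict(lambda: defaultdict(float))  # label -> token -> weight
    labels = sorted({label for _, label in log})
    for _ in range(epochs):
        for text, gold in log:
            tokens = featurize(text)
            guess = predict(weights, labels, tokens)
            if guess != gold:
                # Move weight toward the LLM's decision and away from the wrong guess.
                for t in tokens:
                    weights[gold][t] += 1.0
                    weights[guess][t] -= 1.0
    return weights, labels

# Hypothetical routing log: what the LLM router decided for past queries.
log = [
    ("summarize this contract", "docs"),
    ("fix this python bug", "code"),
    ("plot monthly revenue", "analytics"),
    ("summarize the meeting notes", "docs"),
]
weights, labels = train(log)
print(predict(weights, labels, featurize("fix the bug")))
print(predict(weights, labels, featurize("summarize the notes")))
```

The point of the sketch is the data flow: the LLM's own decisions are the labels, so the small model learns whatever routing habits your traffic has produced.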

The trade-off

A small classifier doesn't reason about ambiguous edge cases the way a small LLM does. When the classifier reports low confidence, the router falls back to the LLM path, so you pay the LLM latency only on the genuinely ambiguous fraction of queries.
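The fallback behavior can be sketched as a confidence threshold on the classifier's output. The example below assumes the classifier emits one raw score per backend and that confidence is the top softmax probability; the threshold value, the backend labels, and the `route` function are illustrative assumptions rather than Eldric's actual interface.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, labels, llm_route, threshold=0.8):
    """Use the classifier's answer when it is confident; otherwise defer
    to the slower LLM router. Returns (backend, path_taken)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best], "classifier"
    return llm_route(), "llm"

labels = ["search", "code", "analytics"]      # hypothetical backends
slow_llm = lambda: "code"                     # stand-in for the LLM routing call

confident = route([5.0, 0.1, 0.2], labels, slow_llm)
ambiguous = route([1.0, 0.9, 0.8], labels, slow_llm)
print(confident, ambiguous)
```

Raising the threshold sends more queries down the LLM path (slower, more careful); lowering it keeps more on the classifier path.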

Status

Preview. Available as an opt-in for Professional and Enterprise tiers. Bring-up is admin-driven; the distillation step uses your accumulated routing data and runs as a one-time training job before the classifier replaces the LLM-routing path.

Honest scope

What these are not.

Not on by default. The standard knowledge-base path stays the default for every tenant.
Not a replacement for the full-precision matrix. The compressed version sits alongside the full path; customers opt in per knowledge base.
Not 5.0-GA-quality features. Both ship as preview; the quality marker stays preview through the 5.0 line. Production-grade status arrives in a later 5.0.x patch after a broader customer trial.
Local training only. Both the compression step and the router distillation run on your own hardware. Eldric does not forward your training jobs anywhere external.
