A preview feature in 5.0 wires the inference daemon to your knowledge-base memory directly. Before the model answers, it pulls relevant context from your installation's matrix memory; the answer is grounded in your own data without a separate retrieval round-trip. Opt-in per request, sub-2 ms per-token overhead, Pro+ tier.
The standard pattern for grounding a model in your own data is two steps: a retrieval call (knowledge-base search) feeds the model the relevant context, then the model answers. Smart memory inference collapses that into one: the inference daemon already has access to the matrix-memory layer of the knowledge base your tenant points at; at the prompt boundary, the relevant patterns are consulted and merged into the model's decoding state before it produces the first token.
What changes for the user:
On our demo cluster (CPU-only on the controller host), the recall lookup adds 1–2 milliseconds per token at typical knowledge-base sizes. For a 300-token answer, that's an extra fraction of a second of decode time. For workloads on a GPU-equipped inference node, the recall happens in parallel with model compute and the overhead is effectively hidden.
The lookup scales with knowledge-base size sublinearly — the matrix-memory layer is one matrix-vector product against the stored pattern set, not a search across vectors. Larger knowledge bases cost more per lookup, but the relationship is gentle.
Two operating modes ship in 5.0:
Two paths:
smart_memory: true on the chat-completion request body to opt that specific call in. Useful when you want the feature only for certain workloads (long-form generation, customer-specific reports) and not for short generic queries.Default is off everywhere. Enabling the feature does not change the model, the prompt, or the response shape — only the decoder's view of your data.
For the other memory-layer preview features (compressed memory, distilled router), read advanced retrieval. For the platform's overall data posture, read your data. To install: get started. Questions: office@eldric.ai.