5.1 architecture preview

What lands in 5.1.

5.0 is the foundation. 5.1 takes four pieces of the platform and makes each one faster, smarter, or more resilient than it is today. This page is the customer-shaped read on those four: what changes for you, why it matters, what the trade-off looks like. Honest framing, no committed dates: we ship when the surface is right. For the briefer 5.1 cards view, see what's next.

1 · Native inference stack

Faster local models, fewer moving parts.

Today most of Eldric's local-model traffic flows through Ollama. It works well, but it's a separate process to manage and a separate set of failure modes to debug. 5.1 wires the native Eldric inference daemon and the xLSTM workload daemon into the same cooperative path so the platform can serve language-model traffic, structured-ML workloads, and embeddings from one runtime — without an external sidecar.

What changes for you:

Startup time drops. One process to start, not two. New installs come up faster; restarts (after upgrades, after kernel patches) come back faster.
Memory footprint drops. One model loader instead of two competing for VRAM. Smaller installs (Raspberry Pi, NUC, Jetson) get more headroom for the model they actually care about.
The "Load" button finally works. Loading a GGUF model from the admin console becomes a single click in the Inferenced dashboard — no separate Ollama pull step, no separate model registry to keep in sync.
Embeddings stay local. The same GGUF that handled chat completion can do embeddings via the OpenAI-compatible /v1/embeddings path. No separate embedding service to deploy.
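
As a rough illustration of the embeddings path mentioned above, here is a minimal client sketch. It assumes the endpoint follows the OpenAI embeddings request and response shape (a JSON body with `model` and `input`, and a `data` list of `{index, embedding}` items in the reply). The base URL, port, and model name are placeholders, not documented Eldric values.

```python
import json
import urllib.request

# Assumptions: placeholder host, port, and model name; substitute your own.
BASE_URL = "http://localhost:8080"
MODEL = "my-model.gguf"


def build_embeddings_request(texts, model=MODEL, base_url=BASE_URL):
    """Build an OpenAI-style POST request for the /v1/embeddings path."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(payload):
    """Extract the vectors from an OpenAI-style response, ordered by index."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts):
    """Send the request and return one vector per input text."""
    request = build_embeddings_request(texts)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embeddings_response(json.load(response))
```

Calling `embed(["first chunk", "second chunk"])` against a running instance would return two vectors in input order, provided the server matches the assumed response shape.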

The trade-off: Ollama-format models still load (we don't break what works). But the recommended path moves to native GGUF for new installations — smaller surface area, fewer things to go wrong.

2 · Peer-aware routing

The cluster routes around dead nodes by itself.

5.0 ships the routing layer with explicit configuration — the controller knows the peers because someone configured them. If a peer goes down mid-conversation, the controller catches the timeout and retries; but new requests still get sent the same way until you change the config. 5.1's "Path A peer-aware routing" makes the routing decision live: the controller continuously discovers the cluster's healthy nodes, ranks them by load and latency, and steers new requests to whichever node can actually serve them right now.
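
The routing decision described above can be sketched roughly as follows. This is an illustration of ranking healthy peers by load and latency, not the controller's actual algorithm: the `Peer` fields, the weights, and the capability labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    healthy: bool
    load: float          # fraction of capacity in use, 0.0 to 1.0 (assumed metric)
    latency_ms: float    # recent health-ping round trip (assumed metric)
    capabilities: frozenset = frozenset({"chat"})


def pick_peer(peers, capability, load_weight=0.7, latency_weight=0.3):
    """Return the healthy, capable peer with the lowest combined score, or None."""
    candidates = [p for p in peers if p.healthy and capability in p.capabilities]
    if not candidates:
        return None
    # Normalise latency against the slowest candidate so both terms sit in [0, 1].
    worst = max(p.latency_ms for p in candidates) or 1.0

    def score(p):
        return load_weight * p.load + latency_weight * (p.latency_ms / worst)

    return min(candidates, key=score)
```

Because unhealthy peers are filtered out before ranking, a node that stops answering health pings simply drops out of the candidate set on the next request, and a peer that newly advertises itself joins it.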

What changes for you:

One node dying doesn't degrade the cluster. Maintenance windows, kernel patches, a single bad GPU — the platform routes around them without an admin in the loop.
No more "did anyone update the peer list" failures. Adding a worker becomes a one-step operation (start the daemon; it advertises itself). The controller picks it up; routing decisions include it on the next request.
Load actually balances. The router can see which peer is handling more traffic and bias new requests toward the quieter one. With explicit configuration today, the load tends to land on whichever node was named first.
Embedding requests find an embedder. If the peer with the embedding model goes down, the next available peer with the same capability takes over — instead of every RAG query failing until the operator notices.

The trade-off: a small new background task (peer health pings) runs on every controller. The cost is negligible on real clusters, and the task can be disabled explicitly on single-node installs, where it is only noise.

3 · Re-embed-all

Upgrading the embedding model becomes a one-button operation.

The embedding model is the spine of RAG. If you change it — a better model becomes available, the chunking strategy gets revised, the vector store schema evolves — every chunk in every knowledge base needs to be re-embedded. Today that's a careful operator job: schedule a maintenance window, run an admin script per namespace, monitor progress, hope nothing surfaces a corner case.

5.1 turns the same operation into a single button:

Click "re-embed all" in the admin console. The platform schedules per-namespace re-embedding behind a rate limiter so the live RAG path keeps working during the rebuild.
Progress is observable. A live dashboard shows chunks-done / chunks-total per namespace, ETA, and a per-document drill-down for failed chunks.
Rollback is one click. If the new model produces measurably worse retrieval quality on your evaluation set, you switch back. The old embeddings stay around until the new ones complete a soak period.
The stale-chunks detector auto-fires. When the embedding model upgrades, the platform flags every chunk that's still on the old version and offers to re-embed it. No manual reconciliation.
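
The steps above (stale-chunk detection, rate-limited re-embedding, per-namespace progress, ETA) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Chunk` fields, the fixed-interval limiter, and the linear ETA are invented for the example, and `embed_fn` is assumed to write the new vector alongside the old one so that rollback stays possible.

```python
import time
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    namespace: str
    embedding_model: str   # model version that produced the stored vector


def stale_chunks(chunks, current_model):
    """Flag every chunk whose vector came from a different model version."""
    return [c for c in chunks if c.embedding_model != current_model]


def reembed(chunks, current_model, embed_fn, max_per_second=50.0, sleep=time.sleep):
    """Re-embed stale chunks at a capped rate.

    Returns {namespace: {"done": int, "total": int, "failed": [chunk_id, ...]}}.
    """
    todo = stale_chunks(chunks, current_model)
    progress = {}
    for c in todo:
        entry = progress.setdefault(c.namespace, {"done": 0, "total": 0, "failed": []})
        entry["total"] += 1
    for c in todo:
        try:
            embed_fn(c)   # assumed to store the new vector next to the old one
            progress[c.namespace]["done"] += 1
        except Exception:
            progress[c.namespace]["failed"].append(c.chunk_id)
        sleep(1.0 / max_per_second)   # crude fixed-interval rate limit
    return progress


def eta_seconds(done, total, elapsed_seconds):
    """Linear ETA from throughput so far; None until one chunk has completed."""
    if done == 0:
        return None
    return (total - done) * (elapsed_seconds / done)
```

The rate cap is the knob that trades rebuild time against live query latency: a lower `max_per_second` leaves more embedding capacity for the live RAG path but stretches the background job.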

The trade-off: re-embedding a large corpus takes time and burns embedding-model GPU cycles. The platform schedules the work to minimise user-facing latency, but a large institution with a million documents will still see a few hours of background work.

4 · Content-aware chunking, deeper

The chunker learns from what works in your tenant.

5.0 already ships content-aware chunking — the platform picks a strategy per content type based on a built-in table (scientific PDF → semantic, code → function-boundary, CSV → per-row, and so on). See chunking strategies for the full default set.
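
A built-in table of this kind is essentially a lookup keyed by content type. The sketch below encodes only the three mappings named above; the key spellings, the fallback strategy, and the function name are illustrative assumptions, not Eldric identifiers.

```python
# The three mappings come from the text; everything else is assumed.
DEFAULT_STRATEGIES = {
    "scientific_pdf": "semantic",
    "code": "function-boundary",
    "csv": "per-row",
}

# Assumed fallback for content types the table does not list.
FALLBACK_STRATEGY = "fixed-size"


def pick_strategy(content_type):
    """Return the chunking strategy for a content type, or the fallback."""
    return DEFAULT_STRATEGIES.get(content_type, FALLBACK_STRATEGY)
```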

5.1 takes the same surface and adds two things:

Per-tenant defaults that learn from acceptance. When customers thumbs-up answers grounded in a particular chunking strategy, the platform notices. The retention loop's acceptance signal becomes a quiet vote on which chunk-size + overlap combination is producing the most-cited chunks. Over weeks, the per-tenant defaults shift toward what's actually working for your documents.
Per-document override at ingest. Customers who already know their PDFs are organised differently from the typical scientific-paper structure — for instance, contracts where each clause is its own semantic unit — can set a per-document override in the intelligent upload dialog. The platform respects the override and learns from it.

The trade-off: the learning loop needs a few weeks of acceptance signal before the per-tenant defaults stabilise. Brand-new installations keep the built-in table defaults until the loop has enough signal.
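
One way to picture the learning loop and the per-document override is a vote counter with a minimum-signal threshold. This is a sketch of the idea only: the class name, the threshold value, and the example sizes are hypothetical, and the real loop may weigh signals quite differently.

```python
from collections import Counter

# Assumed threshold for "enough signal"; the real value is not stated.
MIN_VOTES = 200


class TenantChunkingDefaults:
    def __init__(self, builtin_default):
        self.builtin_default = builtin_default   # (chunk_size, overlap)
        self.votes = Counter()

    def record_acceptance(self, chunk_size, overlap):
        """Count one thumbs-up for an answer grounded in chunks of this shape."""
        self.votes[(chunk_size, overlap)] += 1

    def current_default(self):
        """Built-in default until enough votes exist, then the most-accepted pair."""
        if sum(self.votes.values()) < MIN_VOTES:
            return self.builtin_default
        return self.votes.most_common(1)[0][0]

    def settings_for(self, override=None):
        """A per-document override set at ingest wins over the learned default."""
        return override if override is not None else self.current_default()
```

Keeping the built-in default until the threshold is reached mirrors the behaviour described for brand-new installations: a handful of early thumbs-ups cannot swing the tenant's defaults.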

The honest framing

No dates. Here's why.

None of the four pieces above has a release date attached. That's not because we don't have an internal plan — we do. It's because software dates slip; published dates create pressure to ship before the surface is right. Eldric 5.0 took the time it took. 5.1 will take the time it takes.

What you can rely on:

The four pieces are in active development. They're scoped, designed, and the implementation work is in flight.
Public commits land on the open repository. If you want to follow along, the GitHub repository is the source of truth.
5.0 customers get a clean upgrade path. Every 5.1 piece is additive — no breaking change to the 5.0 surface, no migration burden.
If something on this list ships and another piece slips, we say so. The release notes are the formal cut; known issues tracks what's actually rough at any given moment.

Feedback loop

Tell us what's missing.

The 5.1 roadmap moves on customer signal. If something you need isn't here, write to office@eldric.ai. Paid-tier customers with a license ID can write to support@eldric.ai for prioritised handling; feature requests from active customers carry weight in scheduling.

For the briefer 5.1 cards view: what's next. For everything shipped right now: release notes. For 5.0-line bugs and preview-status caveats: known issues.
