How it works

A short tour of the architecture.

Eldric is a small set of co-operating processes. Each one does one job. They can all run on one machine for a developer trial, or spread across many machines for a multi-tenant production cluster.


The picture

Three hops to the worker.

Architecture · request flow

Client: Web · CLI · GUI · iOS · Mac
Edge: TLS · auth · rate-limit (port 443 / 80)
Controller: topology · license · audit (sidecar query)
Router: intent · load · theme (picks the right worker)
Data: users · files · vectors · EMM memory (system-tier, used by every worker)
Existing storage behind Data: DB2 · SQL (PostgreSQL · MySQL) · NFS

Workers:
Inference: Ollama · vLLM · llama.cpp
Inferenced: native GGUF · smart memory recall
Cloud: OpenAI · Anthropic · xAI · Groq · Ollama Cloud
xLSTM: policy · forecast · encode · retrieve
Agent: agentic RAG · multi-agent · workflows
Media: STT · TTS · video · voice chat
Comm: email · SMS · WhatsApp · Signal · Teams · VoIP
IoT: OPC-UA · Modbus · MQTT · historian
Science: 16-category source registry · plugins
Training: LoRA · DPO · federated · distillation

A request from a client lands at the Edge, which terminates TLS, authenticates the caller by looking up the user in the Data Worker, and applies rate limits. The Edge hands the request to a Router, which asks the Controller for the current topology, classifies the request (chat? RAG? voice? sensor read?), and picks the right worker. The worker does the work, persisting state in the Data Worker as it goes, and streams the answer back along the same path. The Data Worker is the system-tier backbone: every other service consults it for users, sessions, files, vectors, and EMM memory, and it bridges out to your existing DB2, PostgreSQL, MySQL, or NFS storage so the platform can sit on top of data you already have.


The processes, one paragraph each.

Edge

The only process exposed to the public network. Terminates TLS, validates the API key, enforces rate limits, and forwards to a Router. Has no model state of its own. Also serves the built-in chat shell at /chat.
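To make the Edge's two gatekeeping steps concrete, here is a minimal sketch of an API-key check followed by a per-key rate limit. It uses a token bucket, which is a common rate-limiting technique and an assumption here; the text does not say which algorithm the Edge uses. The names `TokenBucket`, `edge_check`, and `VALID_KEYS` are hypothetical, and in the real system the key lookup would go to the Data Worker rather than an in-memory dict.

```python
class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full
        self.updated = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical key table; stands in for the user lookup against the Data Worker.
VALID_KEYS = {"demo-key": "tenant-a"}


def edge_check(api_key, buckets, now, rate=1.0, capacity=2.0):
    """Return (status, tenant): 401 unknown key, 429 rate-limited, 200 forward to a Router."""
    tenant = VALID_KEYS.get(api_key)
    if tenant is None:
        return 401, None
    bucket = buckets.get(api_key)
    if bucket is None:
        bucket = buckets[api_key] = TokenBucket(rate, capacity, now)
    if not bucket.allow(now):
        return 429, tenant
    return 200, tenant
```

The clock is passed in as `now` so the behaviour is easy to reason about: with a capacity of 2, the third request in the same instant is rejected, and one more is admitted a second later.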

Controller

Keeps the cluster topology in one place. Workers register here and heartbeat every thirty seconds. The controller owns the license file, the audit ledger, the backup orchestration, the rolling-upgrade coordinator, and the PKI for internal certificates.
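The registration and heartbeat idea can be sketched in a few lines. The thirty-second interval comes from the paragraph above; everything else is an assumption for illustration, including the `Topology` class, its method names, and the rule that a worker is dropped after three missed heartbeats.

```python
HEARTBEAT_INTERVAL_S = 30.0  # from the text: workers heartbeat every thirty seconds
MISSED_BEATS_ALLOWED = 3     # assumption: the text does not state a timeout


class Topology:
    """Hypothetical in-memory view of which workers are currently alive."""

    def __init__(self):
        self._workers = {}  # worker_id -> (kind, time of last heartbeat)

    def register(self, worker_id, kind, now):
        self._workers[worker_id] = (kind, now)

    def heartbeat(self, worker_id, now):
        kind, _ = self._workers[worker_id]  # KeyError if the worker never registered
        self._workers[worker_id] = (kind, now)

    def live_workers(self, now):
        # A worker counts as live if it was heard from within the allowed window.
        window = HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED
        return sorted(
            worker_id
            for worker_id, (_, seen) in self._workers.items()
            if now - seen <= window
        )
```

A Router asking for the current topology would, in this sketch, receive only the workers returned by `live_workers`.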

Router

Decides which worker handles which request. Picks based on intent (a chat? a RAG search? a voice call?), load (which worker is busiest?), and theme (a medical question routes to a medically-tuned model). Has eight load-balancing strategies and an optional LLM-based decision mode.
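One way to picture the three signals working together is the sketch below: filter by intent, prefer a theme match, then take the least-loaded candidate. This is an illustration under assumptions, not the Router's actual logic. The text does not name the eight strategies, so "least in-flight requests" is a stand-in, and `Worker` and `pick_worker` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    intents: frozenset  # request kinds this worker can serve, e.g. {"chat", "rag"}
    theme: str          # e.g. "general" or "medical"
    in_flight: int      # requests currently being handled (the load signal)


def pick_worker(workers, intent, theme="general"):
    # 1. Intent: only workers that can serve this kind of request.
    candidates = [w for w in workers if intent in w.intents]
    if not candidates:
        raise LookupError(f"no worker serves intent {intent!r}")
    # 2. Theme: prefer a theme-matched worker, fall back to any capable one.
    themed = [w for w in candidates if w.theme == theme]
    pool = themed or candidates
    # 3. Load: least in-flight requests wins; the name breaks ties deterministically.
    return min(pool, key=lambda w: (w.in_flight, w.name))
```

In this sketch a medical chat request goes to the medically tuned worker even when a general worker is idle, because theme is applied before load. A real deployment might weigh the two differently.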

Data Worker

The system-tier data backbone. Stores user accounts, sessions, audit trail, files, vectors, and EMM associative memory. Speaks DB2, PostgreSQL, MySQL, ODBC, and NFS, so it can sit in front of databases you already operate — the platform doesn't ask you to migrate. Multi-tenant isolation, quotas, replication between data workers, and an optional NFS-Ganesha server for filesystem clients are built in. Every other worker leans on this.

Workers

One per function. Inference workers run local language models (Ollama, vLLM, llama.cpp). The Cloud Worker fronts external vendor APIs (OpenAI, Anthropic, xAI, Groq, Ollama Cloud) so the rest of the cluster doesn't care which vendor a model lives behind. Agent workers run the iterative reasoning loops. Media workers do speech-to-text, text-to-speech, and video. Comm workers carry email, SMS, WhatsApp, Signal, Teams, and VoIP. Training workers fine-tune models. All of them persist their state through the Data Worker.

Inferenced

A native inference worker that loads GGUF and xLSTM models directly through embedded llama.cpp. No Ollama dependency, and the only path that supports smart memory inference — the model consults your associative memory at the prompt boundary. Use it for the smallest deployments and for air-gapped sites.

xLSTM daemon

Handles structured machine-learning workloads that aren't general chat: closed-loop policy execution for control, time-series forecasting on telemetry, vision-language encoding for perception, and associative retrieval with microsecond latency on CPU. One process, four workload classes, license-gated per class.

IoT worker

Speaks the protocols that smart homes and factories actually use: OPC-UA for PLCs and SCADA, Modbus TCP/RTU for legacy equipment, and MQTT for everything in between. Time-series historian, alarm management, OEE calculation, and a store-and-forward buffer for sites with flaky uplinks.
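For readers new to OEE (overall equipment effectiveness), the sketch below shows the common textbook definition: availability × performance × quality. It is offered as background, assuming the standard formula; the text does not say exactly how the IoT worker computes it, and the function name and parameters are hypothetical.

```python
def oee(planned_min, downtime_min, ideal_cycle_min, total_count, good_count):
    """Textbook OEE for one shift; all times are in minutes."""
    run_min = planned_min - downtime_min
    if run_min <= 0 or total_count <= 0:
        raise ValueError("need positive run time and at least one part produced")
    availability = run_min / planned_min                       # share of planned time spent running
    performance = (ideal_cycle_min * total_count) / run_min    # actual speed vs ideal speed
    quality = good_count / total_count                         # share of parts that passed
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Example shift: 480 planned minutes, 60 minutes down,
# 0.5 minutes ideal cycle time, 700 parts made, 665 good.
shift = oee(480, 60, 0.5, 700, 665)
```

The three factors multiply out to good parts × ideal cycle time ÷ planned time, which is a handy cross-check when the individual figures come from different historian tags.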

Science worker

Federates scientific data across sixteen categories — particle physics, genomics, neuroscience, climate, archaeology, and more — through a single source registry. Admins enable the sources their tenants need; the LLM-callable tools stay the same regardless of which sources are wired in. Customer plugins land under a custom category without touching the worker code.

Three things worth knowing.

The same software runs on a Pi.

The 5.0 kernel is the same on a Raspberry Pi 4, a developer workstation, a rack-mounted server, and across a multi-node cluster. What changes is which modules you activate per node. A small box does not get a stripped-down product; it gets the same product with fewer modules switched on.

The data path is short.

Edge → Router → Worker. Three hops. Streaming responses pass through with no buffering. Knowledge-base search hits the EMM (compressed, associative memory) first and only falls back to the vector store when exact source citations are needed — for pure chat use cases the vector store can be dropped entirely. There is no hidden middleware that resells your data.
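The "EMM first, vector store only for citations" rule can be sketched as a small lookup function. The callables `emm_recall` and `vector_search` are hypothetical stand-ins for the two backends, and the return shape is an assumption made for the example.

```python
def search_knowledge(query, emm_recall, vector_search=None, need_citations=False):
    """Consult the EMM first; touch the vector store only when exact citations are needed."""
    result = {"context": emm_recall(query), "citations": []}
    if need_citations:
        if vector_search is None:
            # Pure-chat deployments may run without a vector store at all.
            raise RuntimeError("exact citations requested but no vector store is configured")
        result["citations"] = vector_search(query)
    return result
```

Passing `vector_search=None` models the pure-chat deployment described above, where the vector store has been dropped entirely.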

Honest scope: where it is fast, where it is not.

On our reference cluster, chat sustains 793 requests per second at 32 concurrent connections, with median latency of 41 milliseconds. That is good. Knowledge-base search at four concurrent connections still hits a ~7-second p50 latency cliff. That is not good, and we are fixing it. The numbers come from our 2026-05 baseline; we publish them so you know what to expect.
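A quick sanity check on those figures uses Little's law (concurrency = throughput × mean latency). The sketch assumes every connection is kept busy for the whole run, and it yields a mean rather than a median, so treat the results as rough cross-checks and not as published numbers.

```python
def mean_latency_s(concurrency, throughput_rps):
    """Little's law rearranged: mean latency = concurrency / throughput."""
    return concurrency / throughput_rps


def throughput_rps(concurrency, latency_s):
    """Little's law rearranged: throughput = concurrency / mean latency."""
    return concurrency / latency_s


# Chat: 32 connections at 793 requests per second.
chat_ms = mean_latency_s(32, 793) * 1000   # roughly 40 ms

# Knowledge-base search: 4 connections at about 7 s each,
# assuming the mean is close to the quoted p50.
kb_rps = throughput_rps(4, 7.0)            # roughly 0.57 requests per second
```

The implied chat latency of about 40 ms lines up with the quoted 41 ms median. The same arithmetic suggests knowledge-base search currently sustains well under one request per second at four connections, which is why the cliff matters.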


Hardware

What it actually runs on.

Our reference cluster is intentionally modest. The numbers above come from this hardware.

1 × Inference-tier GPU (RTX 4070 Ti, 12 GB) for LLMs
1 × Router-tier GPU (RTX 2080, 8 GB) for routing + small models
5 × Worker nodes total, including controller, edge, data
Pi 4: the smallest target; 8 GB RAM is enough for kernel + light models