How Eldric works — Architecture of a brain-inspired AI server

The processes, one paragraph each.

Edge

The only process exposed to the public network. Terminates TLS, validates the API key, enforces rate limits, and forwards to a Router. Has no model state of its own. Also serves the built-in chat shell at /chat.
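
The Edge's admission steps (validate the key, then apply a rate limit, then forward) could be sketched roughly as follows. This is a minimal illustration, not Eldric's implementation: the token-bucket algorithm, the `VALID_KEYS` set, the `admit` function, and the rate and capacity values are all assumptions made for the example, and TLS termination and the actual forwarding are left out.

```python
from typing import Dict


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity       # start full
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


VALID_KEYS = {"demo-key"}             # hypothetical key store
BUCKETS: Dict[str, TokenBucket] = {}  # one bucket per API key


def admit(api_key: str, now: float) -> int:
    """Return 401 for an unknown key, 429 when rate-limited, 200 when the
    request may be forwarded to a Router."""
    if api_key not in VALID_KEYS:
        return 401
    if api_key not in BUCKETS:
        BUCKETS[api_key] = TokenBucket(rate=1.0, capacity=2.0, now=now)
    return 200 if BUCKETS[api_key].allow(now) else 429
```

Passing `now` explicitly keeps the sketch deterministic; a real edge process would read a monotonic clock instead.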

Controller

Keeps the cluster topology in one place. Workers register here and heartbeat every thirty seconds. The controller owns the license file, the audit ledger, the backup orchestration, the rolling-upgrade coordinator, and the PKI for internal certificates.
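
A heartbeat-based registry of this kind might look like the sketch below. Only the thirty-second interval comes from the text; the `Registry` class, its method names, and the rule that a worker is dropped after three missed heartbeats are assumptions for illustration.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 30.0    # stated in the text
MISSED_BEATS_ALLOWED = 3       # assumption: tolerance before a worker counts as gone


class Registry:
    """Tracks the last heartbeat time seen from each registered worker."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, worker_id: str, now: float) -> None:
        # Registration and heartbeat are the same operation in this sketch.
        self.last_seen[worker_id] = now

    def alive(self, now: float) -> List[str]:
        # A worker is alive if its last heartbeat is within the tolerance window.
        window = HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED
        return sorted(w for w, t in self.last_seen.items() if now - t <= window)
```

With these values a worker that goes silent stays in the topology for up to ninety seconds, which trades detection speed against tolerance for a briefly delayed heartbeat.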

Router

Decides which worker handles which request. Picks based on intent (a chat? a RAG search? a voice call?), load (which worker has spare capacity?), and theme (a medical question routes to a medically tuned model). Has eight load-balancing strategies and an optional LLM-based decision mode.
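
One way to combine intent, theme, and load in a single routing decision is sketched below. The text does not name the eight strategies, so the least-active-requests rule used here is just one plausible candidate; the `Worker` fields, the `pick_worker` function, the example `FLEET`, and the fallback to any capable worker when no themed worker exists are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Worker:
    name: str
    intents: FrozenSet[str]   # request kinds this worker can serve
    theme: str                # e.g. "general" or "medical"
    active_requests: int      # current load


def pick_worker(workers: List[Worker], intent: str,
                theme: Optional[str] = None) -> Worker:
    # 1. Intent: keep only workers that can serve this kind of request.
    capable = [w for w in workers if intent in w.intents]
    if not capable:
        raise LookupError(f"no worker serves intent {intent!r}")
    # 2. Theme: prefer theme-matched workers, else fall back to any capable one.
    themed = [w for w in capable if w.theme == theme]
    pool = themed or capable
    # 3. Load: choose the least busy worker; the name breaks ties deterministically.
    return min(pool, key=lambda w: (w.active_requests, w.name))


FLEET = [
    Worker("gen-1", frozenset({"chat", "rag"}), "general", 2),
    Worker("gen-2", frozenset({"chat"}), "general", 0),
    Worker("med-1", frozenset({"chat"}), "medical", 5),
]
```

In this sketch theme outranks load: a medical chat goes to `med-1` even though `gen-2` is idle. Whether Eldric weighs the two that way is not stated in the text.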

Data Worker

The system-tier data backbone. Stores user accounts, sessions, audit trail, files, vectors, and EMM associative memory. Speaks DB2, PostgreSQL, MySQL, ODBC, and NFS, so it can sit in front of databases you already operate — the platform doesn't ask you to migrate. Multi-tenant isolation, quotas, replication between data workers, and an optional NFS-Ganesha server for filesystem clients are built in. Every other worker leans on this.

Workers

One per function. Inference workers run local language models (Ollama, vLLM, llama.cpp). The Cloud Worker fronts external vendor APIs (OpenAI, Anthropic, xAI, Groq, Ollama Cloud) so the rest of the cluster doesn't care which vendor a model lives behind. Agent workers run the iterative reasoning loops. Media workers do speech-to-text, text-to-speech, and video. Comm workers carry email, SMS, WhatsApp, Signal, Teams, and VoIP. Training workers fine-tune models. All of them persist their state through the Data Worker.
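
The vendor-hiding role of the Cloud Worker can be illustrated with a small dispatch table. Everything named here is hypothetical: the `CloudWorker` class, its `register` and `complete` methods, the model names, and the stub backends, which return canned strings instead of calling any real vendor API.

```python
from typing import Callable, Dict

Backend = Callable[[str], str]   # takes a prompt, returns a completion


class CloudWorker:
    """Maps model names to vendor backends so callers never see the vendor."""

    def __init__(self) -> None:
        self._routes: Dict[str, Backend] = {}

    def register(self, model: str, backend: Backend) -> None:
        self._routes[model] = backend

    def complete(self, model: str, prompt: str) -> str:
        backend = self._routes.get(model)
        if backend is None:
            raise LookupError(f"no backend registered for model {model!r}")
        return backend(prompt)


# Stub backends standing in for real vendor clients (no network calls).
def vendor_a(prompt: str) -> str:
    return f"[vendor-a] {prompt}"


def vendor_b(prompt: str) -> str:
    return f"[vendor-b] {prompt}"


cloud = CloudWorker()
cloud.register("model-small", vendor_a)
cloud.register("model-large", vendor_b)
```

The caller only ever passes a model name, so moving `model-small` to another vendor is a change to one `register` call rather than to the rest of the cluster.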

Inferenced

A native inference worker that loads GGUF and xLSTM models directly through embedded llama.cpp. It has no Ollama dependency, and it is the only path that supports smart memory inference, in which the model consults your associative memory at the prompt boundary. Use it for the smallest deployments and for air-gapped sites.

xLSTM daemon

Runs structured machine-learning workloads that aren't general chat: closed-loop policy execution for control, time-series forecasting on telemetry, vision-language encoding for perception, and associative retrieval with microsecond latency on CPU. One process, four workload classes, license-gated per class.

IoT worker

Speaks the protocols that smart homes and factories actually use: OPC-UA for PLCs and SCADA, Modbus TCP/RTU for legacy equipment, and MQTT for everything in between. Time-series historian, alarm management, OEE calculation, and a store-and-forward buffer for sites with flaky uplinks.
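
For readers unfamiliar with OEE (overall equipment effectiveness), the standard definition is the product of availability, performance, and quality. The sketch below implements that textbook formula; the function name, parameter names, and sample shift figures are illustrative, and the text does not say how the IoT worker itself derives these inputs.

```python
def oee(planned_time_min: float, run_time_min: float,
        ideal_cycle_time_min: float, total_count: int, good_count: int) -> float:
    """Overall equipment effectiveness = availability * performance * quality."""
    if planned_time_min <= 0 or run_time_min <= 0 or total_count <= 0:
        raise ValueError("times and total_count must be positive")
    availability = run_time_min / planned_time_min                   # uptime share
    performance = (ideal_cycle_time_min * total_count) / run_time_min  # speed vs ideal
    quality = good_count / total_count                               # good-part share
    return availability * performance * quality


# Hypothetical shift: 500 min planned, 400 min running, 0.5 min ideal cycle,
# 600 parts produced, 540 of them good.
shift_oee = oee(500, 400, 0.5, 600, 540)
```

For the sample shift the three factors are 0.8, 0.75, and 0.9, giving an OEE of 0.54.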

Science worker

Federates scientific data across sixteen categories — particle physics, genomics, neuroscience, climate, archaeology, and more — through a single source registry. Admins enable the sources their tenants need; the LLM-callable tools stay the same regardless of which sources are wired in. Customer plugins land under a custom category without touching the worker code.
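
The idea of a single registry behind a stable tool interface can be sketched as follows. The `SourceRegistry` class, its methods, and the source names are hypothetical, and the stub sources return canned strings; the point is only that `search` keeps the same signature however many sources an admin enables.

```python
from typing import Callable, Dict, List, Set, Tuple

SearchFn = Callable[[str], List[str]]   # query -> list of result strings


class SourceRegistry:
    def __init__(self) -> None:
        self._sources: Dict[str, Tuple[str, SearchFn]] = {}
        self._enabled: Set[str] = set()

    def register(self, name: str, category: str, search: SearchFn) -> None:
        # Registered sources stay inactive until an admin enables them.
        self._sources[name] = (category, search)

    def enable(self, name: str) -> None:
        if name not in self._sources:
            raise KeyError(name)
        self._enabled.add(name)

    def search(self, category: str, query: str) -> List[str]:
        # The tool-facing call: same signature whichever sources are enabled.
        results: List[str] = []
        for name in sorted(self._enabled):
            source_category, fn = self._sources[name]
            if source_category == category:
                results.extend(fn(query))
        return results


registry = SourceRegistry()
registry.register("genome-db", "genomics", lambda q: [f"genome-db:{q}"])
registry.register("variant-db", "genomics", lambda q: [f"variant-db:{q}"])
registry.register("lab-plugin", "custom", lambda q: [f"lab-plugin:{q}"])
registry.enable("genome-db")
registry.enable("lab-plugin")
```

Here `variant-db` is registered but not enabled, so a genomics search returns results from `genome-db` only, and the customer plugin is reached through the same call under the `custom` category.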

Three things worth knowing.

The same software runs on a Pi.

The 5.0 kernel is the same on a Raspberry Pi 4, a developer workstation, a rack-mounted server, and across a multi-node cluster. What changes is which modules you activate per node. A small box does not get a stripped-down product; it gets the same product with fewer modules switched on.

The data path is short.

Client → Edge → Router → Worker. Three hops. Streaming responses pass through with no buffering. Knowledge-base search hits the EMM (compressed associative memory) first and only falls back to the vector store when exact source citations are needed; for pure chat use cases the vector store can be dropped entirely. There is no hidden middleware that resells your data.
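
The EMM-first lookup with an optional vector-store fallback might be structured like the sketch below. The `Answer` type, the `kb_search` function, and the two lookup callables are assumptions for illustration; real EMM and vector-store interfaces are not described in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Answer:
    text: str
    citations: List[str] = field(default_factory=list)


def kb_search(query: str,
              emm_lookup: Callable[[str], str],
              vector_lookup: Optional[Callable[[str], List[str]]] = None,
              need_citations: bool = False) -> Answer:
    # The associative memory is always consulted first.
    text = emm_lookup(query)
    if not need_citations:
        return Answer(text)
    # Exact source citations are the only reason to touch the vector store.
    if vector_lookup is None:
        raise RuntimeError("citations requested but no vector store is configured")
    return Answer(text, vector_lookup(query))


# Stubs standing in for the real stores.
def fake_emm(query: str) -> str:
    return f"recalled:{query}"


def fake_vectors(query: str) -> List[str]:
    return [f"doc-1#{query}"]
```

Because `vector_lookup` is optional, a pure chat deployment can omit it entirely and only fails if a caller explicitly asks for citations.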

Honest scope: where it is fast, where it is not.

On our reference cluster, chat sustains 793 requests per second at 32 concurrent connections, with median latency of 41 milliseconds. That is good. Knowledge-base search at four concurrent connections still hits a ~7-second p50 latency cliff. That is not good, and we are fixing it. The numbers come from our 2026-05 baseline; we publish them so you know what to expect.
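
The chat figures can be sanity-checked with Little's law, which for a closed loop relates throughput to concurrency divided by mean latency. The sketch below applies it to the published numbers. It treats the reported median as a stand-in for the mean, which is an approximation, and the function name is made up for the example.

```python
def implied_throughput(concurrency: int, latency_s: float) -> float:
    """Little's law for a closed loop: requests/second = concurrency / latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s


# Chat: 32 concurrent connections at a 41 ms median latency.
chat_rps = implied_throughput(32, 0.041)

# Knowledge-base search: 4 concurrent connections at a ~7 s p50.
kb_rps = implied_throughput(4, 7.0)
```

The chat estimate comes out near 780 requests per second, within about 2 percent of the reported 793, so the two published chat figures are consistent with each other. The same arithmetic puts knowledge-base search at roughly 0.57 requests per second at four connections.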

