Workload daemons

xLSTM, scaled.

The xlstmd daemon hosts production xLSTM workloads — policy execution, time-series forecasting, vision encoding, associative retrieval — at predictable latency and memory cost.

Why xLSTM

The transformer alternative for sequence-heavy work.

Transformers dominate text generation, but the O(n²) cost of attention becomes a bottleneck for long sequences, real-time control loops, and any workload where the memory budget matters more than parameter count. The xLSTM family (mLSTM blocks with a matrix memory and sLSTM blocks with a scalar memory) restores linear-time recurrence and adds exponential gating, and the mLSTM block can be trained in parallel like a transformer. That makes it a good fit for control policies, multi-horizon forecasts, and dense retrieval at scale.
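As a rough sketch of where the costs come from, assume one layer with a single head, sequence length n, model width d, and constants ignored. The mLSTM update shown is the matrix-memory rule from the published xLSTM formulation, with input gate i_t, forget gate f_t, key k_t, and value v_t:

```latex
\begin{aligned}
\text{self-attention:}\quad & \text{time } O(n^2 d), && \text{KV cache } O(n d) \\
\text{mLSTM step:}\quad & C_t = f_t\, C_{t-1} + i_t\, v_t k_t^{\top}, && C_t \in \mathbb{R}^{d \times d} \\
\text{mLSTM total:}\quad & \text{time } O(n d^2), && \text{state } O(d^2)
\end{aligned}
```

The recurrent state stays the same size however long the stream runs, so per-step latency and memory remain flat on long sequences, while the attention KV cache keeps growing with n.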

The xlstmd daemon (port 8884) hosts these workloads behind a stable HTTP surface so customers can call them from the chat shell, agentic flows, or external integrations — without managing the Python sidecar lifecycle directly.

Workloads

Four production surfaces.

Policy execution

Run a trained xLSTM policy on streaming observations and emit actions. Suited to robotics, industrial control loops, and autonomous decision-making.

Endpoint: POST /api/v1/xlstm/policy/step
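A minimal client sketch for a streaming control loop. It assumes the daemon is reachable at localhost:8884 and uses a hypothetical JSON schema: "session_id" and "observation" in the request, "action" in the reply. The real request contract is not documented here. Because an xLSTM policy is recurrent, some identifier has to tie successive steps to the same hidden state, which is why the sketch sends one. The transport is injectable so the loop can be exercised without a running daemon.

```python
import json
import urllib.request
from typing import Callable, Iterable, List, Sequence

# Assumed host; this page documents port 8884 and the path below.
POLICY_STEP_URL = "http://localhost:8884/api/v1/xlstm/policy/step"


def build_step_request(observation: Sequence[float], session_id: str) -> urllib.request.Request:
    """Build a POST for one policy step.

    The field names "session_id" and "observation" are hypothetical
    placeholders, not a documented schema.
    """
    body = json.dumps({"session_id": session_id, "observation": list(observation)})
    return urllib.request.Request(
        POLICY_STEP_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Default transport: send the request and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=1.0) as resp:
        return json.load(resp)


def run_control_loop(
    observations: Iterable[Sequence[float]],
    session_id: str,
    transport: Callable[[urllib.request.Request], dict] = send,
) -> List[object]:
    """Send each observation in order and collect the returned actions.

    Reads a hypothetical "action" field from each reply.
    """
    actions = []
    for obs in observations:
        reply = transport(build_step_request(obs, session_id))
        actions.append(reply["action"])
    return actions
```

In a test or simulation, pass a fake transport that returns canned replies instead of calling urlopen.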

Time-series forecasting

Multi-horizon forecasts on telemetry streams such as sensor data, financial series, and OEE (overall equipment effectiveness) metrics. Returns the mean and quantile bands for each horizon.

Endpoint: POST /api/v1/xlstm/forecast
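Here is one way to consume the mean and quantile bands, sketched under a hypothetical schema. The request fields (series, horizons, quantiles) and the response shape shown in the docstring are assumptions, not the documented contract. The parser rejects quantile crossing, where a lower quantile level predicts a higher value than a higher level, because a band built from those values would be inverted.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


def forecast_body(series: List[float], horizons: List[int], quantiles: List[float]) -> bytes:
    """Encode a request body for POST /api/v1/xlstm/forecast (hypothetical field names)."""
    return json.dumps({"series": series, "horizons": horizons, "quantiles": quantiles}).encode("utf-8")


@dataclass
class HorizonForecast:
    step: int
    mean: float
    bands: Dict[float, float]  # quantile level -> predicted value


def parse_forecast(raw: bytes) -> List[HorizonForecast]:
    """Decode a hypothetical response shaped like
    {"horizons": [{"step": 1, "mean": 10.0, "quantiles": {"0.1": 8.0, "0.9": 12.0}}]}.

    JSON object keys are strings, so quantile levels are converted to floats.
    Raises ValueError if predicted values decrease as the quantile level rises
    (quantile crossing).
    """
    result = []
    for h in json.loads(raw)["horizons"]:
        bands = {float(level): float(value) for level, value in h["quantiles"].items()}
        ordered = [bands[level] for level in sorted(bands)]
        if any(lo > hi for lo, hi in zip(ordered, ordered[1:])):
            raise ValueError(f"quantile crossing at step {h['step']}")
        result.append(HorizonForecast(step=int(h["step"]), mean=float(h["mean"]), bands=bands))
    return result
```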

Vision encoding

Frame-by-frame embedding of image and video streams. Pairs with the media worker (port 8894) for pipeline-encoded video and with the data worker (port 8892) for retrieval of embedded frames.

Endpoint: POST /api/v1/xlstm/encode
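This sketch batches raw frames into request bodies for the encode endpoint. It assumes a hypothetical {"frames": [...]} JSON schema with base64-encoded frame bytes and uses an arbitrary default batch size of 16. Base64 is used because JSON cannot carry raw binary.

```python
import base64
import json
from itertools import islice
from typing import Iterable, Iterator


def encode_bodies(frames: Iterable[bytes], batch_size: int = 16) -> Iterator[bytes]:
    """Yield JSON bodies for POST /api/v1/xlstm/encode, batch_size frames each.

    Hypothetical schema: {"frames": ["<base64>", ...]}. The final batch may be
    shorter. Frames are pulled from the iterable one batch at a time, so if
    `frames` is a lazy iterator a long stream is never held in memory at once.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(frames)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        encoded = [base64.b64encode(frame).decode("ascii") for frame in batch]
        yield json.dumps({"frames": encoded}).encode("utf-8")
```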

Associative retrieval

High-recall key-value lookup against a learned associative memory. Complements the standard vector store with O(1) recall on exact keys.

Endpoint: POST /api/v1/xlstm/retrieve
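Because this workload complements the vector store, a client can try exact recall first and fall back to semantic search. The sketch below passes both lookups in as hypothetical callables that return None on a miss. The first stands in for a call to POST /api/v1/xlstm/retrieve and the second for a vector-store query. Exact recall only matches identical keys, so the sketch canonicalizes structured keys (sorted fields, fixed separators). Two equal dictionaries then always produce the same key string.

```python
import json
from typing import Any, Callable, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def canonical_key(query: Any) -> str:
    """Serialize a structured key deterministically: sorted fields, no extra spaces."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def lookup(query: Any, exact: Lookup, semantic: Lookup) -> Tuple[str, Optional[str]]:
    """Return (source, value), trying exact associative recall before semantic search.

    `exact` stands in for POST /api/v1/xlstm/retrieve and `semantic` for a
    vector-store query; both are hypothetical callables returning None on a miss.
    """
    key = canonical_key(query)
    hit = exact(key)
    if hit is not None:
        return ("exact", hit)
    return ("semantic", semantic(key))
```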

Operating model

How it runs.

Daemon: eldric-xlstmd on port 8884. It registers with the controller at startup, and the request router (port 8881) sends requests to it when their intent classifies as xLSTM-shaped (forecast, control, encode, or retrieve).
Backend: a Python sidecar process loads the model weights and runs inference. The daemon supervises the sidecar lifecycle, restarts it on crash, and surfaces its health to the controller dashboard.
Model registry: models live on the data worker (port 8892) under the xlstm-models namespace. Pull or replace via the standard model-management API.
Hardware: a GPU with CUDA 11.8 is recommended for inference. A CPU fallback is supported through a separate CPU-only torch wheel for client and laptop deployments. Multi-GPU tensor-parallel scaling is supported on RTX-class hardware.
License: the Free tier exposes the four endpoints with rate limits; Standard+ unlocks higher per-tenant request quotas. Specific feature names will be finalized at GA.
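The supervise-and-restart behavior described for the backend can be illustrated with a generic sketch. This is not eldric-xlstmd's implementation, and the command line, restart limit, and backoff constants are placeholders. A child that exits non-zero is restarted after an exponentially growing delay, and a clean exit ends supervision.

```python
import subprocess
import sys
import time
from typing import Callable, List, Tuple


def backoff_delays(max_restarts: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..., each capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(max_restarts)]


def supervise(
    cmd: List[str],
    max_restarts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Run `cmd`, restarting it after each non-zero exit, at most `max_restarts` times.

    Returns (restarts_performed, last_exit_code). A clean exit (code 0) ends
    supervision immediately; otherwise the loop gives up once the restart
    budget is spent.
    """
    delays = backoff_delays(max_restarts)
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode
        if code == 0 or restarts == max_restarts:
            return restarts, code
        sleep(delays[restarts])
        restarts += 1


if __name__ == "__main__":
    # Placeholder child that always crashes; a real sidecar command would go here.
    crash = [sys.executable, "-c", "raise SystemExit(1)"]
    print(supervise(crash, max_restarts=2, sleep=lambda s: None))  # (2, 1)
```

Injecting sleep keeps the backoff logic testable without real delays.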

When to use which workload

Decision shape.

Predict the next N steps of a sensor series? Forecasting.
Pick the next action given the current observation? Policy execution.
Compress images / video frames into embeddings for downstream search? Vision encoding.
Look up the exact stored response for a key seen before? Associative retrieval (use the standard vector store if you need semantic search instead).
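The same decision can be written as a lookup from the router's intent labels (forecast, control, encode, retrieve) to endpoint paths. This is an illustrative sketch only. How the router's classifier produces those labels is not described here, and anything outside the four labels falls through to other services such as the vector store.

```python
from typing import Dict, Optional

# Intent labels as listed under "Operating model"; each maps to the
# endpoint of the matching workload.
XLSTM_ROUTES: Dict[str, str] = {
    "forecast": "/api/v1/xlstm/forecast",
    "control": "/api/v1/xlstm/policy/step",
    "encode": "/api/v1/xlstm/encode",
    "retrieve": "/api/v1/xlstm/retrieve",
}


def route(intent: str) -> Optional[str]:
    """Return the xLSTM endpoint for an intent label, or None when the request
    belongs elsewhere (for example, semantic search on the vector store)."""
    return XLSTM_ROUTES.get(intent.strip().lower())
```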

Status

Where it is on the roadmap.

The four endpoints are live in 5.0. The Python sidecar ships in the standard eldric-aios RPM and in its CUDA-enabled variant, so no separate install is needed. License-tier limits and Pro-only advanced features (custom architecture training, multi-tenant policy isolation) will be finalized during GA.

Coming in 5.0.x: voice over IP integration (real-time bidirectional streams), inline CAD viewer for STEP / IGES files, additional model architectures in the policy / encoder slots.
