Eldric Router

Intelligent load balancing with AI-powered routing decisions

v4.1.0

Architecture Overview

The Router sits between Edge/Controller and Workers, distributing inference requests across the cluster using configurable strategies and optional AI-powered decision making.

Router Architecture & Request Flow
[Diagram: external clients (OpenWebUI, apps, APIs) connect to the Edge Server (port 443: TLS, auth, rate limiting), which forwards to the Controller (port 8880, worker sync) and the Router (eldric-routerd, port 8881: AI decision engine, SSE streaming, five load balancing strategies); the router distributes to inference workers running Ollama, vLLM, and TGI (port 8890, GPU), a Cloud Worker (port 8889: OpenAI, Anthropic, xAI), and a Data Worker (port 8892)]

Overview

The Eldric Router operates on Port 8881 and serves as the intelligent traffic distribution layer between the Edge Server or Controller and backend Workers. It supports five built-in load balancing strategies and optional AI-powered routing for context-aware worker selection.

Intelligent Distribution

Five load balancing strategies from simple round-robin to AI-powered autonomous routing with LLM-based decision making.

Worker Health Monitoring

Continuous health checks with automatic failover. Unhealthy workers are removed from rotation and re-added when recovered.

Zero-Copy Streaming

Server-Sent Events (SSE) proxy with zero-copy forwarding. OpenAI-compatible streaming from any backend through the router.

Controller Sync

Syncs worker registry from the Controller at configurable intervals. Can also operate standalone with manually configured workers.

Load Balancing Strategies

The router provides five built-in strategies for distributing requests across workers. The default strategy is load_based.

round_robin

Simple rotation through available workers. Each worker receives requests in sequence. Best for homogeneous worker pools with similar capabilities.

Default Fallback

least_connections

Routes to the worker with the fewest active requests. Naturally adapts to workers with different processing speeds. Good for mixed hardware environments.

Recommended

load_based

Routes based on reported worker load metrics (CPU, memory, GPU utilization). Workers report their load during health checks. The default strategy.

Default

latency_based

Tracks response times per worker and routes to the fastest. Adapts in real-time as latency changes. Ideal for geographically distributed clusters.

Performance

random

Random worker selection from the healthy pool. Provides natural distribution without tracking state. Useful for testing and simple deployments.

Basic
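The five strategies boil down to different selection functions over per-worker metrics. The sketch below is illustrative Python, not the router's actual C++ implementation; the worker fields and the `select()` helper are hypothetical.

```python
# Illustrative sketch of the five load balancing strategies.
# Worker records and the select() helper are hypothetical, not the real API.
import itertools
import random

workers = [
    {"name": "w1", "active": 8, "load": 0.25, "latency_ms": 245},
    {"name": "w2", "active": 3, "load": 0.78, "latency_ms": 89},
    {"name": "w3", "active": 1, "load": 0.95, "latency_ms": 12},
]

_rr = itertools.cycle(workers)  # round-robin cursor

def select(strategy):
    if strategy == "round_robin":        # sequential rotation
        return next(_rr)
    if strategy == "least_connections":  # fewest active requests
        return min(workers, key=lambda w: w["active"])
    if strategy == "load_based":         # lowest reported load (default)
        return min(workers, key=lambda w: w["load"])
    if strategy == "latency_based":      # fastest recent response time
        return min(workers, key=lambda w: w["latency_ms"])
    return random.choice(workers)        # random

print(select("least_connections")["name"])  # w3
print(select("load_based")["name"])         # w1
```

Note how `least_connections` and `load_based` can disagree: w3 has the fewest active requests but the highest reported load, which is why mixed hardware favors one strategy and production clusters the other.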

Swarm LLM & Ensemble Mode

Fan-out inference across multiple LLM workers with intelligent response synthesis. The router automatically selects the optimal strategy based on query content.

debate

Models argue different positions across multiple rounds. A judge model evaluates arguments and renders the final verdict. Best for decisions and architecture questions.

Multi-Round

critique

Model A generates a response, Model B critiques it, Model A revises. Iterates for configurable rounds. Best for writing, planning, and content refinement.

Iterative

best_of_n

Fan-out to N models in parallel, a judge picks the single best answer. Best for code generation where merging multiple outputs produces inconsistencies.

Parallel

vote

All models answer independently, consensus analysis identifies agreement and disagreement with confidence indicators. Best for factual and classification questions.

Consensus

synthesize

The default ensemble strategy. A synthesis model merges the best elements of every model's response into one comprehensive answer.

Default
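The best_of_n pattern above can be sketched as a parallel fan-out plus a judge. The model answers and the length-based judge here are stand-ins (the real judge is itself an LLM call, and `ask()` stands in for a /v1/chat/completions request to one worker).

```python
# Hedged sketch of best_of_n: N models answer in parallel, a judge picks one.
from concurrent.futures import ThreadPoolExecutor

def ask(model, prompt):
    # stand-in for a /v1/chat/completions call to one worker
    canned = {
        "llama3.2:3b": "def add(a, b): return a + b",
        "qwen2.5":     "def add(a, b):\n    return a + b  # with docstring",
        "mistral":     "lambda a, b: a + b",
    }
    return {"model": model, "answer": canned[model]}

def judge(candidates):
    # toy judge: prefer the longest (most complete) answer;
    # in the router the judge is an LLM evaluating the candidates
    return max(candidates, key=lambda c: len(c["answer"]))

def best_of_n(prompt, models):
    with ThreadPoolExecutor() as pool:            # fan-out in parallel
        candidates = list(pool.map(lambda m: ask(m, prompt), models))
    return judge(candidates)                      # single best answer

winner = best_of_n("write add()", ["llama3.2:3b", "qwen2.5", "mistral"])
print(winner["model"])
```

Returning one intact answer instead of merging is exactly why best_of_n suits code generation: the winning snippet stays internally consistent.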

xLSTM Predictor

Optional integration with xLSTM (the extended LSTM architecture from Sepp Hochreiter's group) for workload forecasting, anomaly detection, and fast sequence classification. Enables proactive scaling decisions before load spikes hit.

Load Balancing Strategy Comparison
[Diagram: the five strategies compared side by side. round_robin: sequential rotation 1 → 2 → 3 → 4 → 1, best for homogeneous worker pools. least_connections: fewest active requests, best for mixed hardware environments. load_based (default): lowest CPU/GPU/memory utilization, best overall for production clusters. latency_based: fastest response time, best for geo-distributed clusters. random: random pick from the healthy pool, best for testing and simple deployments.]
# Configure load balancing strategy via Controller API
curl -X POST http://controller:8880/api/v1/router/config \
  -H "Content-Type: application/json" \
  -d '{"strategy": "least_connections"}'

# Or configure directly on the router
curl -X POST http://router:8881/api/v1/config \
  -H "Content-Type: application/json" \
  -d '{"strategy": "latency_based"}'

AI-Based Routing

Eldric supports intelligent AI-controlled routing where an LLM makes real-time worker selection decisions based on request context, worker capabilities, and cluster state.

AI Control Modes

none

AI routing disabled. Uses algorithmic load balancing only. Lowest latency overhead.

Default
advisory

AI suggests a worker but the system only logs the suggestion. Useful for testing AI routing before enabling it.

Evaluation
autonomous

AI makes the routing decision. The LLM evaluates worker load, latency, and capabilities to select the optimal target.

Enterprise

Router LLM Model

AI routing uses a dedicated Ollama model for decision making. Eldric includes a custom-trained routing model optimized for fast, accurate worker selection. Any Ollama-compatible model can be used, but smaller models (1B-3B parameters) are recommended to minimize routing latency.

Decision Flow

AI Routing Decision Pipeline
[Diagram: a request to /v1/chat/completions arrives; workers are filtered by the requested model (e.g. "llama3.2:3b"); target metrics are gathered (load, latency, capabilities, GPU utilization, active connections); if AI routing is enabled, the LLM analyzes the targets and selects a worker, otherwise (or if the LLM call fails) the algorithmic fallback applies (load_based, then round_robin); the request is proxied to the selected worker and the response includes a "routing.reason" field]

Enable AI Routing

# Enable autonomous AI routing
curl -X POST http://router:8881/api/v1/ai/configure \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "autonomous",
    "llm_model": "llama3.2:3b",
    "ollama_url": "http://localhost:11434"
  }'

# Check AI routing status
curl http://router:8881/api/v1/ai/status

Response with Routing Info

When AI routing is active, responses include routing metadata explaining the decision.

{
  "worker": "wrk-abc123",
  "worker_host": "10.3.7.20",
  "routing": {
    "strategy": "ai_routing",
    "reason": "Low latency and capability match for code generation tasks"
  }
}

Knowledge Routing

The router supports content-aware knowledge routing with 150+ predefined themes. Incoming requests are analyzed and routed to workers specialized in the relevant domain, from scientific computing to creative writing to code generation.

How Knowledge Routing Works

The router classifies incoming prompts by topic and matches them to workers configured with domain expertise. A request about molecular biology routes to a worker running a science-tuned model, while a coding question routes to a worker with a code-optimized model.

  • 150+ predefined knowledge themes across science, engineering, humanities, and more
  • Automatic prompt classification using keyword and semantic analysis
  • Worker capability tags for domain specialization
  • Fallback to standard load balancing when no specialization match is found
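The classify-then-match flow with fallback can be sketched as follows. The themes, keywords, and worker tags here are illustrative stand-ins for the router's 150+ theme catalog and semantic analysis.

```python
# Minimal sketch of keyword-based theme classification with fallback.
# THEMES and worker tags are illustrative, not the shipped catalog.
THEMES = {
    "code":    {"function", "compile", "bug", "python", "refactor"},
    "science": {"molecule", "protein", "quantum", "enzyme"},
}

workers = [
    {"name": "w-code", "tags": {"code"}},
    {"name": "w-sci",  "tags": {"science"}},
]

def classify(prompt):
    words = set(prompt.lower().split())
    for theme, keywords in THEMES.items():
        if words & keywords:              # any keyword hit claims the theme
            return theme
    return None                           # no specialization match

def route(prompt):
    theme = classify(prompt)
    for w in workers:
        if theme in w["tags"]:
            return w["name"]
    return "load_balanced"                # fallback to standard load balancing

print(route("fix this python bug"))          # w-code
print(route("what is the capital of peru"))  # load_balanced
```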

For full details, see the Knowledge Routing documentation.

Worker Health Monitoring

The router continuously monitors worker health and automatically manages the active worker pool.

Periodic Health Checks

The router polls each worker's /health endpoint at a configurable interval (default: 30 seconds). Workers report their status, load metrics, available models, and GPU utilization.

Automatic Failover

When a worker fails health checks, it is removed from the active rotation. Requests are automatically redistributed to remaining healthy workers with no client impact.

Recovery Detection

Unhealthy workers continue to be checked. When they recover, they are automatically re-added to the active pool. No manual intervention required.
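The check/failover/recovery cycle described above amounts to one loop over the pool, sketched here with a stubbed probe in place of the real HTTP GET to each worker's /health endpoint.

```python
# Sketch of one health-check cycle: drop failing workers from rotation,
# re-add them when they recover. probe() stands in for an HTTP GET /health.
def probe(worker):
    # assumption: a real probe does an HTTP request and parses the status
    return worker["simulated_healthy"]

def run_health_cycle(pool):
    for w in pool:
        healthy = probe(w)
        if w["active"] and not healthy:
            w["active"] = False          # failover: remove from rotation
        elif not w["active"] and healthy:
            w["active"] = True           # recovery: re-add automatically

pool = [
    {"url": "http://10.3.7.47:8890",  "active": True,  "simulated_healthy": False},
    {"url": "http://10.19.0.12:8890", "active": False, "simulated_healthy": True},
]
run_health_cycle(pool)
print([w["active"] for w in pool])  # [False, True]
```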

Standalone Router Daemon

The router can run as a standalone daemon (eldric-routerd) that syncs workers from the Controller and operates independently.

Router Mode Features

  • Syncs worker registry from the Controller at configurable intervals
  • Operates independently even if the Controller goes offline temporarily
  • Supports multiple router instances for high availability
  • Maintains its own health check cycle for all known workers
  • Configurable sync interval (default: 30 seconds)
# Run standalone router daemon
./eldric-routerd --port 8881 \
  --controller http://controller:8880 \
  --sync-interval 30000

# Router with AI routing enabled
./eldric-routerd --port 8881 \
  --controller http://controller:8880 \
  --ai-mode autonomous \
  --ai-model llama3.2:3b \
  --ollama-url http://localhost:11434

# Router with specific strategy
./eldric-routerd --port 8881 \
  --controller http://controller:8880 \
  --strategy latency_based

Streaming Support

The router provides zero-copy SSE (Server-Sent Events) proxy, forwarding streaming responses from workers to clients in real-time with OpenAI-compatible format.

Zero-Copy Streaming Pipeline
[Diagram: request flow Client → Edge (TLS + auth) → Router (load balancing, zero-copy forward) → Worker → backend (Ollama / vLLM / TGI); SSE delta chunks stream back through each hop unchanged, ending with data: [DONE]]
# Streaming chat through the router (OpenAI-compatible)
curl -X POST http://router:8881/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

# SSE response format
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: {"choices":[{"delta":{"content":" How"}}]}
data: [DONE]
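On the client side, assembling the streamed tokens from SSE lines in this format looks roughly like the sketch below; a real client reads the lines incrementally from the HTTP response body.

```python
# Sketch: collect token deltas from OpenAI-style SSE lines until [DONE].
import json

def collect_sse(lines):
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip blank/keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":           # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

stream = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":"!"}}]}',
    'data: {"choices":[{"delta":{"content":" How"}}]}',
    'data: [DONE]',
]
print(collect_sse(stream))  # Hello! How
```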

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Basic health check |
| /api/v1/health | GET | Detailed health status with worker counts and uptime |
| /v1/chat/completions | POST | OpenAI-compatible chat completions (proxied to workers); supports streaming via SSE |
| /v1/models | GET | List available models across all workers (aggregated, deduplicated) |
| /api/v1/ai/configure | POST | Configure AI routing mode, LLM model, and Ollama URL |
| /api/v1/ai/status | GET | Get current AI routing configuration and statistics |
| /api/v1/workers | GET | List workers known to this router (synced from Controller) |
| /api/v1/data/query | POST | Proxied to the Data Worker for database queries |

Configuration

The router can be configured via a JSON configuration file or command-line arguments.

// router.json
{
  "port": 8881,
  "bind_address": "0.0.0.0",
  "controller_url": "http://controller:8880",
  "sync_interval_ms": 30000,
  "health_check_interval_ms": 30000,
  "strategy": "load_based",
  "ai_routing": {
    "mode": "none",
    "llm_model": "llama3.2:3b",
    "ollama_url": "http://localhost:11434"
  },
  "workers": [
    { "url": "http://10.3.7.47:8890", "tags": ["gpu", "inference"] },
    { "url": "http://10.19.0.12:8890", "tags": ["gpu", "inference"] }
  ]
}
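Assuming the schema above, a tool could load and sanity-check a router config like this; the validation rules here are illustrative, not the daemon's actual checks.

```python
# Sketch: load a router.json-style config and validate a few fields.
# Field names follow the sample config; the checks are assumptions.
import json

VALID_STRATEGIES = {"round_robin", "least_connections", "load_based",
                    "latency_based", "random"}

def load_config(text):
    cfg = json.loads(text)
    if cfg.get("strategy", "load_based") not in VALID_STRATEGIES:
        raise ValueError("unknown strategy")
    if not 1 <= cfg.get("port", 8881) <= 65535:
        raise ValueError("port out of range")
    mode = cfg.get("ai_routing", {}).get("mode", "none")
    if mode not in {"none", "advisory", "autonomous"}:
        raise ValueError("unknown ai_routing mode")
    return cfg

cfg = load_config('''{
  "port": 8881,
  "strategy": "load_based",
  "controller_url": "http://controller:8880",
  "ai_routing": {"mode": "none", "llm_model": "llama3.2:3b"}
}''')
print(cfg["strategy"])  # load_based
```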

CLI Usage

# Start router with controller sync
./eldric-routerd --port 8881 --controller http://controller:8880

# Start with custom sync interval (60 seconds)
./eldric-routerd --port 8881 --controller http://controller:8880 --sync-interval 60000

# Start with specific load balancing strategy
./eldric-routerd --port 8881 --controller http://controller:8880 --strategy least_connections

# Start with AI routing enabled
./eldric-routerd --port 8881 \
  --controller http://controller:8880 \
  --ai-mode autonomous \
  --ai-model llama3.2:3b \
  --ollama-url http://localhost:11434

# Start with manually specified workers (no controller)
./eldric-routerd --port 8881 \
  --workers http://10.3.7.47:8890,http://10.19.0.12:8890,http://10.19.0.13:8890

# Start from config file
./eldric-routerd --config /etc/eldric/router.json

# Build the router daemon
cd cpp/build
cmake -DBUILD_DISTRIBUTED=ON ..
make eldric-routerd

Quick Start

Get a router running in under a minute.

Step 1: Build

cd cpp/build
cmake -DBUILD_DISTRIBUTED=ON ..
make eldric-routerd

Step 2: Start the Router

# Connect to an existing controller
./eldric-routerd --port 8881 --controller http://localhost:8880

Step 3: Send a Request

curl -X POST http://localhost:8881/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Port Reference

| Component | Port | Protocol | Description |
| --- | --- | --- | --- |
| Router | 8881 | HTTP/REST | Load balancing and request routing |
| Edge Server | 443 | HTTPS | TLS termination and authentication |
| Controller | 8880 | HTTP/REST | Cluster management and worker registry |
| Worker | 8890 | HTTP/REST | AI inference via backend (Ollama, vLLM, etc.) |
| Cloud Worker | 8889 | HTTP/REST | Multi-backend cloud inference gateway |
| Data Worker | 8892 | HTTP/REST | Database queries proxied via the router |