Intelligent load balancing with AI-powered routing decisions
v4.1.0. The Router sits between the Edge Server/Controller and Workers, distributing inference requests across the cluster using configurable strategies and optional AI-powered decision making.
The Eldric Router operates on port 8881 and serves as the intelligent traffic distribution layer between the Edge Server or Controller and backend Workers. It supports five built-in load balancing strategies and optional AI-powered routing for context-aware worker selection.
Five load balancing strategies from simple round-robin to AI-powered autonomous routing with LLM-based decision making.
Continuous health checks with automatic failover. Unhealthy workers are removed from rotation and re-added when recovered.
Server-Sent Events (SSE) proxy with zero-copy forwarding. OpenAI-compatible streaming from any backend through the router.
Syncs worker registry from the Controller at configurable intervals. Can also operate standalone with manually configured workers.
The router provides five built-in strategies for distributing requests across workers. The default strategy is load_based.
- **Round Robin** (default fallback): Simple rotation through available workers. Each worker receives requests in sequence. Best for homogeneous worker pools with similar capabilities.
- **Least Connections**: Routes to the worker with the fewest active requests. Naturally adapts to workers with different processing speeds. Good for mixed hardware environments.
- **Load Based** (default): Routes based on reported worker load metrics (CPU, memory, GPU utilization). Workers report their load during health checks.
- **Latency Based**: Tracks response times per worker and routes to the fastest. Adapts in real time as latency changes. Ideal for geographically distributed clusters.
- **Random**: Random worker selection from the healthy pool. Provides natural distribution without tracking state. Useful for testing and simple deployments.
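As an illustration only (not the router's actual implementation), two of these strategies can be sketched in a few lines of Python; the `Worker` class and its `active_requests` counter are assumptions for the example:

```python
import itertools

class Worker:
    """Minimal stand-in for a registered worker."""
    def __init__(self, name, active_requests=0):
        self.name = name
        self.active_requests = active_requests

class RoundRobin:
    """Rotate through workers in sequence."""
    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)

    def select(self):
        return next(self._cycle)

class LeastConnections:
    """Pick the worker with the fewest in-flight requests."""
    def __init__(self, workers):
        self.workers = workers

    def select(self):
        return min(self.workers, key=lambda w: w.active_requests)

workers = [Worker("w1", 3), Worker("w2", 1), Worker("w3", 2)]
rr = RoundRobin(workers)
assert [rr.select().name for _ in range(4)] == ["w1", "w2", "w3", "w1"]
assert LeastConnections(workers).select().name == "w2"
```

The load-based and latency-based strategies follow the same shape, with the selection key driven by reported metrics or measured response times instead of connection counts.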
Fan-out inference across multiple LLM workers with intelligent response synthesis. The router automatically selects the optimal strategy based on query content.
- **Multi-Round**: Models argue different positions across multiple rounds. A judge model evaluates the arguments and renders the final verdict. Best for decisions and architecture questions.
- **Iterative**: Model A generates a response, Model B critiques it, and Model A revises. Iterates for a configurable number of rounds. Best for writing, planning, and content refinement.
- **Parallel**: Fan-out to N models in parallel; a judge picks the single best answer. Best for code generation, where merging multiple outputs produces inconsistencies.
- **Consensus**: All models answer independently; consensus analysis identifies agreement and disagreement with confidence indicators. Best for factual and classification questions.
- **Synthesis** (default): Merge insights from all model responses into one comprehensive answer. A synthesis model combines the best elements from each response.
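The fan-out-plus-judge pattern behind the parallel strategy can be sketched as follows. This is a hedged illustration, not the router's code: `call_model` and `judge` are hypothetical stand-ins for real LLM backend calls (e.g. Ollama HTTP requests).

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(prompt, models, call_model):
    """Query every model in parallel; return {model: answer}.
    call_model(model, prompt) is a hypothetical stand-in for a
    real backend inference call."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(call_model, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}

def best_of_n(prompt, models, call_model, judge):
    """Parallel strategy: a judge picks the single best answer."""
    answers = fan_out(prompt, models, call_model)
    return judge(prompt, answers)

# Stub backends so the sketch runs without a cluster:
stub = {"m1": "short", "m2": "a longer answer"}
pick_longest = lambda p, answers: max(answers.values(), key=len)
assert best_of_n("q", ["m1", "m2"], lambda m, p: stub[m], pick_longest) == "a longer answer"
```

The consensus and synthesis strategies differ only in the final step: instead of picking one answer, they compare or merge all of them.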
Optional xLSTM (Sepp Hochreiter's extended LSTM) integration for workload forecasting, anomaly detection, and fast sequence classification. Enables proactive scaling decisions before load spikes hit.
Eldric supports intelligent AI-controlled routing where an LLM makes real-time worker selection decisions based on request context, worker capabilities, and cluster state.
- **Disabled** (default): AI routing off; uses algorithmic load balancing only. Lowest latency overhead.
- **Evaluation**: The AI suggests a worker, but the system only logs the suggestion. Useful for testing AI routing before enabling it.
- **Autonomous** (enterprise): The AI makes the routing decision. The LLM evaluates worker load, latency, and capabilities to select the optimal target.

AI routing uses a dedicated Ollama model for decision making. Eldric includes a custom-trained routing model optimized for fast, accurate worker selection. Any Ollama-compatible model can be used, but smaller models (1B-3B parameters) are recommended to minimize routing latency.
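The three modes boil down to a simple dispatch. The sketch below is illustrative only; the mode names and the `ai_suggest` hook are assumptions standing in for the router's actual configuration values and LLM call:

```python
import logging

def route(request, workers, ai_suggest, fallback_select, mode="disabled"):
    """Dispatch on AI-routing mode (mode names are illustrative):
    - "disabled":   algorithmic balancing only
    - "evaluation": AI suggestion is logged, algorithmic choice is used
    - "autonomous": the AI's choice is applied
    """
    if mode == "disabled":
        return fallback_select(workers)
    suggestion = ai_suggest(request, workers)  # hypothetical LLM call
    if mode == "evaluation":
        logging.info("AI suggested %s (suggestion logged, not applied)", suggestion)
        return fallback_select(workers)
    return suggestion  # autonomous: apply the AI's decision

select_first = lambda workers: workers[0]
assert route("req", ["w1", "w2"], lambda r, w: "w2", select_first, mode="autonomous") == "w2"
```

In evaluation mode, comparing the logged suggestions against the algorithmic choices gives a low-risk way to judge the AI router before trusting it with live traffic.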
When AI routing is active, responses include routing metadata explaining the decision.
The router supports content-aware knowledge routing with 150+ predefined themes. Incoming requests are analyzed and routed to workers specialized in the relevant domain, from scientific computing to creative writing to code generation.
The router classifies incoming prompts by topic and matches them to workers configured with domain expertise. A request about molecular biology routes to a worker running a science-tuned model, while a coding question routes to a worker with a code-optimized model.
For full details, see the Knowledge Routing documentation.
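To make the classify-then-match flow concrete, here is a deliberately naive sketch. The real router's 150+ theme taxonomy and classifier are not shown in this document, so the keyword matching and the `themes` field on workers are assumptions for illustration:

```python
def classify_theme(prompt, theme_keywords):
    """Naive keyword classifier: score each theme by keyword hits.
    A stand-in for the router's real prompt classifier."""
    text = prompt.lower()
    scores = {theme: sum(kw in text for kw in kws)
              for theme, kws in theme_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def match_worker(theme, workers):
    """Prefer a worker whose declared expertise covers the theme;
    otherwise fall back to the first available worker."""
    for w in workers:
        if theme in w.get("themes", []):
            return w
    return workers[0]

themes = {"science": ["molecular", "biology"], "code": ["function", "compile"]}
workers = [{"name": "sci", "themes": ["science"]}, {"name": "gen", "themes": []}]
assert classify_theme("explain molecular biology", themes) == "science"
assert match_worker("science", workers)["name"] == "sci"
```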
The router continuously monitors worker health and automatically manages the active worker pool.
The router polls each worker's /health endpoint at a configurable interval (default: 30 seconds). Workers report their status, load metrics, available models, and GPU utilization.
When a worker fails health checks, it is removed from the active rotation. Requests are automatically redistributed to remaining healthy workers with no client impact.
Unhealthy workers continue to be checked. When they recover, they are automatically re-added to the active pool. No manual intervention required.
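The remove-on-failure, re-add-on-recovery cycle described above can be sketched as a small pool tracker. This is a simplified model, assuming a `check_health` probe that stands in for the real HTTP call to each worker's /health endpoint:

```python
class WorkerPool:
    """Track worker health: unhealthy workers leave the rotation
    and rejoin automatically once their checks pass again."""
    def __init__(self, workers, check_health, interval=30.0):
        self.status = {w: True for w in workers}
        self.check_health = check_health  # hypothetical probe of /health
        self.interval = interval          # default poll interval: 30s

    def run_checks(self):
        """One polling pass over every worker, healthy or not."""
        for worker in self.status:
            self.status[worker] = self.check_health(worker)

    def healthy(self):
        """Workers currently eligible to receive requests."""
        return [w for w, ok in self.status.items() if ok]

pool = WorkerPool(["w1", "w2"], check_health=lambda w: w != "w2")
pool.run_checks()
assert pool.healthy() == ["w1"]        # w2 removed from rotation
pool.check_health = lambda w: True     # w2 recovers
pool.run_checks()
assert pool.healthy() == ["w1", "w2"]  # re-added, no manual intervention
```

Note that `run_checks` keeps polling unhealthy workers too, which is what makes the automatic re-add possible.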
The router can run as a standalone daemon (eldric-routerd) that syncs workers from the Controller and operates independently.
The router provides zero-copy SSE (Server-Sent Events) proxy, forwarding streaming responses from workers to clients in real-time with OpenAI-compatible format.
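The essence of the SSE proxy is that event lines pass through unmodified. A minimal sketch, assuming an iterable of upstream lines and the OpenAI-style `data: [DONE]` terminator:

```python
def proxy_sse(upstream_lines):
    """Forward SSE lines from a worker to the client unchanged,
    stopping after the OpenAI-style [DONE] sentinel."""
    for line in upstream_lines:
        yield line  # pass through verbatim: no parsing, no rewriting
        if line.strip() == "data: [DONE]":
            break

upstream = ['data: {"choices": []}', "", "data: [DONE]"]
assert list(proxy_sse(upstream)) == upstream
```

Because the payload is never parsed or re-serialized, the router adds essentially no per-chunk overhead to a streaming response.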
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Basic health check |
| `/api/v1/health` | GET | Detailed health status with worker counts and uptime |
| `/v1/chat/completions` | POST | OpenAI-compatible chat completions (proxied to workers); supports streaming via SSE |
| `/v1/models` | GET | List available models across all workers (aggregated, deduplicated) |
| `/api/v1/ai/configure` | POST | Configure AI routing mode, LLM model, and Ollama URL |
| `/api/v1/ai/status` | GET | Get current AI routing configuration and statistics |
| `/api/v1/workers` | GET | List workers known to this router (synced from Controller) |
| `/api/v1/data/query` | POST | Proxied to the Data Worker for database queries |
The router can be configured via a JSON configuration file or command-line arguments.
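The configuration schema is not documented in this section, so the fragment below is purely illustrative: every key shown is a hypothetical name, though the port numbers and the `load_based` default come from this document.

```json
{
  "port": 8881,
  "strategy": "load_based",
  "health_check_interval_seconds": 30,
  "controller_url": "http://localhost:8880",
  "workers": [
    { "url": "http://worker-1:8890" },
    { "url": "http://worker-2:8890" }
  ]
}
```

Consult the shipped reference configuration for the actual key names and defaults.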
Get a router running in under a minute.
| Component | Port | Protocol | Description |
|---|---|---|---|
| Router | 8881 | HTTP/REST | Load balancing and request routing |
| Edge Server | 443 | HTTPS | TLS termination and authentication |
| Controller | 8880 | HTTP/REST | Cluster management and worker registry |
| Worker | 8890 | HTTP/REST | AI inference via backend (Ollama, vLLM, etc.) |
| Cloud Worker | 8889 | HTTP/REST | Multi-backend cloud inference gateway |
| Data Worker | 8892 | HTTP/REST | Database queries proxied via router |