Eldric Cloud Worker

Multi-backend cloud inference gateway. Route AI requests across xAI, OpenAI, Anthropic, DeepSeek, Groq, and more — from a single unified OpenAI-compatible endpoint.

Port 8889 · eldric-cloudd

Overview

The Cloud Worker (eldric-cloudd) is a multi-backend cloud inference gateway running on Port 8889. It manages multiple cloud API backends simultaneously, routing requests to the correct provider based on the requested model name. Unlike the standard inference worker (which proxies a single local backend like Ollama), the Cloud Worker aggregates dozens of cloud APIs behind one unified endpoint.

Multi-Backend Aggregation

Connect to xAI, OpenAI, Anthropic, DeepSeek, Groq, Together AI, Fireworks, Mistral, Cohere, and any OpenAI-compatible endpoint — all through a single gateway.

Model-Based Routing

Requests are automatically routed to the correct backend based on the model name. Ask for grok-3 and traffic goes to xAI; ask for claude-3.5-sonnet and it goes to Anthropic.

OpenAI-Compatible API

Exposes a standard /v1/chat/completions endpoint with full streaming SSE support. Drop-in replacement for any OpenAI client library.

Health Monitoring

Continuous 60-second health checks on all connected backends. Unhealthy backends are automatically excluded from routing until they recover.

Priority & Fallback

Priority-based routing with automatic fallback to a default backend when the requested model is not found on any provider.

Horizontal Scaling

Deploy multiple Cloud Worker instances. Each registers independently with the controller using a cld- ID prefix, enabling load distribution across gateways.

Architecture

The Cloud Worker sits within the Eldric distributed stack alongside regular inference workers. The controller and router treat it as a specialized worker with worker_type: "cloud".

Cloud Worker in the Eldric Stack
[Diagram: Controller :8880 and Router :8881 front a local Worker :8890 (Ollama / vLLM) and the Cloud Worker :8889, which connects out to xAI / Grok, OpenAI, Anthropic, DeepSeek, Groq, Together AI, and Fireworks.]

Example request flow:

  • Request: /v1/chat/completions with model: "grok-3"
  • Cloud Worker: model lookup & backend resolution
  • xAI API: forwarded with SSE streaming
  • Response: streamed back to the caller

Supported Cloud Backends

The Cloud Worker supports all major cloud LLM providers and any endpoint following the OpenAI API specification.

Backend | API Endpoint | Example Models | Streaming | Tool Calling
Ollama Cloud | http://remote:11434 | llama3.2, qwen3, mistral, gemma3 | Yes | Yes
xAI / Grok | https://api.x.ai/v1 | grok-3, grok-3-mini, grok-2 | Yes | Yes
OpenAI | https://api.openai.com/v1 | gpt-4o, gpt-4-turbo, gpt-4, o1, o3 | Yes | Yes
Anthropic | https://api.anthropic.com/v1 | claude-sonnet-4-20250514, claude-3.5-sonnet, claude-3-opus | Yes | Yes
DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner | Yes | Yes
Groq | https://api.groq.com/openai/v1 | llama-3.3-70b, mixtral-8x7b, gemma2-9b | Yes | Yes
Together AI | https://api.together.xyz/v1 | meta-llama/Llama-3.3-70B, Qwen/Qwen2.5-72B | Yes | Yes
Fireworks AI | https://api.fireworks.ai/inference/v1 | llama-v3p3-70b, mixtral-8x22b | Yes | Yes
Mistral AI | https://api.mistral.ai/v1 | mistral-large, mistral-medium, codestral | Yes | Yes
Cohere | https://api.cohere.ai/v1 | command-r-plus, command-r, embed-english | Yes | Yes
OpenAI-Compatible | Any URL | Any model following OpenAI spec | Yes | Varies
  • xAI / Grok (Cloud): Frontier reasoning with Grok-3
  • OpenAI (Cloud): GPT-4o, o1, o3 reasoning
  • Anthropic (Cloud): Claude Sonnet, Opus
  • DeepSeek (Cloud): Cost-effective reasoning
  • Groq (Cloud): Ultra-fast inference
  • Together AI (Cloud): Open model hosting
  • Fireworks AI (Cloud): Optimized open models
  • Mistral AI (Cloud): European AI provider
  • Cohere (Cloud): Enterprise NLP & RAG
  • Ollama Cloud (Remote Local): Remote Ollama instances
  • Custom Endpoint (Compatible): Any OpenAI-compatible API

Auto-Discovery

The Cloud Worker automatically discovers available models from all connected backends at startup and refreshes periodically. When a new backend is added, its models are queried and merged into the unified model catalog.

Model Catalog

All models from all backends are aggregated into a single catalog, served via GET /v1/models. Each model entry includes its source backend, so clients can see every available model in one call.

  • Queries each backend's model list endpoint
  • Merges into unified model namespace
  • Tags models with backend source
  • Refreshes on configurable interval
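The merge step above can be sketched in a few lines. This is an illustrative model only, not the eldric-cloudd implementation (which is a C++ binary); the function name and data shapes are assumptions, with model IDs taken from the backend table above.

```python
def build_catalog(backend_models):
    """Merge {backend_name: [model_id, ...]} into one /v1/models-style list.

    Each entry is tagged with its source backend via "owned_by"; on a
    duplicate model ID, the first backend to report it wins.
    """
    catalog = {}
    for backend, models in backend_models.items():
        for model_id in models:
            catalog.setdefault(model_id, {"id": model_id, "owned_by": backend})
    return list(catalog.values())


catalog = build_catalog({
    "xai": ["grok-3", "grok-3-mini"],
    "openai": ["gpt-4o"],
    "groq": ["llama-3.3-70b-versatile"],
})
```

A periodic refresh would simply rebuild this mapping on the configured interval.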

Backend Registration

Backends can be configured statically via CLI flags or configuration file, and added dynamically via the API at runtime without restarting the worker.

  • Static configuration via --backends flag
  • Dynamic registration via REST API
  • Hot-reload without worker restart
  • Per-backend API key management
# List all models across all backends
curl http://localhost:8889/v1/models

# Response includes models from every connected backend
{
  "data": [
    {"id": "grok-3", "owned_by": "xai"},
    {"id": "gpt-4o", "owned_by": "openai"},
    {"id": "claude-3.5-sonnet", "owned_by": "anthropic"},
    {"id": "deepseek-chat", "owned_by": "deepseek"},
    {"id": "llama-3.3-70b-versatile", "owned_by": "groq"},
    ...
  ]
}

Priority-Based Routing

When a chat completion request arrives, the Cloud Worker resolves the model name to the correct backend. If multiple backends serve the same model, priority determines which backend handles the request. If the model is not found on any backend, the request falls back to the configured default backend.

Model Resolution

Each request specifies a model name. The Cloud Worker maintains a model-to-backend mapping built from auto-discovery. The first backend that owns the model receives the request.

Default Backend Fallback

A default backend can be configured for requests where the model is not found. This ensures requests always have somewhere to go, even with unknown model names.

Health-Aware Selection

Only healthy backends are considered for routing. If a backend fails health checks, its models are temporarily excluded until the backend recovers.

Routing Flow

1. Request arrives
2. Extract model name
3. Look up backend
4. Check health
5. Forward request

If step 3 fails (model not found), the request is forwarded to the default backend. If step 4 fails (backend unhealthy), the request returns an error with backend status details.
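The five-step flow above can be modeled as follows. This is a hypothetical Python sketch; the backend structure, priority convention (lower number wins), and error behavior are assumptions drawn from this page, not the shipped implementation.

```python
def route(model, backends, default_backend):
    """Resolve `model` to a backend name per the routing flow above.

    `backends` maps name -> {"models": set, "priority": int, "healthy": bool}.
    Unknown models fall back to `default_backend`; if the model is known but
    no serving backend is healthy, the request fails with an error.
    """
    serving = [name for name, cfg in backends.items() if model in cfg["models"]]
    if not serving:                                    # step 3 failed: model not found
        return default_backend
    healthy = [(backends[n]["priority"], n) for n in serving
               if backends[n]["healthy"]]
    if not healthy:                                    # step 4 failed: backend unhealthy
        raise RuntimeError(f"no healthy backend for {model}")
    return min(healthy)[1]                             # lowest priority number wins


backends = {
    "xai": {"models": {"grok-3"}, "priority": 1, "healthy": True},
    "openai": {"models": {"gpt-4o"}, "priority": 2, "healthy": True},
}
```

With this table, a request for grok-3 resolves to xai, while an unknown model name resolves to whichever default backend was configured.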

Health Monitoring

The Cloud Worker continuously monitors the health of all connected backends at a 60-second interval. Health checks verify API reachability, authentication validity, and model availability.

Periodic Checks

Every 60 seconds, each backend receives a lightweight health probe. Response time and status are recorded for routing decisions.

Automatic Recovery

Backends that fail health checks are marked unhealthy and excluded from routing. They are automatically re-included when checks pass again.

Dashboard Visibility

Backend health status, latency, and model counts are displayed on the web dashboard at /dashboard for at-a-glance monitoring.

API Key Validation

Health checks also verify that API keys are valid and not expired, alerting before requests start failing due to authentication issues.
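The probe-and-recover cycle described above could be modeled like this. It is an illustrative sketch only, not the shipped C++ logic; the probe interface and status shape are assumptions.

```python
def health_cycle(backend_names, probe):
    """Run one health-check pass over all backends.

    `probe(name)` returns (ok, latency_ms). Backends whose probe fails are
    marked unhealthy and skipped by routing; they are re-included as soon as
    a later cycle's probe succeeds, giving automatic recovery.
    """
    status = {}
    for name in backend_names:
        ok, latency_ms = probe(name)  # lightweight reachability/auth probe
        status[name] = {"healthy": ok, "latency_ms": latency_ms}
    return status


# Example: a fake probe where only "openai" is currently failing.
status = health_cycle(["xai", "openai"], lambda name: (name != "openai", 42))
```

Running this every 60 seconds (the documented interval) and feeding the result into routing reproduces the exclude-then-recover behavior.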

# Check cloud worker health
curl http://localhost:8889/health

# Response
{
  "status": "healthy",
  "worker_type": "cloud",
  "worker_id": "cld-a1b2c3d4",
  "backends": {
    "total": 5,
    "healthy": 5,
    "unhealthy": 0
  },
  "models_available": 42,
  "uptime_seconds": 86400
}

Horizontal Scaling

Deploy multiple Cloud Worker instances for high availability and load distribution. Each instance registers independently with the controller, and the router distributes requests across all healthy cloud workers using standard load-balancing strategies.

Multiple Cloud Worker Instances

[Diagram: Router :8881 distributes requests across Cloud Worker #1 (cld-aaa11111), Cloud Worker #2 (cld-bbb22222), and Cloud Worker #3 (cld-ccc33333), which together connect to xAI, Groq, OpenAI, DeepSeek, Anthropic, and Mistral. Each instance can connect to different or overlapping sets of backends.]

Independent Registration

Each Cloud Worker instance registers with the controller using a unique cld- prefixed ID. The controller tracks all instances and their backend configurations.

Load Balancing

The router distributes requests across cloud workers using standard strategies: round-robin, least-connections, load-based, or latency-based.
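Two of those strategies are simple enough to sketch directly. The worker IDs below are the examples from the diagram; the router's actual implementation is not shown on this page, so this is a model of the behavior, not the real code.

```python
import itertools

# Healthy cloud worker instances as the router might see them.
workers = ["cld-aaa11111", "cld-bbb22222", "cld-ccc33333"]

# Round-robin: each request takes the next worker in the cycle.
rr = itertools.cycle(workers)
picks = [next(rr) for _ in range(4)]

# Least-connections: pick the worker with the fewest active requests.
active = {"cld-aaa11111": 3, "cld-bbb22222": 1, "cld-ccc33333": 2}
least_loaded = min(active, key=active.get)
```

Load-based and latency-based selection follow the same pattern, substituting reported load or measured latency for the connection count.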

Independent Backends

Each instance can connect to different sets of backends. Instance #1 might connect to xAI and Groq, while Instance #2 handles OpenAI and Anthropic.

Controller Registration

The Cloud Worker registers with the controller as a specialized worker type. This integration enables centralized monitoring, routing, and management alongside regular inference workers.

Property | Value | Description
worker_type | "cloud" | Distinguishes from regular inference workers ("inference")
ID Prefix | cld- | All cloud worker IDs start with cld- (e.g., cld-a1b2c3d4)
Default Port | 8889 | Avoids conflicts with regular workers (8890) and inference backends
Heartbeat | 30s interval | Reports health, active connections, and backend status to controller
Model Reporting | Aggregated list | Reports all models from all connected backends to the controller

OpenAI-Compatible API

The Cloud Worker exposes an OpenAI-compatible REST API with full streaming SSE support. Any client that works with the OpenAI API — including OpenWebUI, LangChain, LlamaIndex, and the Eldric CLI/GUI — can connect directly.

Endpoint | Method | Description
/health | GET | Health check with backend status summary
/dashboard | GET | Web-based monitoring dashboard
/v1/models | GET | List all models from all connected backends
/v1/chat/completions | POST | Chat completion (streaming and non-streaming)
/api/v1/backends | GET | List connected backends with health status
/api/v1/backends | POST | Add a new backend at runtime
/api/v1/backends/{id} | DELETE | Remove a backend
/api/v1/backends/{id}/test | POST | Test backend connectivity

Streaming SSE

Set "stream": true in chat completion requests for real-time token streaming via Server-Sent Events. The Cloud Worker proxies the SSE stream directly from the cloud backend with zero-copy forwarding.

# Streaming chat completion
curl -N http://localhost:8889/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-3",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ]
  }'

# SSE response stream
data: {"choices":[{"delta":{"content":"Quantum"}}]}
data: {"choices":[{"delta":{"content":" computing"}}]}
data: {"choices":[{"delta":{"content":" uses"}}]}
...
data: [DONE]

Web Dashboard

Access the monitoring dashboard at http://cloud-worker:8889/dashboard for real-time visibility into backend status, model availability, request routing, and performance metrics.

Backend Status

View health status, response latency, and uptime for each connected cloud backend. Color-coded indicators for quick assessment.

Model Catalog

Browse all discovered models across all backends, with source attribution and availability status.

Request Metrics

Real-time request counts, routing decisions, and error rates per backend. Track which backends handle the most traffic.

Configuration

View and manage backend configurations, API keys, default backend selection, and health check intervals.

CLI Usage

Start the Cloud Worker with the eldric-cloudd binary. Configure backends via command-line flags or a JSON configuration file.

Basic Usage

# Start cloud worker with multiple backends
./eldric-cloudd --port 8889 \
  --controller http://controller:8880 \
  --backends "xai:https://api.x.ai/v1:YOUR_XAI_KEY,openai:https://api.openai.com/v1:YOUR_OPENAI_KEY"

# With additional backends
./eldric-cloudd --port 8889 \
  --controller http://controller:8880 \
  --backends "xai:https://api.x.ai/v1:XAI_KEY,\
openai:https://api.openai.com/v1:OPENAI_KEY,\
anthropic:https://api.anthropic.com/v1:ANTHROPIC_KEY,\
deepseek:https://api.deepseek.com/v1:DEEPSEEK_KEY,\
groq:https://api.groq.com/openai/v1:GROQ_KEY"

# With a remote Ollama instance as default backend
./eldric-cloudd --port 8889 \
  --controller http://controller:8880 \
  --default-backend ollama-remote \
  --backends "ollama-remote:http://10.0.0.5:11434,xai:https://api.x.ai/v1:XAI_KEY"

# From config file
./eldric-cloudd --config /etc/eldric/cloudd.json

Command-Line Options

Option | Default | Description
--port | 8889 | Listen port for the Cloud Worker API
--controller | (none) | Controller URL for registration and heartbeat
--backends | (none) | Comma-separated backend list: name:url:apikey
--default-backend | First backend | Backend name to use when the model is not found
--config | (none) | Path to JSON configuration file
--health-interval | 60000 | Health check interval in milliseconds
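The name:url:apikey grammar is slightly tricky to split because URLs themselves contain colons. Here is one way it could be parsed; this is a sketch only, and the trailing-digits-means-port heuristic is an assumption since the exact rules used by eldric-cloudd are not documented on this page.

```python
def parse_backend_item(item):
    """Parse one "name:url[:apikey]" entry from the --backends flag.

    The name is everything before the first colon. For the optional API key,
    split on the last colon, but treat the tail as part of the URL when it
    looks like a port (all digits) or a scheme remainder (contains "//").
    """
    name, _, rest = item.partition(":")
    url, sep, tail = rest.rpartition(":")
    if sep and "//" not in tail and not tail.isdigit():
        return {"name": name, "url": url, "api_key": tail}
    return {"name": name, "url": rest, "api_key": None}


def parse_backends(spec):
    """Split the comma-separated --backends value into backend dicts."""
    return [parse_backend_item(item) for item in spec.split(",")]
```

This handles all three documented shapes: keyed cloud endpoints, a bare URL with no key, and an Ollama URL whose port would otherwise be mistaken for a key.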

Configuration

For complex setups, use a JSON configuration file instead of command-line flags. This allows per-backend priority, custom headers, and other advanced settings.

# /etc/eldric/cloudd.json
{
  "port": 8889,
  "controller_url": "http://controller:8880",
  "default_backend": "xai",
  "health_check_interval_ms": 60000,
  "backends": [
    {
      "name": "xai",
      "type": "openai_compatible",
      "url": "https://api.x.ai/v1",
      "api_key": "xai-...",
      "priority": 1,
      "enabled": true
    },
    {
      "name": "openai",
      "type": "openai_compatible",
      "url": "https://api.openai.com/v1",
      "api_key": "sk-...",
      "priority": 2,
      "enabled": true
    },
    {
      "name": "anthropic",
      "type": "anthropic",
      "url": "https://api.anthropic.com/v1",
      "api_key": "sk-ant-...",
      "priority": 3,
      "enabled": true
    },
    {
      "name": "deepseek",
      "type": "openai_compatible",
      "url": "https://api.deepseek.com/v1",
      "api_key": "sk-...",
      "priority": 4,
      "enabled": true
    },
    {
      "name": "groq",
      "type": "openai_compatible",
      "url": "https://api.groq.com/openai/v1",
      "api_key": "gsk_...",
      "priority": 5,
      "enabled": true
    },
    {
      "name": "ollama-remote",
      "type": "ollama",
      "url": "http://10.0.0.5:11434",
      "priority": 10,
      "enabled": true
    }
  ]
}

API Key Management

API keys can be provided via the configuration file, command-line flags, or environment variables. For production deployments, environment variables are recommended to avoid storing keys in configuration files.

# Using environment variables
export XAI_API_KEY="xai-..."
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export DEEPSEEK_API_KEY="sk-..."
export GROQ_API_KEY="gsk_..."

./eldric-cloudd --port 8889 \
  --controller http://controller:8880 \
  --backends "xai:https://api.x.ai/v1:$XAI_API_KEY,openai:https://api.openai.com/v1:$OPENAI_API_KEY"

Use Cases

The Cloud Worker bridges local infrastructure with cloud AI providers. Here are the primary deployment scenarios.

Hybrid Local + Cloud Inference

Run most traffic through local Ollama workers for privacy and cost control, while accessing frontier cloud models (GPT-4o, Claude, Grok-3) for complex tasks that require stronger reasoning. The router directs requests to local workers or the cloud worker based on the requested model.

1. Request for llama3.2 → Router → Local Worker
2. Request for gpt-4o → Router → Cloud Worker → OpenAI

Cloud-to-Cloud Failover

Configure multiple backends for the same model family. If your primary provider (e.g., OpenAI) experiences downtime, requests automatically fail over to a secondary provider (e.g., Together AI running the same open model).

1. Primary: OpenAI (unhealthy)
2. Fallback: Together AI

Cost Optimization

Route simple queries to low-cost providers (DeepSeek, Groq) while reserving expensive frontier models (GPT-4o, Claude Opus) for complex reasoning tasks. Combined with AI-powered routing on the router, this can be fully automated.

  • Simple queries: Groq (fast, low cost) or DeepSeek (cost-effective reasoning)
  • Complex analysis: OpenAI GPT-4o or Anthropic Claude
  • Code generation: xAI Grok-3 or DeepSeek Coder
  • Embeddings: Cohere or OpenAI text-embedding-3

Frontier Model Access

Give your entire organization access to the latest frontier models from every major provider through a single, centralized gateway. No need for individual API keys per user or application — the Cloud Worker manages authentication centrally.

  • Single endpoint for all cloud models
  • Centralized API key management
  • Usage tracking per model and backend
  • Rate limiting and cost controls

Cloud Worker vs. Inference Worker

Both worker types register with the controller and handle requests via the router, but they fill different roles.

Aspect | Inference Worker (eldric-workerd) | Cloud Worker (eldric-cloudd)
Port | 8890 | 8889
Worker Type | inference | cloud
ID Prefix | wrk- | cld-
Backend Count | Single backend (Ollama, vLLM, etc.) | Multiple backends simultaneously
Backend Location | Typically local / same network | Cloud APIs (remote endpoints)
Model Discovery | From single backend | Aggregated from all backends
Routing | Direct proxy to backend | Model-based routing to correct backend
Use Case | Local GPU inference | Cloud API aggregation & routing

Quick Start

Get up and running with the Cloud Worker in minutes.

1. Build the Cloud Worker

cd cpp && mkdir -p build && cd build
cmake -DBUILD_DISTRIBUTED=ON ..
make eldric-cloudd

2. Start with xAI and OpenAI

./eldric-cloudd --port 8889 \
  --controller http://localhost:8880 \
  --backends "xai:https://api.x.ai/v1:$XAI_API_KEY,openai:https://api.openai.com/v1:$OPENAI_API_KEY"

3. List Available Models

curl http://localhost:8889/v1/models | jq '.data[].id'

4. Chat with a Cloud Model

curl http://localhost:8889/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-3",
    "messages": [{"role": "user", "content": "Hello from Eldric Cloud Worker!"}]
  }'

5. Add a Backend at Runtime

curl -X POST http://localhost:8889/api/v1/backends \
  -H "Content-Type: application/json" \
  -d '{
    "name": "anthropic",
    "type": "anthropic",
    "url": "https://api.anthropic.com/v1",
    "api_key": "sk-ant-..."
  }'

6. Open the Dashboard

# Open in browser
open http://localhost:8889/dashboard

License Limits

Feature | Free | Standard | Professional | Enterprise
Cloud Workers | 1 | 2 | 5 | Unlimited
Backends per Worker | 2 | 5 | 15 | Unlimited
Auto-Discovery | Yes | Yes | Yes | Yes
Priority Routing | | Yes | Yes | Yes
Health Monitoring | Yes | Yes | Yes | Yes
Dashboard | | | Yes | Yes
Runtime Backend Add/Remove | | Yes | Yes | Yes

Contact license@core.at for enterprise licensing. License files are managed via the license server.