Multi-backend cloud inference gateway. Route AI requests across xAI, OpenAI, Anthropic, DeepSeek, Groq, and more — from a single unified OpenAI-compatible endpoint.
Port 8889 · eldric-cloudd
The Cloud Worker (eldric-cloudd) is a multi-backend cloud inference gateway running on Port 8889. It manages multiple cloud API backends simultaneously, routing requests to the correct provider based on the requested model name. Unlike the standard inference worker (which proxies a single local backend like Ollama), the Cloud Worker aggregates dozens of cloud APIs behind one unified endpoint.
Connect to xAI, OpenAI, Anthropic, DeepSeek, Groq, Together AI, Fireworks, Mistral, Cohere, and any OpenAI-compatible endpoint — all through a single gateway.
Requests are automatically routed to the correct backend based on the model name. Ask for grok-3 and traffic goes to xAI; ask for claude-3.5-sonnet and it goes to Anthropic.
Exposes a standard /v1/chat/completions endpoint with full streaming SSE support. Drop-in replacement for any OpenAI client library.
Continuous 60-second health checks on all connected backends. Unhealthy backends are automatically excluded from routing until they recover.
Priority-based routing with automatic fallback to a default backend when the requested model is not found on any provider.
Deploy multiple Cloud Worker instances. Each registers independently with the controller using a cld- ID prefix, enabling load distribution across gateways.
The Cloud Worker sits within the Eldric distributed stack alongside regular inference workers. The controller and router treat it as a specialized worker with worker_type: "cloud".
Request flow:
1. Request arrives at /v1/chat/completions with model: "grok-3"
2. Model lookup & backend resolution
3. Forwarded to the matching backend with SSE streaming
4. Response streamed back to the caller
The Cloud Worker supports all major cloud LLM providers and any endpoint following the OpenAI API specification.
| Backend | API Endpoint | Example Models | Streaming | Tool Calling |
|---|---|---|---|---|
| Ollama Cloud | http://remote:11434 | llama3.2, qwen3, mistral, gemma3 | Yes | Yes |
| xAI / Grok | https://api.x.ai/v1 | grok-3, grok-3-mini, grok-2 | Yes | Yes |
| OpenAI | https://api.openai.com/v1 | gpt-4o, gpt-4-turbo, gpt-4, o1, o3 | Yes | Yes |
| Anthropic | https://api.anthropic.com/v1 | claude-sonnet-4-20250514, claude-3.5-sonnet, claude-3-opus | Yes | Yes |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner | Yes | Yes |
| Groq | https://api.groq.com/openai/v1 | llama-3.3-70b, mixtral-8x7b, gemma2-9b | Yes | Yes |
| Together AI | https://api.together.xyz/v1 | meta-llama/Llama-3.3-70B, Qwen/Qwen2.5-72B | Yes | Yes |
| Fireworks AI | https://api.fireworks.ai/inference/v1 | llama-v3p3-70b, mixtral-8x22b | Yes | Yes |
| Mistral AI | https://api.mistral.ai/v1 | mistral-large, mistral-medium, codestral | Yes | Yes |
| Cohere | https://api.cohere.ai/v1 | command-r-plus, command-r, embed-english | Yes | Yes |
| OpenAI-Compatible | Any URL | Any model following OpenAI spec | Yes | Varies |
Provider highlights:
- xAI / Grok: frontier reasoning with Grok-3
- OpenAI: GPT-4o, o1, o3 reasoning
- Anthropic: Claude Sonnet, Opus
- DeepSeek: cost-effective reasoning
- Groq: ultra-fast inference
- Together AI: open model hosting
- Fireworks AI: optimized open models
- Mistral AI: European AI provider
- Cohere: enterprise NLP & RAG
- Ollama Cloud: remote Ollama instances
- OpenAI-Compatible: any OpenAI-compatible API

The Cloud Worker automatically discovers available models from all connected backends at startup and refreshes periodically. When a new backend is added, its models are queried and merged into the unified model catalog.
All models from all backends are aggregated into a single catalog, served via GET /v1/models. Each model entry includes its source backend, so clients can see every available model in one call.
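The aggregation step can be pictured as a simple merge into OpenAI-style model entries. This is an illustrative Python sketch, not the worker's actual code; the `owned_by` field carrying the source backend name is an assumption based on the description above.

```python
def build_catalog(backend_models):
    """Merge per-backend model lists into one unified catalog.

    backend_models: dict mapping backend name -> list of model names.
    The input shape is hypothetical; the real worker builds it from
    auto-discovery against each backend's models endpoint.
    """
    catalog = []
    for backend, models in backend_models.items():
        for model in models:
            catalog.append({
                "id": model,
                "object": "model",
                "owned_by": backend,  # source backend attribution
            })
    return catalog

catalog = build_catalog({
    "xai": ["grok-3", "grok-3-mini"],
    "anthropic": ["claude-3.5-sonnet"],
})
```

A client calling GET /v1/models would then see every model in one response, each tagged with the backend it came from.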
Backends can be configured statically via CLI flags or configuration file, and added dynamically via the API at runtime without restarting the worker.
When a chat completion request arrives, the Cloud Worker resolves the model name to the correct backend. If multiple backends serve the same model, priority determines which backend handles the request. If the model is not found on any backend, the request falls back to the configured default backend.
Each request specifies a model name. The Cloud Worker maintains a model-to-backend mapping built from auto-discovery; the highest-priority backend that owns the model receives the request.
A default backend can be configured for requests where the model is not found. This ensures requests always have somewhere to go, even with unknown model names.
Only healthy backends are considered for routing. If a backend fails health checks, its models are temporarily excluded until the backend recovers.
If model resolution fails (model not found on any backend), the request is forwarded to the default backend. If the resolved backend is unhealthy, the request returns an error with backend status details.
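The resolution order described above (healthy backends only, priority tie-breaking, default fallback) can be sketched as follows. The backend record fields (`name`, `models`, `priority`, `healthy`) are illustrative assumptions, not the worker's real data model.

```python
def resolve_backend(model, backends, default=None):
    """Pick the backend name that should handle a requested model."""
    # Only healthy backends that actually serve the model are candidates.
    candidates = [b for b in backends if b["healthy"] and model in b["models"]]
    if candidates:
        # Lower priority value wins when several backends serve the model.
        return min(candidates, key=lambda b: b["priority"])["name"]
    # Unknown model: fall back to the configured default backend.
    return default

backends = [
    {"name": "xai", "models": {"grok-3"}, "priority": 1, "healthy": True},
    {"name": "groq", "models": {"llama-3.3-70b"}, "priority": 2, "healthy": True},
]
```

For example, `resolve_backend("grok-3", backends, default="groq")` routes to xAI, while an unknown model name falls through to the default.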
The Cloud Worker continuously monitors the health of all connected backends at a 60-second interval. Health checks verify API reachability, authentication validity, and model availability.
Every 60 seconds, each backend receives a lightweight health probe. Response time and status are recorded for routing decisions.
Backends that fail health checks are marked unhealthy and excluded from routing. They are automatically re-included when checks pass again.
Backend health status, latency, and model counts are displayed on the web dashboard at /dashboard for at-a-glance monitoring.
Health checks also verify that API keys are valid and not expired, alerting before requests start failing due to authentication issues.
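A minimal sketch of the bookkeeping this implies, with invented field names; the real worker also records latency and key validity alongside the healthy/unhealthy flag.

```python
class BackendHealth:
    """Track per-backend health from periodic probes (illustrative sketch)."""

    def __init__(self):
        # backend name -> result of its most recent probe
        self.status = {}

    def record_probe(self, name, ok, latency_ms):
        self.status[name] = {"healthy": ok, "latency_ms": latency_ms}

    def routable(self):
        # Only backends whose last probe succeeded are eligible for routing.
        return [name for name, s in self.status.items() if s["healthy"]]

health = BackendHealth()
health.record_probe("openai", True, 120.0)
health.record_probe("groq", False, 0.0)
```

After the failed probe, "groq" drops out of the routable set until a later probe succeeds again.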
Deploy multiple Cloud Worker instances for high availability and load distribution. Each instance registers independently with the controller, and the router distributes requests across all healthy cloud workers using standard load-balancing strategies.
Each Cloud Worker instance registers with the controller using a unique cld- prefixed ID. The controller tracks all instances and their backend configurations.
The router distributes requests across cloud workers using standard strategies: round-robin, least-connections, load-based, or latency-based.
Each instance can connect to different sets of backends. Instance #1 might connect to xAI and Groq, while Instance #2 handles OpenAI and Anthropic.
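Round-robin, the simplest of the listed strategies, can be sketched in a few lines; the worker IDs below are made-up examples of the `cld-` prefixed form.

```python
import itertools

def round_robin(workers):
    """Cycle through healthy cloud workers, one request per turn."""
    return itertools.cycle(workers)

picker = round_robin(["cld-a1b2c3d4", "cld-e5f6a7b8"])
first, second, third = next(picker), next(picker), next(picker)
```

The third request wraps back to the first worker, so load spreads evenly regardless of how many requests arrive.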
The Cloud Worker registers with the controller as a specialized worker type. This integration enables centralized monitoring, routing, and management alongside regular inference workers.
| Property | Value | Description |
|---|---|---|
| worker_type | "cloud" | Distinguishes from regular inference workers ("inference") |
| ID Prefix | cld- | All cloud worker IDs start with cld- (e.g., cld-a1b2c3d4) |
| Default Port | 8889 | Avoids conflicts with regular workers (8890) and inference backends |
| Heartbeat | 30s interval | Reports health, active connections, and backend status to controller |
| Model Reporting | Aggregated list | Reports all models from all connected backends to the controller |
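Putting the table together, a registration/heartbeat payload might look roughly like the sketch below. Every field name here is an illustrative guess; only the `cld-` ID prefix, `worker_type: "cloud"`, port 8889, and the aggregated model list come from the table above.

```python
import uuid

def make_heartbeat(backend_models):
    """Assemble a heartbeat-style payload (field names are guesses)."""
    return {
        "worker_id": "cld-" + uuid.uuid4().hex[:8],  # cld- prefixed ID
        "worker_type": "cloud",
        "port": 8889,
        # Aggregated, de-duplicated model list across all backends.
        "models": sorted({m for ms in backend_models.values() for m in ms}),
    }

hb = make_heartbeat({"xai": ["grok-3"], "deepseek": ["deepseek-chat"]})
```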
The Cloud Worker exposes an OpenAI-compatible REST API with full streaming SSE support. Any client that works with the OpenAI API — including OpenWebUI, LangChain, LlamaIndex, and the Eldric CLI/GUI — can connect directly.
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check with backend status summary |
| /dashboard | GET | Web-based monitoring dashboard |
| /v1/models | GET | List all models from all connected backends |
| /v1/chat/completions | POST | Chat completion (streaming and non-streaming) |
| /api/v1/backends | GET | List connected backends with health status |
| /api/v1/backends | POST | Add a new backend at runtime |
| /api/v1/backends/{id} | DELETE | Remove a backend |
| /api/v1/backends/{id}/test | POST | Test backend connectivity |
Set "stream": true in chat completion requests for real-time token streaming via Server-Sent Events. The Cloud Worker proxies the SSE stream directly from the cloud backend with zero-copy forwarding.
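On the client side, consuming such a stream means reading `data:` lines and extracting the content deltas. A minimal parser sketch, assuming the standard OpenAI streaming chunk shape:

```python
import json

def parse_sse_chunks(raw):
    """Extract content tokens from an OpenAI-style SSE stream (sketch)."""
    tokens = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore comments, event names, and blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            tokens.append(delta["content"])
    return tokens

stream = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: [DONE]\n'
)
tokens = parse_sse_chunks(stream)
```

In practice the official OpenAI client libraries do this parsing for you; the sketch only shows what travels over the wire.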
Access the monitoring dashboard at http://cloud-worker:8889/dashboard for real-time visibility into backend status, model availability, request routing, and performance metrics.
View health status, response latency, and uptime for each connected cloud backend. Color-coded indicators for quick assessment.
Browse all discovered models across all backends, with source attribution and availability status.
Real-time request counts, routing decisions, and error rates per backend. Track which backends handle the most traffic.
View and manage backend configurations, API keys, default backend selection, and health check intervals.
Start the Cloud Worker with the eldric-cloudd binary. Configure backends via command-line flags or a JSON configuration file.
| Option | Default | Description |
|---|---|---|
| --port | 8889 | Listen port for the Cloud Worker API |
| --controller | — | Controller URL for registration and heartbeat |
| --backends | — | Comma-separated backend list: name:url:apikey |
| --default-backend | First backend | Backend name to use when model is not found |
| --config | — | Path to JSON configuration file |
| --health-interval | 60000 | Health check interval in milliseconds |
For complex setups, use a JSON configuration file instead of command-line flags. This allows per-backend priority, custom headers, and other advanced settings.
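A configuration file might look something like the fragment below. The field names and `${VAR}` placeholders are illustrative assumptions, not the worker's documented schema; consult the actual configuration reference before use.

```json
{
  "port": 8889,
  "default_backend": "openai",
  "backends": [
    {"name": "openai", "url": "https://api.openai.com/v1", "api_key": "${OPENAI_API_KEY}", "priority": 1},
    {"name": "groq", "url": "https://api.groq.com/openai/v1", "api_key": "${GROQ_API_KEY}", "priority": 2}
  ]
}
```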
API keys can be provided via the configuration file, command-line flags, or environment variables. For production deployments, environment variables are recommended to avoid storing keys in configuration files.
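Parsing the name:url:apikey triples is slightly subtle because the URL itself contains colons. A sketch of one way to handle it; the `$VAR` environment-variable indirection shown here is a hypothetical convention for keeping keys out of flags, not a documented feature.

```python
import os

def parse_backends(spec):
    """Parse a comma-separated list of name:url:apikey triples."""
    backends = []
    for item in spec.split(","):
        # The URL contains ':' too, so take the name from the left
        # and the key from the right.
        name, rest = item.split(":", 1)
        url, _, key = rest.rpartition(":")
        if key.startswith("$"):  # hypothetical env-var indirection
            key = os.environ.get(key[1:], "")
        backends.append({"name": name, "url": url, "api_key": key})
    return backends

parsed = parse_backends("xai:https://api.x.ai/v1:$XAI_API_KEY")
```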
The Cloud Worker bridges local infrastructure with cloud AI providers. Here are the primary deployment scenarios.
Run most traffic through local Ollama workers for privacy and cost control, while accessing frontier cloud models (GPT-4o, Claude, Grok-3) for complex tasks that require stronger reasoning. The router directs requests to local workers or the cloud worker based on the requested model.
Configure multiple backends for the same model family. If your primary provider (e.g., OpenAI) experiences downtime, requests automatically fail over to a secondary provider (e.g., Together AI running the same open model).
Route simple queries to low-cost providers (DeepSeek, Groq) while reserving expensive frontier models (GPT-4o, Claude Opus) for complex reasoning tasks. Combined with AI-powered routing on the router, this can be fully automated.
Give your entire organization access to the latest frontier models from every major provider through a single, centralized gateway. No need for individual API keys per user or application — the Cloud Worker manages authentication centrally.
Both worker types register with the controller and receive requests via the router, but they fill different roles.
| Aspect | Inference Worker (eldric-workerd) | Cloud Worker (eldric-cloudd) |
|---|---|---|
| Port | 8890 | 8889 |
| Worker Type | inference | cloud |
| ID Prefix | wrk- | cld- |
| Backend Count | Single backend (Ollama, vLLM, etc.) | Multiple backends simultaneously |
| Backend Location | Typically local / same network | Cloud APIs (remote endpoints) |
| Model Discovery | From single backend | Aggregated from all backends |
| Routing | Direct proxy to backend | Model-based routing to correct backend |
| Use Case | Local GPU inference | Cloud API aggregation & routing |
Get up and running with the Cloud Worker in minutes.
| Feature | Free | Standard | Professional | Enterprise |
|---|---|---|---|---|
| Cloud Workers | 1 | 2 | 5 | Unlimited |
| Backends per Worker | 2 | 5 | 15 | Unlimited |
| Auto-Discovery | Yes | Yes | Yes | Yes |
| Priority Routing | — | Yes | Yes | Yes |
| Health Monitoring | Yes | Yes | Yes | Yes |
| Dashboard | — | — | Yes | Yes |
| Runtime Backend Add/Remove | — | Yes | Yes | Yes |
Contact license@core.at for enterprise licensing. License files are managed via the license server.