Eldric Edge Server

The secure external gateway for your Eldric cluster. TLS termination, API key authentication, rate limiting, plugin extensibility, and an embedded web chat client -- all in a single binary.

Port 443/80 · eldric-edge

Overview

The Edge Server is the external-facing gateway for the entire Eldric distributed cluster. It sits at the perimeter, accepting HTTPS connections from external clients (OpenWebUI, custom apps, mobile clients), performing TLS termination, authenticating requests via API keys, enforcing rate limits, and proxying traffic to downstream routers and workers. It also hosts an embedded web chat UI and a plugin system for server-side and client-side extensibility.

TLS Termination

HTTPS on port 443 with manual certificate configuration or automatic Let's Encrypt support. All downstream traffic stays on the internal network.

API Key Authentication

Per-key authentication with named clients, individual rate limits, and client registration endpoints for self-service key provisioning.

Rate Limiting

Configurable rate limits at three levels: global cluster RPM, per-IP RPM, and per-API-key RPM with sliding time windows.

Plugin System

Extend the Edge with server-side Python plugins (Tools, Filters, Pipes) and client-side JavaScript plugins (Actions, Widgets).

Farm Mode

Horizontal scaling with multiple Edge instances. Peer synchronization keeps API keys and rate limit counters consistent across the farm.

Embedded Web Chat

Built-in browser-based chat UI at /chat with model selection, streaming responses, and conversation history. No external client required.

Architecture

The Edge Server acts as the single entry point for all external traffic, forwarding authenticated requests to the internal cluster infrastructure.

Edge Server Request Flow
External clients (browsers, OpenWebUI, mobile apps, curl/custom APIs) connect over HTTPS to the Edge Server (ports 443/80), which performs TLS termination, API key authentication, rate limiting, and filter plugin hosting. The Edge forwards over internal HTTP to Routers (load balancers on port 8881), which dispatch to inference workers (:8890) and cloud workers (:8889).

Client -> Edge (TLS/Auth/Rate Limit) -> Router (Load Balance) -> Worker (Inference)

TLS Termination

The Edge Server handles HTTPS termination so that all internal cluster traffic can remain unencrypted on the private network. Configure TLS with manual certificates or integrate with Let's Encrypt for automatic renewal.

Manual Certificate Configuration

# Start with manual TLS certificates
./eldric-edge --port 443 \
  --cert /etc/ssl/cert.pem \
  --key /etc/ssl/key.pem \
  --routers http://10.3.7.47:8881

TLS Configuration (JSON)

"tls": { "mode": "manual", // "manual" or "acme" "cert_file": "/etc/ssl/cert.pem", "key_file": "/etc/ssl/key.pem" }

API Key Authentication

Protect your cluster with API key authentication. Each key is associated with a named client and can have individual rate limits. Keys are passed via the Authorization: Bearer sk-... header or the X-API-Key header.
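The header handling described above can be sketched in a few lines. This is an illustrative Python sketch, not the server's actual C++ implementation; the helper names (`extract_api_key`, `authenticate`) are hypothetical.

```python
# Illustrative sketch of the Edge's key-extraction logic (hypothetical
# helper names; the real server is implemented in C++).
def extract_api_key(headers):
    """Accept either 'Authorization: Bearer sk-...' or 'X-API-Key: sk-...'."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return headers.get("X-API-Key")

def authenticate(headers, api_keys):
    """Return the named client for a valid key, or None (-> HTTP 401)."""
    key = extract_api_key(headers)
    return api_keys.get(key) if key else None

keys = {"sk-openwebui-key": "openwebui-client"}
print(authenticate({"Authorization": "Bearer sk-openwebui-key"}, keys))
print(authenticate({"X-API-Key": "sk-openwebui-key"}, keys))
print(authenticate({}, keys))  # unauthenticated request
```

Either header form resolves to the same named client, which is also the identity the per-key rate limiter counts against.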

Configuring API Keys

"auth": { "require_api_key": true, "api_keys": { "sk-openwebui-key": "openwebui-client", "sk-custom-app": "custom-app-client", "sk-mobile-ios": "ios-app" } }

Client Registration

Register new clients programmatically via the API:

# Register a new client
curl -X POST https://edge.example.com/api/v1/clients/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-admin-key" \
  -d '{"name": "my-app", "description": "Production app"}'

# List registered clients
curl https://edge.example.com/api/v1/clients \
  -H "Authorization: Bearer sk-admin-key"

Rate Limiting

Three-tier rate limiting protects your cluster from overload. Limits are enforced using sliding time windows and can be configured globally, per IP address, and per API key.

Global RPM

Maximum requests per minute across all clients combined. Protects the entire cluster from saturation.

Default: 10,000 RPM

Per-IP RPM

Maximum requests per minute from a single IP address. Prevents individual clients from monopolizing resources.

Default: 100 RPM

Per-Key RPM

Maximum requests per minute for a specific API key. Allows differentiated service levels per client.

Default: 1,000 RPM

Configuration

"rate_limits": { "global_rpm": 10000, // Total cluster RPM "per_ip_rpm": 100, // Per IP address "per_key_rpm": 1000, // Per API key "window_seconds": 60 // Sliding window size }

When a rate limit is exceeded, the Edge returns HTTP 429 Too Many Requests with a Retry-After header indicating when the client can retry.
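The sliding-window semantics can be illustrated with a minimal sketch: a request is allowed only if fewer than `limit` requests occurred in the last `window` seconds. This is a Python illustration of the concept, not the Edge's real C++, farm-synchronized implementation.

```python
# Minimal sliding-window rate limiter sketch (illustrative only; the
# Edge's real limiter is C++ and synchronized across farm peers).
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller responds with HTTP 429 + Retry-After
        q.append(now)
        return True

per_ip = SlidingWindowLimiter(limit=3, window=60.0)
print([per_ip.allow("10.0.0.5", now=t) for t in (0, 1, 2, 3)])  # 4th request denied
print(per_ip.allow("10.0.0.5", now=61))  # early timestamps expired, allowed again
```

The same structure applies at all three tiers; only the key differs (a constant for global, the client IP for per-IP, the API key for per-key).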

Farm Mode

Deploy multiple Edge Server instances for high availability and horizontal scaling. Farm mode synchronizes API keys, rate limit counters, and upstream health status across all peers.

Edge Farm Architecture
Horizontal scaling with peer synchronization: clients reach edge.example.com through DNS or a load balancer, which spreads traffic across the Edge instances (edge1:443 primary, edge2:443 and edge3:443 peers). Each instance runs the full stack (TLS, auth, rate limiting, plugins, web chat), peer sync keeps them consistent, and all route into the internal cluster (routers :8881, workers :8890, cloud workers :8889).

Farm Configuration

# Start in farm mode with peers
./eldric-edge --mode farm \
  --peers edge2:443,edge3:443 \
  --routers http://router1:8881,http://router2:8881
"farm": { "peers": ["edge2:443", "edge3:443"], "sync_interval_ms": 5000 }

Embedded Web Chat

The Edge Server includes a fully functional browser-based chat interface, eliminating the need for external clients like OpenWebUI for basic usage. Access the chat UI directly at /chat.

Chat Page (/chat)

  • Single-page application with real-time streaming
  • Model selector with all available cluster models
  • Conversation history with local storage persistence
  • GitHub-inspired dark theme
  • Markdown rendering in responses
  • Code block syntax highlighting

Login Page (/login)

  • Authentication UI when API key auth is required
  • API key entry with validation
  • Session persistence via secure cookies
  • Automatic redirect to /chat after login

Key implementation files: cpp/include/distributed/edge/edge_webclient.h and cpp/src/distributed/edge/edge_webclient.cpp.

Plugin System

Extend the Edge Server with plugins for custom tools, request/response filtering, virtual model backends, and client-side UI widgets. Plugins support both server-side Python execution and client-side JavaScript.

Plugin Types

Tool Plugins

Server-side tools callable by the LLM during inference. Executed as Python subprocesses communicating via JSON-RPC. Ideal for integrating external APIs and databases.

Execution: Python subprocess

Filter Plugins

Pre- and post-LLM message processing. Inlet filters run before the request reaches the LLM; Outlet filters process the response before it is returned to the client.

Stages: Inlet (pre-LLM) + Outlet (post-LLM)
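The two-stage shape of a Filter plugin can be sketched as a small Python class. The `inlet`/`outlet` method names and the request-body dict shape are assumptions here, not taken from the Eldric source.

```python
# Hypothetical filter plugin sketch: `inlet` runs before the LLM,
# `outlet` runs on the response. The exact interface the Edge plugin
# host expects is an assumption.
class Filter:
    def inlet(self, body):
        """Pre-LLM: e.g. prepend a system instruction to every request."""
        messages = body.setdefault("messages", [])
        messages.insert(0, {"role": "system", "content": "Be concise."})
        return body

    def outlet(self, body):
        """Post-LLM: e.g. redact an internal marker before the client sees it."""
        for msg in body.get("messages", []):
            msg["content"] = msg["content"].replace("[internal]", "")
        return body

f = Filter()
req = f.inlet({"messages": [{"role": "user", "content": "Hi"}]})
print(req["messages"][0])  # the injected system message

resp = f.outlet({"messages": [{"role": "assistant", "content": "[internal] Hello"}]})
print(resp["messages"][0]["content"])
```

Inlet filters see the request before any model does; outlet filters see the final response, so redaction or formatting applied there covers every backend uniformly.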

Pipe Plugins

Virtual models powered by custom backends. A Pipe plugin registers as a model that appears in the model list and routes requests to an arbitrary processing pipeline.

Execution: Python subprocess

Action & Widget Plugins

Client-side UI extensions served as JavaScript to the browser. Actions add interactive buttons; Widgets embed custom UI components into the web chat interface.

Execution: JavaScript (browser)
Plugin Architecture
Plugin processing pipeline: an incoming request passes through Inlet filter plugins (pre-processing), then LLM inference (Ollama / vLLM / cloud), then Outlet filter plugins (post-processing) before the response is returned. Tool plugins run as Python subprocesses (JSON-RPC) and return results to the LLM mid-inference; Pipe plugins take an alternative path, replacing LLM inference entirely with a custom backend pipeline; Action and Widget plugins are served to the browser as JavaScript.

Plugin Directory Structure

plugins/
├── my-tool/
│   ├── manifest.json     # Plugin metadata (name, version, type, entry point)
│   ├── main.py           # Plugin implementation
│   └── valves.json       # Runtime configuration (API keys, thresholds)
├── content-filter/
│   ├── manifest.json
│   ├── main.py
│   └── valves.json
└── custom-model/
    ├── manifest.json
    ├── main.py
    └── valves.json
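As a concrete illustration, a manifest.json for a tool plugin might look like the following. The field names are assumptions inferred from the description above (metadata covering name, version, type, and entry point), not a documented schema.

```json
{
  "name": "my-tool",
  "version": "1.0.0",
  "type": "tool",
  "entry": "main.py",
  "description": "Example tool plugin (hypothetical schema)"
}
```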

Plugin API Endpoints

Endpoint Method Description
/api/v1/plugins GET List all installed plugins with status
/api/v1/plugins/{id}/enable POST Enable a plugin
/api/v1/plugins/{id}/disable POST Disable a plugin
/api/v1/plugins/{id}/valves GET Get plugin configuration (valves)
/api/v1/plugins/{id}/valves PUT Update plugin configuration (valves)

Key implementation files: cpp/include/distributed/edge/edge_plugin_host.h and cpp/src/distributed/edge/edge_plugin_host.cpp.

Streaming Support

The Edge Server provides a zero-copy SSE (Server-Sent Events) proxy for real-time token streaming. Streaming flows through the full distributed stack with minimal latency overhead.

Zero-Copy Streaming Flow
Zero-copy streaming flow: the client (browser, stream: true) POSTs to the Edge (TLS proxy), which zero-copy proxies SSE through the Router (pass-through) to the Worker (:8890) and its backend LLM (Ollama / vLLM). Tokens are forwarded as they arrive: no buffering, no copying, minimal latency.
# SSE streaming response format (OpenAI-compatible)
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
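A client consuming this format only needs to handle `data:` lines. The following is a minimal, standalone parsing sketch; a real client would read the lines from the HTTP response stream instead of a list.

```python
# Minimal parser for the OpenAI-compatible SSE format shown above.
import json

def iter_tokens(sse_lines):
    """Yield content deltas from 'data: ...' lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # ignore blank keep-alives and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

stream = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
print("".join(iter_tokens(stream)))  # Hello world
```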

Configuration

The Edge Server can be configured via command-line flags or a JSON configuration file. The config file provides the full set of options.

Complete Configuration File

{ "mode": "single", // "single" or "farm" "https_port": 443, "http_port": 80, "bind_address": "0.0.0.0", "router_urls": [ "http://router1:8881", "http://router2:8881" ], "controller_url": "http://controller:8880", "tls": { "mode": "manual", // "manual" or "acme" "cert_file": "/etc/ssl/cert.pem", "key_file": "/etc/ssl/key.pem" }, "auth": { "require_api_key": true, "api_keys": { "sk-openwebui-key": "openwebui-client", "sk-custom-app": "custom-app-client" } }, "rate_limits": { "global_rpm": 10000, "per_ip_rpm": 100, "per_key_rpm": 1000, "window_seconds": 60 }, "farm": { "peers": ["edge2:443", "edge3:443"], "sync_interval_ms": 5000 } }
# Load from config file
./eldric-edge --config /etc/eldric/edge.json

API Endpoints

Endpoint Method Description
/health GET Health check (returns upstream router status)
/metrics GET Server metrics (requests, latency, rate limit stats)
/v1/chat/completions POST OpenAI-compatible chat (proxied to router/worker)
/v1/models GET List available models (proxied to router/worker)
/api/v1/clients/register POST Register a new API client
/api/v1/clients GET List registered clients
/api/v1/upstreams GET Upstream router health and status
/chat GET Embedded web chat client (browser UI)
/login GET Authentication page (when API key auth is enabled)
/api/v1/plugins GET List installed plugins
/api/v1/plugins/{id}/enable POST Enable a plugin
/api/v1/plugins/{id}/disable POST Disable a plugin
/api/v1/plugins/{id}/valves GET / PUT Get or update plugin configuration

CLI Usage

Basic Usage with Routers

# Connect to one or more routers
./eldric-edge --port 8443 --routers http://router1:8881,http://router2:8881

With TLS Certificates

# HTTPS with manual certificates
./eldric-edge --port 443 \
  --cert /etc/ssl/cert.pem \
  --key /etc/ssl/key.pem \
  --routers http://10.3.7.47:8881

With API Key Authentication

# Require API key for all requests
./eldric-edge --port 443 \
  --routers http://router:8881 \
  --api-key "sk-your-api-key-here"

Farm Mode

# Run as part of an Edge farm
./eldric-edge --mode farm \
  --peers edge2:443,edge3:443 \
  --routers http://router:8881

HTTP Only (No TLS)

# For testing or behind a reverse proxy
./eldric-edge --no-tls --http-port 8080 \
  --routers http://router:8881

From Configuration File

# Load all settings from JSON config
./eldric-edge --config /etc/eldric/edge.json

Connecting External Clients

The Edge Server is fully OpenAI API compatible, making it a drop-in replacement for any tool that supports the OpenAI endpoint format.

OpenWebUI

Configure OpenWebUI to connect through the Edge Server:

OPENAI_API_BASE_URL=https://edge.example.com/v1
OPENAI_API_KEY=sk-your-api-key-here

curl

# Chat completion through the Edge
curl -X POST https://edge.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key-here" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://edge.example.com/v1",
    api_key="sk-your-api-key-here"
)

response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content, end="")

Eldric Mobile (iOS)

# In the iOS app Settings, set:
Server URL: https://edge.example.com
API Key: sk-your-api-key-here

Quick Start

Get the Edge Server running in under a minute.

1. Build the Edge Server

cd cpp && mkdir build && cd build
cmake -DBUILD_DISTRIBUTED=ON ..
make eldric-edge

2. Start the Edge (HTTP mode for testing)

./eldric-edge --no-tls --http-port 8080 \
  --routers http://localhost:8881

3. Test the Connection

# Check health
curl http://localhost:8080/health

# List available models
curl http://localhost:8080/v1/models

# Send a chat request
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

4. Open the Web Chat

# Open in your browser
open http://localhost:8080/chat

5. Enable TLS for Production

./eldric-edge --port 443 \
  --cert /etc/ssl/cert.pem \
  --key /etc/ssl/key.pem \
  --routers http://router1:8881,http://router2:8881 \
  --api-key "sk-production-key"

Key Files

File Description
cpp/src/distributed/edge/edge_server.cpp Edge server implementation (TLS, auth, rate limiting, proxy)
cpp/src/distributed/edge/edge_main.cpp CLI entry point and argument parsing
cpp/include/distributed/edge/edge_types.h Configuration types and data structures
cpp/include/distributed/edge/edge_server.h EdgeServer class declaration
cpp/include/distributed/edge/edge_webclient.h Embedded web chat UI (HTML/CSS/JS generation)
cpp/src/distributed/edge/edge_webclient.cpp Web chat client implementation
cpp/include/distributed/edge/edge_plugin_host.h Plugin system declaration
cpp/src/distributed/edge/edge_plugin_host.cpp Plugin host implementation (load, execute, manage plugins)

Firewall & Ports

Port Protocol Direction Description
443 TCP Inbound HTTPS (external clients)
80 TCP Inbound HTTP (redirect to HTTPS or HTTP-only mode)
8881 TCP Outbound Router connections (internal)
8880 TCP Outbound Controller registration (internal)

Remote Worker Registration & NAT Tunnel

The Edge server proxies cluster management traffic, allowing workers anywhere on the internet to join the cluster. Workers behind NAT/firewalls use the built-in tunnel for receiving inference requests through outbound-only connections.

Three Connection Modes

Direct via Edge (HTTPS)

Workers with internet access register through the Edge TLS gateway. The Edge proxies registration, heartbeat, and pipeline management to the controller. API key authentication and rate limiting are enforced.

# Worker registers through Edge
./eldric-workerd --backend ollama \
  --controller https://edge.example.com \
  --api-key sk-remote-worker

NAT Tunnel (No VPN, No Public IP)

Workers behind NAT connect outbound to the Edge and long-poll for inference requests. No inbound ports, no VPN required. The worker processes requests locally and sends results back through the same outbound connection.

# Worker behind NAT uses tunnel
./eldric-workerd --backend ollama \
  --controller https://edge.example.com \
  --tunnel https://edge.example.com \
  --api-key sk-home-office
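The worker side of the tunnel reduces to a simple outbound loop against the /api/v1/tunnel endpoints listed below. This sketch is illustrative: the poll/response payload shapes are assumptions, and run_inference is a placeholder for a backend-specific call.

```python
# Illustrative worker-side tunnel loop. Endpoint paths follow the Edge
# proxy table; payload shapes and run_inference() are assumptions.
import json
import urllib.request

EDGE = "https://edge.example.com"

def poll_url(tunnel_id):
    return f"{EDGE}/api/v1/tunnel/{tunnel_id}/poll"

def response_url(tunnel_id, req_id):
    return f"{EDGE}/api/v1/tunnel/{tunnel_id}/response/{req_id}"

def run_inference(request):
    raise NotImplementedError("backend-specific, e.g. a local Ollama call")

def tunnel_loop(tunnel_id, api_key):
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        # Outbound-only long poll; the Edge blocks up to 30s.
        req = urllib.request.Request(poll_url(tunnel_id), headers=headers)
        with urllib.request.urlopen(req, timeout=35) as resp:
            body = json.load(resp)
        if not body:          # poll timed out with no work queued
            continue
        result = run_inference(body["request"])
        post = urllib.request.Request(
            response_url(tunnel_id, body["req_id"]),
            data=json.dumps(result).encode(),
            headers={**headers, "Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(post, timeout=35).close()
```

Because every call here is outbound HTTPS, the worker needs no inbound ports, no public IP, and no VPN.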

VPN / Private Network

Sites connected via VPN or WireGuard access the controller directly on the private network. No Edge proxy needed. Standard deployment for corporate environments with existing network infrastructure.

# Direct to controller over VPN
./eldric-workerd --backend ollama \
  --controller http://ctrl:8880

Edge Proxy Endpoints for Cluster Management

These endpoints are proxied through the Edge to the Controller, enabling remote worker registration with TLS + API key auth.

Edge Route Method Description
Worker Registration
/api/v1/workers/register POST Register inference worker with controller
/api/v1/workers/:id/heartbeat POST Worker heartbeat with metrics and status
/api/v1/workers/discover POST Discover other workers in the cluster
/api/v1/data-workers/register POST Register data worker
/api/v1/science-workers/register POST Register science worker
/api/v1/training-workers/register POST Register training worker
Pipeline Management
/api/v1/pipeline/deploy POST Deploy distributed model across workers
/api/v1/pipeline/models GET List distributed pipeline models
/api/v1/pipeline/status GET Pipeline shard status
/api/v1/pipeline/undeploy POST Remove distributed model
/api/v1/pipeline/rebalance POST Rebalance layers across workers
NAT Tunnel
/api/v1/tunnel/connect POST Worker registers tunnel connection
/api/v1/tunnel/:id/poll GET Worker long-polls for inference requests (30s timeout)
/api/v1/tunnel/:id/forward POST Router sends request to tunneled worker
/api/v1/tunnel/:id/response/:req_id POST Worker returns inference result
/api/v1/tunnel/:id/disconnect POST Worker disconnects tunnel
/api/v1/tunnel/workers GET List all tunneled workers

Tunnel Flow

NAT tunnel flow: a worker behind NAT (e.g. an RTX 3090 in a home office) exchanges traffic with the Edge (:443, TLS and request queue) and, indirectly, the Router (:8881, AI load balancing):

1. The worker long-polls the Edge (GET /tunnel/:id/poll), blocking up to 30s, outbound only
2. The Router sends an inference request to the Edge (POST /tunnel/:id/forward)
3. The Edge queues the request and the poll returns it to the worker, which processes it locally and responds

Any Worker Type, Any Location

All Eldric worker types support registration through the Edge. The tunnel is for inference workers that need to receive chat requests from behind NAT.

Worker Type Via Edge Registration NAT Tunnel Use Case
Inference Worker (:8890) Supported Supported Remote GPU contributing to cluster inference
Science Worker (:8897) Supported Not needed Lab-specific science APIs registered cluster-wide
Training Worker (:8898) Supported Not needed Remote GPU for distributed model training
Data Worker (:8892) Supported Not needed Remote NFS/RAG store accessible to cluster
Media Worker (:8894) Supported Not needed Remote STT/TTS processing
Router (:8881) Supported Not needed Remote router syncs worker list from controller