The secure external gateway for your Eldric cluster. TLS termination, API key authentication, rate limiting, plugin extensibility, and an embedded web chat client -- all in a single binary.
The Edge Server is the external-facing gateway for the entire Eldric distributed cluster. It sits at the perimeter, accepting HTTPS connections from external clients (OpenWebUI, custom apps, mobile clients), performing TLS termination, authenticating requests via API keys, enforcing rate limits, and proxying traffic to downstream routers and workers. It also hosts an embedded web chat UI and a plugin system for server-side and client-side extensibility.
HTTPS on port 443 with manual certificate configuration or automatic Let's Encrypt support. All downstream traffic stays on the internal network.
Per-key authentication with named clients, individual rate limits, and client registration endpoints for self-service key provisioning.
Configurable rate limits at three levels: global cluster RPM, per-IP RPM, and per-API-key RPM with sliding time windows.
Extend the Edge with server-side Python plugins (Tools, Filters, Pipes) and client-side JavaScript plugins (Actions, Widgets).
Horizontal scaling with multiple Edge instances. Peer synchronization keeps API keys and rate limit counters consistent across the farm.
Built-in browser-based chat UI at /chat with model selection, streaming responses, and conversation history. No external client required.
The Edge Server acts as the single entry point for all external traffic, forwarding authenticated requests to the internal cluster infrastructure.
The Edge Server handles HTTPS termination so that all internal cluster traffic can remain unencrypted on the private network. Configure TLS with manual certificates or integrate with Let's Encrypt for automatic renewal.
Use --no-tls for testing or when running behind a reverse proxy.

Protect your cluster with API key authentication. Each key is associated with a named client and can have an individual rate limit. Keys are passed via the Authorization: Bearer sk-... header or the X-API-Key header.
Register new clients programmatically via the API:
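As a minimal sketch of what such a call looks like, the snippet below builds the registration request; the body fields `name` and `rpm_limit` are assumptions for illustration, not confirmed field names from the Edge API:

```python
import json

# Hypothetical request for /api/v1/clients/register; the body field
# names "name" and "rpm_limit" are illustrative assumptions.
def build_register_request(name, rpm_limit, admin_key):
    body = json.dumps({"name": name, "rpm_limit": rpm_limit})
    headers = {
        "Authorization": f"Bearer {admin_key}",  # an existing API key
        "Content-Type": "application/json",
    }
    return "POST", "/api/v1/clients/register", headers, body

method, path, headers, body = build_register_request("mobile-app", 120, "sk-admin")
```

Against a live Edge instance, this tuple maps directly onto `http.client.HTTPSConnection.request()`.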
Three-tier rate limiting protects your cluster from overload. Limits are enforced using sliding time windows and can be configured globally, per IP address, and per API key.
Maximum requests per minute across all clients combined. Protects the entire cluster from saturation.
Maximum requests per minute from a single IP address. Prevents individual clients from monopolizing resources.
Maximum requests per minute for a specific API key. Allows differentiated service levels per client.
When a rate limit is exceeded, the Edge returns HTTP 429 Too Many Requests with a Retry-After header indicating when the client can retry.
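The sliding-window check can be illustrated with a small sketch (not the Edge's actual C++ implementation): keep a queue of request timestamps per key, evict entries older than the window, and reject once the count reaches the limit, reporting how long until the oldest entry expires.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Per-key sliding 60-second window; illustrative only."""
    def __init__(self, rpm_limit, window=60.0):
        self.rpm_limit = rpm_limit
        self.window = window
        self.hits = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        # Evict timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.rpm_limit:
            # Seconds until the oldest hit expires -> Retry-After value.
            return False, self.window - (now - q[0])
        q.append(now)
        return True, 0.0
```

A denied request would be answered with HTTP 429 and the returned delay in the Retry-After header.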
Deploy multiple Edge Server instances for high availability and horizontal scaling. Farm mode synchronizes API keys, rate limit counters, and upstream health status across all peers.
The Edge Server includes a fully functional browser-based chat interface, eliminating the need for external clients like OpenWebUI for basic usage. Access the chat UI directly at /chat.
Key implementation files: cpp/include/distributed/edge/edge_webclient.h and cpp/src/distributed/edge/edge_webclient.cpp.
Extend the Edge Server with plugins for custom tools, request/response filtering, virtual model backends, and client-side UI widgets. Plugins support both server-side Python execution and client-side JavaScript.
Server-side tools callable by the LLM during inference. Executed as Python subprocesses communicating via JSON-RPC. Ideal for integrating external APIs and databases.
Pre- and post-LLM message processing. Inlet filters run before the request reaches the LLM; Outlet filters process the response before it is returned to the client.
Virtual models powered by custom backends. A Pipe plugin registers as a model that appears in the model list and routes requests to an arbitrary processing pipeline.
Client-side UI extensions served as JavaScript to the browser. Actions add interactive buttons; Widgets embed custom UI components into the web chat interface.
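As a sketch of the server-side Tool contract (the method name and parameter schema below are illustrative assumptions, not the Edge's actual plugin spec), a Tool subprocess reads a JSON-RPC request line and answers with a result or error object:

```python
import json

# Hypothetical Tool handler; the method name "add" and the params
# shape are illustrative, not taken from the Edge plugin spec.
def handle_rpc(line: str) -> str:
    req = json.loads(line)
    if req.get("method") == "add":
        a, b = req["params"]["a"], req["params"]["b"]
        resp = {"jsonrpc": "2.0", "id": req["id"], "result": a + b}
    else:
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return json.dumps(resp)
```

In a real plugin this handler would run in a loop over the subprocess's stdin/stdout.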
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/plugins | GET | List all installed plugins with status |
| /api/v1/plugins/{id}/enable | POST | Enable a plugin |
| /api/v1/plugins/{id}/disable | POST | Disable a plugin |
| /api/v1/plugins/{id}/valves | GET | Get plugin configuration (valves) |
| /api/v1/plugins/{id}/valves | PUT | Update plugin configuration (valves) |
Key implementation files: cpp/include/distributed/edge/edge_plugin_host.h and cpp/src/distributed/edge/edge_plugin_host.cpp.
The Edge Server provides a zero-copy SSE (Server-Sent Events) proxy for real-time token streaming. Streaming flows through the full distributed stack with minimal latency overhead.
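In the OpenAI streaming format, each token chunk arrives as a `data:` line terminated by a blank line, and the stream ends with `data: [DONE]`. A minimal client-side parsing sketch:

```python
import json

def parse_sse(stream_text):
    """Yield decoded JSON chunks from an OpenAI-style SSE body."""
    for block in stream_text.split("\n\n"):
        for line in block.splitlines():
            if not line.startswith("data: "):
                continue  # skip comments and non-data fields
            payload = line[len("data: "):]
            if payload == "[DONE]":
                return  # end-of-stream sentinel
            yield json.loads(payload)

raw = ('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
       'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
       'data: [DONE]\n\n')
tokens = [c["choices"][0]["delta"].get("content", "") for c in parse_sse(raw)]
```

Concatenating the `delta.content` fields reconstructs the streamed completion text.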
Clients enable streaming by calling /v1/chat/completions with stream: true.

The Edge Server can be configured via command-line flags or a JSON configuration file. The config file exposes the full set of options.
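A sketch of what such a config file might look like; every field name here is an assumption for illustration, so consult the binary's own help output for the real schema:

```json
{
  "listen_port": 443,
  "tls": { "cert": "/etc/eldric/edge.crt", "key": "/etc/eldric/edge.key" },
  "rate_limits": { "global_rpm": 6000, "per_ip_rpm": 300, "per_key_rpm": 120 },
  "upstreams": ["http://10.0.0.10:8881"],
  "peers": ["https://edge-2.internal:443"]
}
```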
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check (returns upstream router status) |
| /metrics | GET | Server metrics (requests, latency, rate limit stats) |
| /v1/chat/completions | POST | OpenAI-compatible chat (proxied to router/worker) |
| /v1/models | GET | List available models (proxied to router/worker) |
| /api/v1/clients/register | POST | Register a new API client |
| /api/v1/clients | GET | List registered clients |
| /api/v1/upstreams | GET | Upstream router health and status |
| /chat | GET | Embedded web chat client (browser UI) |
| /login | GET | Authentication page (when API key auth is enabled) |
| /api/v1/plugins | GET | List installed plugins |
| /api/v1/plugins/{id}/enable | POST | Enable a plugin |
| /api/v1/plugins/{id}/disable | POST | Disable a plugin |
| /api/v1/plugins/{id}/valves | GET / PUT | Get or update plugin configuration |
The Edge Server is fully OpenAI API compatible, making it a drop-in replacement for any tool that supports the OpenAI endpoint format.
Configure OpenWebUI to connect through the Edge Server:
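For example, point OpenWebUI's OpenAI-compatible connector at the Edge via environment variables (the hostname and key below are placeholders):

```shell
# Point OpenWebUI's OpenAI-compatible connector at the Edge Server.
export OPENAI_API_BASE_URL="https://edge.example.com/v1"
export OPENAI_API_KEY="sk-your-edge-api-key"
```

OpenWebUI then lists the models the Edge proxies from /v1/models and streams chat through /v1/chat/completions.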
Get the Edge Server running in under a minute.
| File | Description |
|---|---|
| cpp/src/distributed/edge/edge_server.cpp | Edge server implementation (TLS, auth, rate limiting, proxy) |
| cpp/src/distributed/edge/edge_main.cpp | CLI entry point and argument parsing |
| cpp/include/distributed/edge/edge_types.h | Configuration types and data structures |
| cpp/include/distributed/edge/edge_server.h | EdgeServer class declaration |
| cpp/include/distributed/edge/edge_webclient.h | Embedded web chat UI (HTML/CSS/JS generation) |
| cpp/src/distributed/edge/edge_webclient.cpp | Web chat client implementation |
| cpp/include/distributed/edge/edge_plugin_host.h | Plugin system declaration |
| cpp/src/distributed/edge/edge_plugin_host.cpp | Plugin host implementation (load, execute, manage plugins) |
| Port | Protocol | Direction | Description |
|---|---|---|---|
| 443 | TCP | Inbound | HTTPS (external clients) |
| 80 | TCP | Inbound | HTTP (redirect to HTTPS or HTTP-only mode) |
| 8881 | TCP | Outbound | Router connections (internal) |
| 8880 | TCP | Outbound | Controller registration (internal) |
The Edge server proxies cluster management traffic, allowing workers anywhere on the internet to join the cluster. Workers behind NAT/firewalls use the built-in tunnel for receiving inference requests through outbound-only connections.
Workers with internet access register through the Edge TLS gateway. The Edge proxies registration, heartbeat, and pipeline management to the controller. API key authentication and rate limiting are enforced.
Workers behind NAT connect outbound to the Edge and long-poll for inference requests. No inbound ports, no VPN required. The worker processes requests locally and sends results back through the same outbound connection.
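The worker side of the tunnel reduces to a simple loop: long-poll, process locally, post the result back, repeat. A sketch with the HTTP transport stubbed out (the `poll` and `respond` callables stand in for requests to /api/v1/tunnel/:id/poll and /api/v1/tunnel/:id/response/:req_id; their shapes are illustrative assumptions):

```python
# Illustrative long-poll loop; "poll" and "respond" stand in for
# outbound HTTP calls to the Edge tunnel endpoints.
def tunnel_loop(poll, process, respond, max_iters=10):
    for _ in range(max_iters):
        req = poll()          # blocks up to ~30 s on the server side
        if req is None:
            continue          # poll timed out: poll again
        if req == "shutdown":
            break             # illustrative stop signal
        respond(req["id"], process(req["payload"]))

# Stubbed transport for demonstration.
incoming = [None, {"id": "r1", "payload": "hi"}, "shutdown"]
results = {}
tunnel_loop(poll=lambda: incoming.pop(0),
            process=lambda p: p.upper(),
            respond=lambda rid, out: results.__setitem__(rid, out))
```

Because every call is outbound from the worker, no inbound ports or VPN are needed, matching the NAT traversal model described above.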
Sites connected via VPN or WireGuard access the controller directly on the private network. No Edge proxy needed. Standard deployment for corporate environments with existing network infrastructure.
These endpoints are proxied through the Edge to the Controller, enabling remote worker registration with TLS + API key auth.
| Edge Route | Method | Description |
|---|---|---|
| Worker Registration | | |
| /api/v1/workers/register | POST | Register inference worker with controller |
| /api/v1/workers/:id/heartbeat | POST | Worker heartbeat with metrics and status |
| /api/v1/workers/discover | POST | Discover other workers in the cluster |
| /api/v1/data-workers/register | POST | Register data worker |
| /api/v1/science-workers/register | POST | Register science worker |
| /api/v1/training-workers/register | POST | Register training worker |
| Pipeline Management | | |
| /api/v1/pipeline/deploy | POST | Deploy distributed model across workers |
| /api/v1/pipeline/models | GET | List distributed pipeline models |
| /api/v1/pipeline/status | GET | Pipeline shard status |
| /api/v1/pipeline/undeploy | POST | Remove distributed model |
| /api/v1/pipeline/rebalance | POST | Rebalance layers across workers |
| NAT Tunnel | | |
| /api/v1/tunnel/connect | POST | Worker registers tunnel connection |
| /api/v1/tunnel/:id/poll | GET | Worker long-polls for inference requests (30 s timeout) |
| /api/v1/tunnel/:id/forward | POST | Router sends request to tunneled worker |
| /api/v1/tunnel/:id/response/:req_id | POST | Worker returns inference result |
| /api/v1/tunnel/:id/disconnect | POST | Worker disconnects tunnel |
| /api/v1/tunnel/workers | GET | List all tunneled workers |
All Eldric worker types support registration through the Edge. The tunnel is for inference workers that need to receive chat requests from behind NAT.
| Worker Type | Via Edge Registration | NAT Tunnel | Use Case |
|---|---|---|---|
| Inference Worker (:8890) | Supported | Supported | Remote GPU contributing to cluster inference |
| Science Worker (:8897) | Supported | Not needed | Lab-specific science APIs registered cluster-wide |
| Training Worker (:8898) | Supported | Not needed | Remote GPU for distributed model training |
| Data Worker (:8892) | Supported | Not needed | Remote NFS/RAG store accessible to cluster |
| Media Worker (:8894) | Supported | Not needed | Remote STT/TTS processing |
| Router (:8881) | Supported | Not needed | Remote router syncs worker list from controller |