Pull, distribute, and manage AI models across your Eldric cluster
Eldric v4.1.0 provides a unified model management system that lets you pull models from public registries, host your own custom models, and distribute them to every inference worker in your cluster, all from a single API or the controller dashboard.
- **Pull**: Pull models directly from the Ollama Hub to all inference workers in parallel. Supports any model available in the Ollama registry: Llama, Qwen, Mistral, Gemma, DeepSeek, and thousands more.
- **Host**: Host your own models on the Data Worker. Upload GGUF, safetensors, or training output from the Training Worker and make them available across the cluster.
- **Distribute**: Distribute models from any source (registry, another worker, a URL, or Ollama) to specific workers or the entire cluster with a single API call.
The controller provides a "Pull to All Workers" feature that triggers an asynchronous model pull across every inference worker in the cluster. Each worker pulls directly from the Ollama registry, and the controller tracks progress per worker.
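Since the controller tracks per-worker status for each pull job, a client can summarize a job's progress from the job record. The JSON shape below (a `workers` list with `status` fields) is an illustrative assumption; the status values match those shown in the dashboard (pending, pulling, completed, failed).

```python
# Summarize per-worker progress for a pull job.
# NOTE: the job record shape used here is an assumption -- check the actual
# response of GET /api/v1/models/pull-jobs/{id} on your controller.
from collections import Counter

def summarize_pull_job(job: dict) -> dict:
    """Count workers by status and report whether the pull has finished."""
    statuses = Counter(w["status"] for w in job["workers"])
    total = len(job["workers"])
    done = statuses.get("completed", 0)
    return {
        "model": job["model"],
        "done": done,
        "total": total,
        "failed": statuses.get("failed", 0),
        "complete": done == total,
    }

job = {
    "model": "llama3.2",
    "workers": [
        {"worker": "inf-1", "status": "completed"},
        {"worker": "inf-2", "status": "pulling", "progress": 62},
        {"worker": "inf-3", "status": "pending"},
    ],
}
print(summarize_pull_job(job))
# {'model': 'llama3.2', 'done': 1, 'total': 3, 'failed': 0, 'complete': False}
```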
The Custom Model Registry is hosted on the Data Worker and stores models that you upload manually, export from the Training Worker, or download from external sources. Registered models can be distributed to inference workers on demand.
- **GGUF**: Quantized models for llama.cpp and Ollama. Efficient storage and fast inference on CPU and GPU.
- **Safetensors**: Standard format for vLLM, TGI, and Triton backends. Full-precision or quantized weights.
- **Training output**: LoRA adapters and merged models from the Training Worker are automatically registered here.
- **External downloads**: Models downloaded from HuggingFace, custom URLs, or transferred from other clusters.
The distribution system takes a model from any source and installs it on target workers, handling backend-specific installation automatically. If workers share NFS storage with the Data Worker, models are accessed via path rather than copied.
| Source | Description | Use When |
|---|---|---|
| `registry` | Fetch from the Custom Model Registry on the Data Worker | Distributing custom or fine-tuned models |
| `worker` | Copy from one worker to another | Replicating a model already on one node |
| `url` | Download from an external URL (HuggingFace, S3, etc.) | Pulling from external model hosting |
| `ollama` | Pull from the Ollama registry | Using standard Ollama models |
| Backend | Installation Method | Notes |
|---|---|---|
| Ollama | Generated Modelfile + `ollama create` | Creates from GGUF, auto-generates template |
| vLLM | Copy to model directory | Safetensors or GGUF placed in serving path |
| llama.cpp | Copy to model directory | GGUF files placed in configured models path |
| Triton | Model repository + load API | `config.pbtxt` generated, model loaded via API |
| TGI | Copy to model directory | Safetensors with tokenizer files |
When inference workers have NFS mounts from the Data Worker, the distribution system detects the shared filesystem and configures backends to read directly from the NFS path. This avoids redundant file copies and saves significant disk space and transfer time. Configure NFS mounts via the Data Worker NFS integration.
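The path-versus-copy decision can be sketched as a mount check: if the model's path on the Data Worker falls under a filesystem the target worker also mounts, the backend is pointed at that path; otherwise the files are copied. This is a simplified sketch under that assumption; the real distribution system performs its own shared-filesystem detection.

```python
# Decide whether a worker can reference a model via a shared NFS path
# instead of copying it. Simplified assumption: the worker's NFS mounts
# are known as a list of absolute mount points.
import os

def resolve_model_access(model_path: str, worker_mounts: list[str]) -> dict:
    """Return an access plan: reference the shared path if the model lives
    under one of the worker's mounts, otherwise copy the files."""
    for mount in worker_mounts:
        if os.path.commonpath([model_path, mount]) == mount:
            return {"method": "nfs-path", "path": model_path}
    return {"method": "copy", "path": model_path}

print(resolve_model_access("/mnt/models/gguf/llama3.gguf", ["/mnt/models"]))
# {'method': 'nfs-path', 'path': '/mnt/models/gguf/llama3.gguf'}
```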
The Models tab in the controller dashboard at http://controller:8880/dashboard provides a visual interface for all model management operations.
- **Pull to All Workers**: Text input with a "Pull to All Workers" button. Enter any Ollama model name and pull it to every inference worker with one click. Shows real-time progress bars per worker.
- **Pull job monitor**: Live view of active pull jobs with per-worker status indicators: pending, pulling (with percentage), completed, or failed. Historical jobs remain visible for reference.
- **Model coverage**: Shows every model in the cluster, which workers have it, and coverage percentage. Quickly identify models that are only on some workers and distribute them with one click.
- **Registry browser**: Browse uploaded and trained models. View metadata including format, size, source, and creation date. Distribute or delete models directly from the registry view.
- **Cluster model list**: Aggregated list of all models available via the cluster API. This is the unified view that clients see when they call GET /api/v1/models.
- **Per-worker view**: Drill down into any worker to see its local model list, sizes, modification dates, and backend type. Useful for debugging model availability issues.
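The coverage view above amounts to computing, per model, the fraction of workers that hold it. A minimal sketch, assuming per-worker model lists keyed by worker name:

```python
# Compute per-model coverage across workers, mirroring the dashboard's
# coverage view. The input shape (worker name -> list of model names)
# is illustrative.
def model_coverage(worker_models: dict[str, list[str]]) -> dict[str, float]:
    """Map each model to the fraction of workers that have it."""
    all_models = {m for models in worker_models.values() for m in models}
    n = len(worker_models)
    return {
        m: sum(m in models for models in worker_models.values()) / n
        for m in sorted(all_models)
    }

cov = model_coverage({
    "inference-1": ["llama3.2", "qwen2.5"],
    "inference-2": ["llama3.2"],
})
print(cov)  # {'llama3.2': 1.0, 'qwen2.5': 0.5}
```

A model at coverage 1.0 is everywhere; anything below that is a candidate for a one-click distribute.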
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/models/pull` | Pull a model from Ollama to workers (async job) |
| GET | `/api/v1/models/pull-jobs` | List all pull jobs (active and completed) |
| GET | `/api/v1/models/pull-jobs/{id}` | Get status and per-worker progress for a pull job |
| GET | `/api/v1/model-registry` | List all models in the custom registry |
| POST | `/api/v1/model-registry/upload` | Register a new model in the custom registry |
| DELETE | `/api/v1/model-registry/{id}` | Delete a model from the custom registry |
| POST | `/api/v1/models/distribute` | Distribute a model to workers from any source |
| GET | `/api/v1/models` | List all models aggregated across the cluster |
| POST | `/api/v1/models/show` | Get model details (template, system prompt, parameters) |
Model pulls and distribution only target workers that actually serve inference. Non-inference workers are automatically skipped to avoid wasting bandwidth and storage.
| Worker Type | Port | Receives Models | Reason |
|---|---|---|---|
| Inference Worker | 8890 | Yes | Primary inference endpoint (Ollama, vLLM, llama.cpp, etc.) |
| Cloud Worker | 8889 | Yes | Cloud inference gateway with local model caching |
| Data Worker | 8892 | No | Hosts registry only; does not serve inference |
| Science Worker | 8897 | No | Scientific APIs; uses inference workers for LLM tasks |
| Training Worker | 8898 | No | Pulls base models independently for training |
| Media Worker | 8894 | No | Audio/video processing; uses separate STT/TTS models |
| Comm Worker | 8895 | No | Messaging protocols; uses inference workers for AI replies |
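The targeting rule in the table reduces to a type filter: only inference-serving worker types receive models. A sketch, where the worker-record shape and type strings are illustrative assumptions:

```python
# Filter a cluster's worker list down to model-distribution targets.
# Only types that serve inference (Inference Worker, Cloud Worker) receive
# models; the type names and record fields here are assumptions.
MODEL_TARGET_TYPES = {"inference", "cloud"}

def pull_targets(workers: list[dict]) -> list[dict]:
    """Keep only workers that should receive model pulls/distribution."""
    return [w for w in workers if w["type"] in MODEL_TARGET_TYPES]

cluster = [
    {"name": "inf-1", "type": "inference", "port": 8890},
    {"name": "cloud-1", "type": "cloud", "port": 8889},
    {"name": "data-1", "type": "data", "port": 8892},
    {"name": "train-1", "type": "training", "port": 8898},
]
print([w["name"] for w in pull_targets(cluster)])  # ['inf-1', 'cloud-1']
```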
Get a model running across your entire cluster in three steps:

1. Initiate a cluster-wide pull from the Ollama registry.
2. Poll the pull job until all workers report completion.
3. Confirm the model is available across the cluster.
| Component | Port | Protocol | Role in Model Management |
|---|---|---|---|
| Controller | 8880 | HTTP/REST | Pull orchestration, job tracking, distribution API |
| Inference Worker | 8890 | HTTP/REST | Model pull target, serves inference requests |
| Cloud Worker | 8889 | HTTP/REST | Cloud inference with model caching |
| Data Worker | 8892 | HTTP/REST + NFS | Custom model registry, NFS storage for models |
| Training Worker | 8898 | HTTP/REST | Produces fine-tuned models for the registry |