Model Management

Pull, distribute, and manage AI models across your Eldric cluster

v4.1.0

Overview

Eldric provides a unified model management system that lets you pull models from public registries, host your own custom models, and distribute them to every inference worker in your cluster — all from a single API or the controller dashboard.

Ollama Registry

Pull models directly from the Ollama Hub to all inference workers in parallel. Supports any model available in the Ollama registry — Llama, Qwen, Mistral, Gemma, DeepSeek, and thousands more.

  • Async pull with job tracking
  • Parallel download to all workers
  • Per-worker progress monitoring
  • Automatic model verification

Custom Model Registry

Host your own models on the Data Worker. Upload GGUF, safetensors, or training output from the Training Worker and make them available across the cluster.

  • GGUF and safetensors support
  • Training Worker integration
  • Version tracking and metadata
  • Multi-tenant isolation

Worker-to-Worker Distribution

Distribute models from any source — registry, another worker, a URL, or Ollama — to specific workers or the entire cluster with a single API call.

  • NFS path optimization (zero-copy)
  • Backend-specific installation
  • Coverage tracking per model
  • Selective or cluster-wide targeting

Architecture Flow

Model Distribution Architecture
[Diagram: Model Distribution Architecture. Three model sources feed the Controller (:8880), which handles orchestration, job scheduling, and tracking: the Ollama Hub (ollama.com/library), the Custom Registry on the Data Worker (:8892), and external URLs (HuggingFace, S3, etc.). The controller distributes to the inference workers: Worker 1 (Ollama), Worker 2 (vLLM), and Worker 3 (llama.cpp), all on :8890, plus the Cloud Worker (:8889). Non-inference workers are skipped: Data (:8892), Science (:8897), Media (:8894), and Comm (:8895).]

Pulling Models from Ollama

The controller provides a "Pull to All Workers" feature that triggers an asynchronous model pull across every inference worker in the cluster. Each worker pulls directly from the Ollama registry, and the controller tracks progress per worker.

How It Works

1. API request to controller
2. Controller creates pull job
3. Parallel pull to all workers
4. Per-worker progress tracking

Async Pull Job Flow

[Diagram: the dashboard sends POST /models/pull ("Pull to All Workers"); the controller creates a job and returns a job_id immediately, then pulls to all workers in parallel in the background (for example Worker 1 at 100%, Worker 2 at 67%, Worker 3 at 45%). The dashboard polls GET /pull-jobs/{id} every 2 seconds for live status updates. Per-worker state machine: Pending, Pulling (n%), then Completed or Error. When all workers have completed, the job status becomes "completed".]
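The dashboard's 2-second polling loop can be sketched client-side. This is a minimal illustration, not a shipped client: the wait_for_pull helper and its fetch_status callback are hypothetical names, and in practice fetch_status would GET /api/v1/models/pull-jobs/{job_id} on the controller and parse the JSON.

```python
import time

def wait_for_pull(job_id, fetch_status, interval=2.0, timeout=3600):
    """Poll a pull job until it leaves the 'running' state or times out.

    `fetch_status` is injected so it can be stubbed in tests; a real client
    would issue the HTTP GET inside it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job["status"] != "running":
            return job
        time.sleep(interval)
    raise TimeoutError(f"pull job {job_id} still running after {timeout}s")
```

Returning the final job document (rather than just a boolean) lets the caller inspect per-worker results after a partial failure.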

Pull a Model to All Workers

```shell
# Pull a model to every inference worker in the cluster
curl -X POST http://controller:8880/api/v1/models/pull \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "all": true
  }'

# Response: a job ID for tracking
{
  "job_id": "pull-a1b2c3d4",
  "model": "llama3.2:3b",
  "status": "running",
  "workers": 3,
  "started_at": "2026-03-12T10:30:00Z"
}
```

Track Pull Progress

```shell
# Check the status of a pull job
curl http://controller:8880/api/v1/models/pull-jobs/pull-a1b2c3d4

# Response: per-worker status
{
  "job_id": "pull-a1b2c3d4",
  "model": "llama3.2:3b",
  "status": "running",
  "workers": {
    "wrk-worker1": { "status": "completed", "progress": 100 },
    "wrk-worker2": { "status": "pulling", "progress": 67 },
    "wrk-worker3": { "status": "pulling", "progress": 45 }
  }
}
```
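As a sketch of how the per-worker states might roll up into the job-level status field, assuming the documented states (pending, pulling, completed, error); the helper names are illustrative, not part of the API.

```python
def job_status(workers):
    """Derive a job-level status from per-worker pull states."""
    states = {w["status"] for w in workers.values()}
    if "error" in states:
        return "failed"
    if states <= {"completed"}:
        return "completed"
    return "running"

def overall_progress(workers):
    """Average per-worker progress, rounded to the nearest percent."""
    if not workers:
        return 0
    return round(sum(w.get("progress", 0) for w in workers.values()) / len(workers))
```

With the example response above, the job stays "running" until the slowest worker finishes, and overall progress averages the three workers.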

List All Pull Jobs

```shell
# List all active and recent pull jobs
curl http://controller:8880/api/v1/models/pull-jobs

# Returns an array of pull job statuses
```

Custom Model Registry

The Custom Model Registry is hosted on the Data Worker and stores models that you upload manually, export from the Training Worker, or download from external sources. Registered models can be distributed to inference workers on demand.

GGUF Models

Quantized models for llama.cpp and Ollama. Efficient storage and fast inference on CPU and GPU.

Safetensors

Standard format for vLLM, TGI, and Triton backends. Full-precision or quantized weights.

Training Output

LoRA adapters and merged models from the Training Worker are automatically registered here.

Imported Models

Models downloaded from HuggingFace, custom URLs, or transferred from other clusters.

Registry Upload & Distribution Flow
[Diagram: Step 1, register metadata: the client/CLI POSTs metadata to the Controller (:8880), which proxies it to the model registry on the Data Worker (:8892), where the metadata is stored. Step 2, upload model file: the client uploads the binary directly to the Data Worker as application/octet-stream. Step 3, distribute to cluster: the Controller fetches the model from the registry and installs it per backend: ollama create from GGUF on Worker 1, the vLLM model directory on Worker 2, and the llama.cpp models path on Worker 3. NFS optimization: if workers share NFS with the Data Worker, models are served via path (zero-copy) and no transfer is needed.]

Register a Model

```shell
# Register a new model in the registry
curl -X POST http://controller:8880/api/v1/model-registry/upload \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-finetuned-llama",
    "format": "gguf",
    "size_bytes": 4200000000,
    "description": "Fine-tuned Llama 3.2 3B for customer support",
    "source": "training-worker",
    "metadata": {
      "base_model": "llama3.2:3b",
      "method": "lora",
      "epochs": 5
    }
  }'
```

Upload Model File

```shell
# Upload the model file to the Data Worker
curl -X POST http://dataworker:8892/api/v1/models/my-finetuned-llama/upload \
  -H "Content-Type: application/octet-stream" \
  --data-binary @my-model.gguf
```

List and Delete Registry Models

```shell
# List all registered models
curl http://controller:8880/api/v1/model-registry

# Delete a model from the registry
curl -X DELETE http://controller:8880/api/v1/model-registry/my-finetuned-llama
```

Model Distribution

The distribution system takes a model from any source and installs it on target workers, handling backend-specific installation automatically. If workers share NFS storage with the Data Worker, models are accessed via path rather than copied.

Distribution Sources

| Source | Description | Use When |
|---|---|---|
| registry | Fetch from the Custom Model Registry on the Data Worker | Distributing custom or fine-tuned models |
| worker | Copy from one worker to another | Replicating a model already on one node |
| url | Download from an external URL (HuggingFace, S3, etc.) | Pulling from external model hosting |
| ollama | Pull from the Ollama registry | Using standard Ollama models |
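A small client-side helper can keep these source values straight before calling the API. The function name and up-front validation are illustrative assumptions; the controller performs its own validation on the request body.

```python
# The four source values documented for POST /api/v1/models/distribute.
VALID_SOURCES = {"registry", "worker", "url", "ollama"}

def distribute_request(model, source, target_workers="all"):
    """Build the JSON body for a distribute call, rejecting unknown sources."""
    if source not in VALID_SOURCES:
        raise ValueError(
            f"unknown source {source!r}; expected one of {sorted(VALID_SOURCES)}"
        )
    return {"model": model, "source": source, "target_workers": target_workers}
```

target_workers accepts either the string "all" or an explicit list of worker IDs, mirroring the two curl examples below.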

Distribute a Model

```shell
# Distribute a custom model from the registry to all workers
curl -X POST http://controller:8880/api/v1/models/distribute \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-finetuned-llama",
    "source": "registry",
    "target_workers": "all"
  }'

# Distribute to specific workers only
curl -X POST http://controller:8880/api/v1/models/distribute \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-finetuned-llama",
    "source": "registry",
    "target_workers": ["wrk-worker1", "wrk-worker2"]
  }'
```

Backend-Specific Installation

| Backend | Installation Method | Notes |
|---|---|---|
| Ollama | Generated Modelfile + ollama create | Creates from GGUF, auto-generates template |
| vLLM | Copy to model directory | Safetensors or GGUF placed in serving path |
| llama.cpp | Copy to model directory | GGUF files placed in configured models path |
| Triton | Model repository + load API | config.pbtxt generated, model loaded via API |
| TGI | Copy to model directory | Safetensors with tokenizer files |
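For the Ollama path, installation amounts to generating a Modelfile that points at the distributed GGUF and running ollama create against it. A minimal sketch, assuming only the FROM and optional SYSTEM directives; the real generator also emits the auto-generated chat template noted in the table, and the function name here is hypothetical.

```python
def make_modelfile(gguf_path, system_prompt=None):
    """Render a minimal Ollama Modelfile for a distributed GGUF file."""
    lines = [f"FROM {gguf_path}"]
    if system_prompt:
        lines.append(f'SYSTEM "{system_prompt}"')
    return "\n".join(lines) + "\n"
```

The worker would then write this to disk and run, for example, ollama create my-finetuned-llama -f Modelfile.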

NFS Path Optimization

When inference workers have NFS mounts from the Data Worker, the distribution system detects the shared filesystem and configures backends to read directly from the NFS path. This avoids redundant file copies and saves significant disk space and transfer time. Configure NFS mounts via the Data Worker NFS integration.
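The zero-copy decision can be sketched as a pure function: if a worker mounts the Data Worker's model export, reference the model by path; otherwise fall back to copying. The mount-table shape and the /srv/models export path are assumptions for illustration only; the real detection logic is internal to the distribution system.

```python
def transfer_plan(model_path, worker_mounts, registry_export="/srv/models"):
    """Return ('path', nfs_path) when the share is mounted, else ('copy', None).

    `worker_mounts` maps a worker-local mount point to the export it mounts.
    """
    for mount_point, export in worker_mounts.items():
        if export == registry_export and model_path.startswith(registry_export):
            relative = model_path[len(registry_export):].lstrip("/")
            return ("path", f"{mount_point}/{relative}")
    return ("copy", None)
```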

Dashboard

The Models tab in the controller dashboard at http://controller:8880/dashboard provides a visual interface for all model management operations.

Download Model to Cluster

Text input with a "Pull to All Workers" button. Enter any Ollama model name and pull it to every inference worker with one click. Shows real-time progress bars per worker.

Pull Job Progress

Live view of active pull jobs with per-worker status indicators: pending, pulling (with percentage), completed, or failed. Historical jobs remain visible for reference.

Cluster Models View

Shows every model in the cluster, which workers have it, and coverage percentage. Quickly identify models that are only on some workers and distribute them with one click.
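The coverage figure can be computed from a per-worker model inventory. A minimal sketch, assuming a mapping of worker ID to local model names; the helper name is hypothetical.

```python
def coverage(model, worker_models):
    """Percent of workers (by id -> list of model names) that have `model`."""
    if not worker_models:
        return 0.0
    have = sum(1 for models in worker_models.values() if model in models)
    return round(100.0 * have / len(worker_models), 1)
```

A model below 100% coverage is a candidate for the one-click distribute action described above.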

Custom Model Registry

Browse uploaded and trained models. View metadata including format, size, source, and creation date. Distribute or delete models directly from the registry view.

API Available Models

Aggregated list of all models available via the cluster API. This is the unified view that clients see when they call GET /api/v1/models.

Per-Worker Model Details

Drill down into any worker to see its local model list, sizes, modification dates, and backend type. Useful for debugging model availability issues.

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/models/pull | Pull a model from Ollama to workers (async job) |
| GET | /api/v1/models/pull-jobs | List all pull jobs (active and completed) |
| GET | /api/v1/models/pull-jobs/{id} | Get status and per-worker progress for a pull job |
| GET | /api/v1/model-registry | List all models in the custom registry |
| POST | /api/v1/model-registry/upload | Register a new model in the custom registry |
| DELETE | /api/v1/model-registry/{id} | Delete a model from the custom registry |
| POST | /api/v1/models/distribute | Distribute a model to workers from any source |
| GET | /api/v1/models | List all models aggregated across the cluster |
| POST | /api/v1/models/show | Get model details (template, system prompt, parameters) |

Worker Type Targeting

Model pulls and distribution only target workers that actually serve inference. Non-inference workers are automatically skipped to avoid wasting bandwidth and storage.

| Worker Type | Port | Receives Models | Reason |
|---|---|---|---|
| Inference Worker | 8890 | Yes | Primary inference endpoint (Ollama, vLLM, llama.cpp, etc.) |
| Cloud Worker | 8889 | Yes | Cloud inference gateway with local model caching |
| Data Worker | 8892 | No | Hosts registry only; does not serve inference |
| Science Worker | 8897 | No | Scientific APIs; uses inference workers for LLM tasks |
| Training Worker | 8898 | No | Pulls base models independently for training |
| Media Worker | 8894 | No | Audio/video processing; uses separate STT/TTS models |
| Comm Worker | 8895 | No | Messaging protocols; uses inference workers for AI replies |
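The targeting rule in the table reduces to a simple filter over the worker list. A sketch assuming worker records carry a type field; the field name and type strings are illustrative, not the controller's actual schema.

```python
# Only inference-serving worker types receive model pulls and distributions.
MODEL_TARGET_TYPES = {"inference", "cloud"}

def pull_targets(workers):
    """Filter a worker list down to the IDs that should receive models."""
    return [w["id"] for w in workers if w["type"] in MODEL_TARGET_TYPES]
```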

Quick Start

Get a model running across your entire cluster in three steps.

Step 1: Pull a Model

Initiate a cluster-wide pull from the Ollama registry.

```shell
curl -X POST http://controller:8880/api/v1/models/pull \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3:8b", "all": true}'

# Note the job_id in the response
```

Step 2: Monitor Progress

Check the pull job until all workers report completion.

```shell
curl http://controller:8880/api/v1/models/pull-jobs/pull-a1b2c3d4

# Wait until all workers show "status": "completed"
```

Step 3: Verify Coverage

Confirm the model is available across the cluster.

```shell
# List all models and check that qwen3:8b appears with 100% coverage
curl http://controller:8880/api/v1/models

# Or get model details
curl -X POST http://controller:8880/api/v1/models/show \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3:8b"}'
```

Related Components

| Component | Port | Protocol | Role in Model Management |
|---|---|---|---|
| Controller | 8880 | HTTP/REST | Pull orchestration, job tracking, distribution API |
| Inference Worker | 8890 | HTTP/REST | Model pull target, serves inference requests |
| Cloud Worker | 8889 | HTTP/REST | Cloud inference with model caching |
| Data Worker | 8892 | HTTP/REST + NFS | Custom model registry, NFS storage for models |
| Training Worker | 8898 | HTTP/REST | Produces fine-tuned models for the registry |