Eldric Training Worker

AI model fine-tuning, LoRA adapters, RLHF alignment, training chains, and latent reasoning techniques. Train and customize models with production-grade infrastructure you fully control.

Port 8898

At a Glance

  • 6 Training Backends
  • 7 Training Methods
  • 5 Latent Reasoning Techniques
  • Chains: Visual Pipeline Builder

Multi-Backend Training

Choose from Unsloth, Axolotl, TRL, DeepSpeed, MLX, or llama.cpp. Each backend is optimized for different hardware and training scenarios.

Parameter-Efficient Methods

LoRA, QLoRA, SFT, DPO, RLHF, PPO, and full fine-tuning. From lightweight adapters to complete model retraining.

Training Chains

Visual node-based pipeline configuration. Chain data sources, AI generators, validators, and trainers into automated workflows.

Latent Reasoning

COCONUT, Quiet-STaR, Pause Tokens, Hidden CoT, and DeepSeek DSA. Train models to reason in latent space.

Architecture

Training Worker Architecture
Training Worker (Port 8898): Job Manager, Chain Engine, GPU Monitor, Web Dashboard. Training backends: Unsloth (CUDA, 2x speed), Axolotl (CUDA/ROCm), TRL (RLHF/DPO/PPO), DeepSpeed (multi-GPU, distributed), MLX (Apple Silicon), llama.cpp (GGUF, CPU+GPU). Integrations: Controller (8880), Data Worker (8892), Python venv.

Training Backends

Six production-ready backends covering CUDA, ROCm, Apple Silicon, and CPU training. The Training Worker auto-detects available backends from the Python virtual environment.

Backend   | Description                  | Hardware        | Methods             | Key Advantage
Unsloth   | Fast LoRA/QLoRA training     | CUDA            | LoRA, QLoRA, SFT    | 2x training speedup
Axolotl   | Flexible YAML-based training | CUDA, ROCm      | LoRA, QLoRA, SFT    | Full YAML config, broad model support
TRL       | HuggingFace RLHF library     | CUDA            | SFT, DPO, RLHF, PPO | Full alignment pipeline
DeepSpeed | Distributed training         | CUDA, multi-GPU | SFT, Full, LoRA     | Multi-node, ZeRO optimization
MLX       | Apple Silicon training       | Apple MPS       | LoRA, SFT           | Native macOS, unified memory
llama.cpp | GGUF-based training          | CUDA, CPU, MPS  | LoRA, SFT           | GGUF format, low resource

Training Methods

LoRA

Low-Rank Adaptation. Inserts small trainable matrices into frozen model layers. Efficient, fast, and produces lightweight adapters.

  • Rank 8-64 (configurable)
  • Alpha scaling parameter
  • Target specific layers
  • Adapter merging support
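As a sketch of the mechanism (a toy numpy forward pass, not any backend's implementation), the adapter adds a scaled rank-r update alongside a frozen weight, and can later be merged into it:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 32, 32, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection, rank r
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # base path plus scaled low-rank update: W x + (alpha/r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

def merge():
    # fold the adapter into the base weight for deployment
    return W + (alpha / r) * (B @ A)

x = rng.standard_normal(d_in)
# with B zero-initialized the adapter starts as a no-op
assert np.allclose(lora_forward(x), W @ x)
assert np.allclose(merge() @ x, lora_forward(x))
```

The zero initialization of B is why training starts from the base model's behavior; only A and B (2 * r * d parameters instead of d * d) are updated.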

QLoRA

Quantized LoRA. Combines 4-bit quantization with LoRA for memory-efficient training on consumer GPUs.

  • 4-bit NormalFloat quantization
  • Double quantization
  • Paged optimizers
  • 70B models on single GPU
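The memory saving comes from storing the frozen base weights in 4 bits with one scale per block. A simplified sketch of block-wise absmax 4-bit quantization (QLoRA's NF4 uses a fixed 16-value normal-distribution codebook; the linear grid here is an illustration only):

```python
import numpy as np

def quantize_4bit(w, block=64):
    # one absmax scale per block, values mapped to the signed range [-7, 7]
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True)
    q = np.round(w / scale * 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return q / 7.0 * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)               # stand-in for a weight tensor
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale).reshape(-1)
err = np.abs(w - w_hat).max()               # bounded by scale / 14 per block
```

During QLoRA training the 4-bit weights are dequantized on the fly for the forward pass while gradients flow only into the LoRA adapter.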

SFT

Supervised Fine-Tuning. Standard instruction tuning on labeled data to teach models specific behaviors and formats.

  • Instruction/response pairs
  • Alpaca, ShareGPT, OpenAI formats
  • Multi-turn conversation support
  • Template customization
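For concreteness, here is one Alpaca-format JSONL record and a rendering template (field names follow the public Alpaca convention; the exact template a given backend applies may differ):

```python
import json

# one Alpaca-format training record
record = {
    "instruction": "Summarize the following text.",
    "input": "Eldric is a distributed AI platform.",
    "output": "Eldric: a distributed AI platform.",
}

# illustrative instruction template in the Alpaca style
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

line = json.dumps(record)               # one line of the JSONL file
text = ALPACA_TEMPLATE.format(**record) # what the model is trained on
```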

DPO

Direct Preference Optimization. Aligns models using human preference data without a separate reward model.

  • No reward model needed
  • Chosen/rejected pairs
  • Beta parameter tuning
  • Simpler than RLHF
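The DPO objective itself is compact enough to sketch directly. Inputs are summed log-probabilities of the chosen and rejected completions under the trained policy and the frozen reference model (the numbers below are illustrative):

```python
import numpy as np

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # -log sigmoid(beta * ((log pi_c - log ref_c) - (log pi_r - log ref_r)))
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-logits)))

# policy already prefers the chosen answer more than the reference does,
# so the loss falls below -log(0.5) ~= 0.693
loss = dpo_loss(pi_chosen=-5.0, pi_rejected=-9.0,
                ref_chosen=-6.0, ref_rejected=-8.0)
```

Beta controls how hard the policy is pushed away from the reference model, playing the role of the KL coefficient in RLHF.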

RLHF

Reinforcement Learning from Human Feedback. Full alignment pipeline with reward modeling and policy optimization.

  • Reward model training
  • PPO policy optimization
  • KL divergence constraint
  • Human preference alignment

PPO

Proximal Policy Optimization. The RL algorithm used in RLHF for stable policy updates during alignment training.

  • Clipped objective function
  • Value function estimation
  • GAE advantage estimation
  • Multiple training epochs per batch
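The clipped surrogate at the heart of PPO is a one-liner; a sketch with illustrative numbers:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    # min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = pi_new / pi_old
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)

# ratio e^0.5 ~= 1.65 exceeds 1 + eps, so with positive advantage the
# objective is capped at 1.2 * A, limiting the size of the policy update
obj = ppo_clip_objective(logp_new=0.5, logp_old=0.0, advantage=1.0)  # -> 1.2
```

The clipping is what makes it safe to run multiple optimization epochs over the same batch of rollouts.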

Full Fine-Tune

Update all model parameters. Maximum capability but requires substantial compute and memory resources.

  • All parameters trainable
  • Maximum model adaptation
  • Requires multi-GPU / DeepSpeed
  • Gradient checkpointing support

Latent Reasoning Techniques

Beyond Chain-of-Thought

Latent reasoning techniques train models to reason in continuous latent space rather than producing verbose text-based reasoning chains. This enables faster, more efficient inference while maintaining or improving reasoning quality.

COCONUT

Chain of Continuous Thought. Replaces discrete token reasoning with continuous latent representations. The model learns to reason entirely in embedding space.

  • Reasoning in latent space
  • No intermediate text tokens
  • Faster inference throughput
  • Breadth-first reasoning patterns
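As a toy illustration of the idea (numpy only, not the COCONUT training recipe): instead of decoding a token and re-embedding it at every reasoning step, the last hidden state is fed straight back in as the next input for a number of latent steps:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d)) / np.sqrt(d)  # toy stand-in for a model step
E = rng.standard_normal((100, d))             # toy token embedding table

def step(h):
    return np.tanh(h @ W)

def coconut_reason(h0, latent_steps=4):
    # continuous thought: the hidden state loops back as the next input,
    # skipping the decode-to-token / re-embed round trip of textual CoT
    h = h0
    for _ in range(latent_steps):
        h = step(h)
    return h

h = coconut_reason(E[3])  # final latent state after 4 silent steps
```

Because no token is committed at each step, the latent state can in principle keep several candidate continuations "in superposition", which is the source of the breadth-first behavior noted above.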

Quiet-STaR

A generalization of STaR (Self-Taught Reasoner). Trains models to generate internal thought tokens before each response token, creating implicit chain-of-thought reasoning.

  • Internal thought tokens
  • Self-teaching mechanism
  • REINFORCE with baselines
  • Generalized from STaR

Pause Tokens

Inserts learnable <pause> tokens before the model generates its answer, giving the transformer extra computation steps for complex reasoning.

  • Thinking pause before answers
  • Learnable delay tokens
  • Configurable pause length
  • Minimal architecture changes

Hidden CoT

Distills explicit chain-of-thought reasoning into hidden representations. Models learn to compress verbose reasoning into dense internal states.

  • Distill CoT into hidden layers
  • Compressed reasoning traces
  • No output overhead
  • Teacher-student distillation

DeepSeek DSA

Dynamic Sparse Attention. Enables efficient reasoning by dynamically selecting which attention heads participate in each computation step.

  • Dynamic attention selection
  • Sparse activation patterns
  • Reduced computation cost
  • Efficient long-context reasoning

Training Chains & Pipelines

Visual Node-Based Configuration

Training chains allow you to compose multi-step pipelines that connect data sources, AI-powered data generators, quality validators, and training backends into automated workflows. Create chains via the API or the web dashboard.

Example Training Chain Pipeline
Data Source (/data/documents) → AI Generator (llama3.2, 1000 QA) → Validator (quality filter) → Trainer (Unsloth, LoRA) → Model Output (LoRA adapter)

Chain Node Types

data_source

Load training data from local files, Data Worker storage, or remote URLs. Supports JSONL, Alpaca, ShareGPT, and OpenAI formats.

ai_generator

Use an LLM to generate synthetic training data from source documents. Configurable count, model, and generation template.

validator

Quality filters and deduplication. Remove low-quality samples, check format compliance, and validate against schemas.

trainer

Execute training with any supported backend and method. Full hyperparameter configuration and checkpoint management.

evaluator

Run benchmark evaluations on trained models. Compare against baselines and generate quality reports.

deployer

Automatically deploy trained models or adapters to inference workers in the Eldric cluster.
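The execution model behind a chain can be sketched as a fold over the node list, each node consuming the previous node's output. The handler bodies below are stand-ins for illustration, not the Training Worker's actual implementation:

```python
def data_source(cfg, _):
    # stand-in: yield documents from configured texts instead of files
    return [{"text": t} for t in cfg["texts"]]

def ai_generator(cfg, docs):
    # stand-in for LLM-generated synthetic Q&A pairs
    return [{"q": f"What is {d['text']}?", "a": d["text"]} for d in docs]

def validator(cfg, samples):
    # stand-in quality filter: drop answers shorter than min_len
    return [s for s in samples if len(s["a"]) >= cfg.get("min_len", 1)]

def trainer(cfg, samples):
    # stand-in: report what a real training run would consume
    return {"backend": cfg["backend"], "trained_on": len(samples)}

HANDLERS = {"data_source": data_source, "ai_generator": ai_generator,
            "validator": validator, "trainer": trainer}

def run_chain(nodes):
    out = None
    for node in nodes:
        out = HANDLERS[node["type"]](node.get("config", {}), out)
    return out

result = run_chain([
    {"type": "data_source", "config": {"texts": ["Eldric", ""]}},
    {"type": "ai_generator"},
    {"type": "validator", "config": {"min_len": 1}},
    {"type": "trainer", "config": {"backend": "unsloth"}},
])
# the empty document's Q&A pair is filtered out before training
```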

Chain Templates

Pre-built chain templates for common training workflows, available via /api/v1/chains/templates.

Template           | Description                                         | Nodes
qa-pipeline        | Generate Q&A pairs from documents and train a model | data_source → ai_generator → trainer
alignment-pipeline | SFT followed by DPO alignment                       | data_source → trainer(SFT) → trainer(DPO)
rag-tuning         | Fine-tune on knowledge base with RAG validation     | data_source → ai_generator → validator → trainer
code-tuning        | Train coding capabilities from repository data      | data_source → ai_generator → validator → trainer → evaluator

API Endpoints

Dashboard

GET /health

Health check and worker status

GET /dashboard

Web dashboard overview

GET /dashboard/jobs

Training jobs dashboard page

GET /dashboard/chains

Training chains dashboard page

GET /dashboard/backends

Backend status dashboard page

Training Jobs

GET /api/v1/jobs

List all training jobs

POST /api/v1/jobs

Create a new training job

GET /api/v1/jobs/{id}

Get training job details and progress

POST /api/v1/jobs/{id}/cancel

Cancel a running training job

POST /api/v1/jobs/{id}/pause

Pause a running training job

POST /api/v1/jobs/{id}/resume

Resume a paused training job

GET /api/v1/jobs/{id}/logs

Get training job log output

GET /api/v1/jobs/{id}/metrics

Get training metrics (loss, learning rate, etc.)

Training Chains

GET /api/v1/chains

List all training chains

POST /api/v1/chains

Create a new training chain

GET /api/v1/chains/{id}

Get chain configuration and status

DELETE /api/v1/chains/{id}

Delete a training chain

POST /api/v1/chains/{id}/run

Execute a training chain

GET /api/v1/chains/templates

Get available chain templates

System

GET /api/v1/backends

List available training backends and their status

GET /api/v1/gpus

Get GPU information, memory, and utilization

Quick Start

Start the Training Worker

# Start training worker with Data Worker integration
./eldric-traind --port 8898 \
  --controller http://localhost:8880 \
  --data-workers http://localhost:8892

Create a LoRA Training Job (MLX on macOS)

curl -X POST http://localhost:8898/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "finetune-llama",
    "base_model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "method": "lora",
    "backend": "mlx",
    "dataset": {"path": "/data/train.jsonl", "format": "alpaca"},
    "hyperparams": {"epochs": 3, "batch_size": 4, "learning_rate": 2e-4},
    "lora": {"rank": 16, "alpha": 32}
  }'

Create a DPO Alignment Job (TRL)

curl -X POST http://localhost:8898/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "dpo-alignment",
    "base_model": "meta-llama/Llama-3.2-3B",
    "method": "dpo",
    "backend": "trl",
    "dataset": {"path": "/data/preferences.jsonl", "format": "openai"},
    "hyperparams": {"epochs": 1, "batch_size": 2, "learning_rate": 5e-5}
  }'

Create a Training Chain

curl -X POST http://localhost:8898/api/v1/chains \
  -H "Content-Type: application/json" \
  -d '{
    "name": "qa-pipeline",
    "nodes": [
      {"type": "data_source", "config": {"path": "/data/docs"}},
      {"type": "ai_generator", "config": {"model": "llama3.2", "count": 1000}},
      {"type": "trainer", "config": {"backend": "unsloth", "method": "lora"}}
    ]
  }'

Monitor Training

# List all jobs
curl http://localhost:8898/api/v1/jobs

# Get job status and progress
curl http://localhost:8898/api/v1/jobs/job-abc123

# Get training metrics (loss, learning rate)
curl http://localhost:8898/api/v1/jobs/job-abc123/metrics

# Get training logs
curl http://localhost:8898/api/v1/jobs/job-abc123/logs

# Cancel a job
curl -X POST http://localhost:8898/api/v1/jobs/job-abc123/cancel

# List available backends
curl http://localhost:8898/api/v1/backends

# Get GPU information
curl http://localhost:8898/api/v1/gpus
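The same monitoring calls can be scripted. A minimal Python client sketch using only the standard library (the class and method names are illustrative, not a shipped SDK; it assumes the default port and no authentication):

```python
import json
import urllib.request

class TrainingClient:
    """Thin wrapper over the Training Worker's /api/v1 endpoints."""

    def __init__(self, base="http://localhost:8898"):
        self.base = base.rstrip("/")

    def url(self, path):
        # build a full endpoint URL from a relative path
        return f"{self.base}/api/v1/{path.lstrip('/')}"

    def get(self, path):
        # e.g. get("jobs/job-abc123/metrics")
        with urllib.request.urlopen(self.url(path)) as r:
            return json.load(r)

    def create_job(self, spec):
        # POST a job spec dict to /api/v1/jobs
        req = urllib.request.Request(
            self.url("jobs"),
            data=json.dumps(spec).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as r:
            return json.load(r)

client = TrainingClient()
```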

Python Environment Setup

The Training Worker uses a Python virtual environment for all training backends. The venv is auto-detected at ~/.config/eldric/training-venv. Backend availability depends on which packages are installed.

# Create training venv (one-time setup)
python3 -m venv ~/.config/eldric/training-venv
source ~/.config/eldric/training-venv/bin/activate

# Install TRL (HuggingFace training library - SFT, DPO, RLHF, PPO)
pip install trl transformers peft accelerate datasets bitsandbytes

# Install MLX for Apple Silicon (macOS only)
pip install mlx mlx-lm

# Install Axolotl (optional, for YAML-based training)
pip install axolotl

# Install Unsloth (CUDA required)
pip install unsloth

# Install DeepSpeed (CUDA required, multi-GPU)
pip install deepspeed

# Deactivate venv
deactivate

Backend Package Requirements

Backend   | Required Packages                   | Hardware
TRL       | trl, transformers, peft, accelerate | CUDA
MLX       | mlx, mlx-lm                         | macOS (Apple Silicon only)
Axolotl   | axolotl                             | CUDA, ROCm
Unsloth   | unsloth                             | CUDA (required)
DeepSpeed | deepspeed                           | CUDA (required)
llama.cpp | system binary (llama-finetune)      | CUDA, CPU, MPS
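Since backend availability is keyed to which packages import cleanly from the venv, the auto-detection can be sketched as a spec lookup (an illustrative mechanism; the worker's actual probing may differ, and the package lists mirror the table above):

```python
import importlib.util

# packages each Python-based backend needs importable in the training venv
BACKEND_PACKAGES = {
    "trl": ["trl", "transformers", "peft", "accelerate"],
    "mlx": ["mlx", "mlx_lm"],
    "axolotl": ["axolotl"],
    "unsloth": ["unsloth"],
    "deepspeed": ["deepspeed"],
}

def detect_backends():
    # find_spec returns None for missing top-level packages, so this
    # never actually imports (or initializes) the heavy libraries
    return {
        name: all(importlib.util.find_spec(p) is not None for p in pkgs)
        for name, pkgs in BACKEND_PACKAGES.items()
    }

available = detect_backends()
```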

Data Worker Integration

The Training Worker integrates with Eldric Data Workers for centralized dataset storage, model artifact management, and cluster-wide data access.

Dataset Storage

Store and retrieve training datasets from Data Workers. Supports multi-tenant isolation, versioning, and quota management.

  • Centralized dataset repository
  • Multi-tenant isolation
  • Dataset versioning
  • JSONL, Alpaca, ShareGPT, OpenAI formats
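Records often need converting between the listed formats. A minimal Alpaca-to-OpenAI converter sketch (the message layout without a system prompt is an illustrative choice):

```python
def alpaca_to_openai(rec):
    # fold the optional Alpaca "input" field into the user turn,
    # then emit an OpenAI-style chat record
    user = rec["instruction"]
    if rec.get("input"):
        user += "\n\n" + rec["input"]
    return {"messages": [
        {"role": "user", "content": user},
        {"role": "assistant", "content": rec["output"]},
    ]}

sample = alpaca_to_openai(
    {"instruction": "Say hi.", "input": "", "output": "Hi!"}
)
```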

Model Artifact Storage

Save trained models, LoRA adapters, and checkpoints to Data Workers for cluster-wide distribution and deployment.

  • Automatic checkpoint upload
  • LoRA adapter storage
  • Model registry integration
  • Cluster-wide model sharing

# Start training worker with Data Worker integration
./eldric-traind --port 8898 \
  --controller http://localhost:8880 \
  --data-workers http://dataworker1:8892,http://dataworker2:8892

# Training jobs can reference Data Worker paths for datasets
curl -X POST http://localhost:8898/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "finetune-from-datad",
    "base_model": "meta-llama/Llama-3.2-3B",
    "method": "lora",
    "backend": "trl",
    "dataset": {
      "data_worker": "http://dataworker1:8892",
      "tenant_id": "default",
      "path": "/datasets/training/qa-pairs.jsonl",
      "format": "openai"
    }
  }'

GPU Monitoring

Real-time GPU monitoring for training job resource management. Supports NVIDIA CUDA GPUs and Apple Silicon unified memory.

Memory Tracking

Real-time VRAM usage monitoring per GPU and per training job. Alerts on memory pressure.

Utilization Metrics

GPU compute utilization, temperature, power draw, and clock speeds via nvidia-smi or system profiler.

Multi-GPU Support

Distribute training across multiple GPUs with DeepSpeed ZeRO stages or data parallelism.
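On NVIDIA systems these metrics can come from nvidia-smi's CSV query mode. A minimal parser sketch (the query fields are real nvidia-smi fields; the sample line is illustrative):

```python
import csv
import io

# field order matches:
# nvidia-smi --query-gpu=index,name,memory.total,memory.used,\
#   utilization.gpu,temperature.gpu --format=csv,noheader,nounits
FIELDS = ["index", "name", "memory.total", "memory.used",
          "utilization.gpu", "temperature.gpu"]

def parse_smi(text):
    # nvidia-smi pads fields after commas, so skipinitialspace matters
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in rows]

sample = "0, NVIDIA A100 80GB PCIe, 81920, 24576, 78, 62\n"
gpus = parse_smi(sample)
```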

# Query GPU information
curl http://localhost:8898/api/v1/gpus

# Response example (NVIDIA)
{
  "gpus": [
    {
      "index": 0,
      "name": "NVIDIA A100 80GB",
      "memory_total_mb": 81920,
      "memory_used_mb": 24576,
      "memory_free_mb": 57344,
      "utilization_gpu": 78,
      "utilization_memory": 30,
      "temperature_c": 62,
      "power_draw_w": 245
    }
  ],
  "driver_version": "535.129.03",
  "cuda_version": "12.2"
}

Web Dashboard

The Training Worker includes a built-in web dashboard at http://localhost:8898/dashboard for monitoring and managing training jobs, chains, and backends.

Jobs Dashboard

View all training jobs with real-time progress, loss curves, and resource utilization. Start, pause, resume, and cancel jobs from the browser.

/dashboard/jobs

Chains Dashboard

Visual pipeline editor for creating and managing training chains. View execution history and node-level status for each chain run.

/dashboard/chains

Backends Dashboard

Monitor installed training backends, GPU status, Python environment health, and backend-specific configuration options.

/dashboard/backends

License Limits

Training Worker capabilities are gated by license tier. The free tier provides MLX-based training for Apple Silicon users.

Feature                 | Free       | Standard     | Professional | Enterprise
Training backends       | MLX only   | Unsloth, TRL | All          | All
Max epochs              | 3          | 10           | Unlimited    | Unlimited
Max dataset size        | 1K samples | 10K samples  | 100K samples | Unlimited
Training chains         |            |              |              |
Latent reasoning        |            |              |              |
Multi-GPU               |            |              |              |
Distributed training    |            |              |              |
Data Worker integration |            |              |              |
Training workers        | 1          | 2            | 5            | Unlimited

Need More Capacity?

Contact license@core.at for custom licensing with unlimited training workers, backends, and dataset sizes. Enterprise licenses include priority support and distributed multi-node training.

Get Started

Download the Eldric distributed package and start fine-tuning models on your own infrastructure.

Download Eldric View Licensing