Eldric Training Worker

AI model fine-tuning, LoRA adapters, RLHF alignment, training chains, and latent reasoning techniques. Train and customize models with production-grade infrastructure you fully control.

Port 8898

At a Glance

  • 6 Training Backends
  • 7 Training Methods
  • 5 Latent Reasoning Techniques
  • Chains: Visual Pipeline Builder

Multi-Backend Training

Choose from Unsloth, Axolotl, TRL, DeepSpeed, MLX, or llama.cpp. Each backend is optimized for different hardware and training scenarios.

Parameter-Efficient Methods

LoRA, QLoRA, SFT, DPO, RLHF, PPO, and full fine-tuning. From lightweight adapters to complete model retraining.

Training Chains

Visual node-based pipeline configuration. Chain data sources, AI generators, validators, and trainers into automated workflows.

Latent Reasoning

COCONUT, Quiet-STaR, Pause Tokens, Hidden CoT, and DeepSeek DSA. Train models to reason in latent space.

Architecture

Training Worker Architecture
Training Worker (Port 8898): Job Manager, Chain Engine, GPU Monitor, Web Dashboard. Training backends: Unsloth (CUDA, 2x speed), Axolotl (CUDA/ROCm), TRL (RLHF/DPO/PPO), DeepSpeed (multi-GPU, distributed), MLX (Apple Silicon), llama.cpp (GGUF, CPU+GPU). Integrations: Controller (8880), Data Worker (8892), Python venv.

Training Backends

Six production-ready backends covering CUDA, ROCm, Apple Silicon, and CPU training. The Training Worker auto-detects available backends from the Python virtual environment.

Backend   | Description                  | Hardware        | Methods             | Key Advantage
Unsloth   | Fast LoRA/QLoRA training     | CUDA            | LoRA, QLoRA, SFT    | 2x training speedup
Axolotl   | Flexible YAML-based training | CUDA, ROCm      | LoRA, QLoRA, SFT    | Full YAML config, broad model support
TRL       | HuggingFace RLHF library     | CUDA            | SFT, DPO, RLHF, PPO | Full alignment pipeline
DeepSpeed | Distributed training         | CUDA, multi-GPU | SFT, Full, LoRA     | Multi-node, ZeRO optimization
MLX       | Apple Silicon training       | Apple MPS       | LoRA, SFT           | Native macOS, unified memory
llama.cpp | GGUF-based training          | CUDA, CPU, MPS  | LoRA, SFT           | GGUF format, low resource

Training Methods

LoRA

Low-Rank Adaptation. Inserts small trainable matrices into frozen model layers. Efficient, fast, and produces lightweight adapters.

  • Rank 8-64 (configurable)
  • Alpha scaling parameter
  • Target specific layers
  • Adapter merging support
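As a sketch of the mechanism (a toy numpy forward pass, not any backend's implementation), the adapter adds a scaled rank-r update alongside a frozen weight, and can later be merged into it:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 32, 32, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection, rank r
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # base path plus scaled low-rank update: W x + (alpha/r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

def merge():
    # fold the adapter into the base weight for deployment
    return W + (alpha / r) * (B @ A)

x = rng.standard_normal(d_in)
# with B zero-initialized the adapter starts as a no-op
assert np.allclose(lora_forward(x), W @ x)
assert np.allclose(merge() @ x, lora_forward(x))
```

The zero initialization of B is why training starts from the base model's behavior; only A and B (2 * r * d parameters instead of d * d) are updated.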

QLoRA

Quantized LoRA. Combines 4-bit quantization with LoRA for memory-efficient training on consumer GPUs.

  • 4-bit NormalFloat quantization
  • Double quantization
  • Paged optimizers
  • 70B models on single GPU
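The memory saving comes from storing the frozen base weights in 4 bits with one scale per block. A simplified sketch of block-wise absmax 4-bit quantization (QLoRA's NF4 uses a fixed 16-value normal-distribution codebook; the linear grid here is an illustration only):

```python
import numpy as np

def quantize_4bit(w, block=64):
    # one absmax scale per block, values mapped to the signed range [-7, 7]
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True)
    q = np.round(w / scale * 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return q / 7.0 * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)               # stand-in for a weight tensor
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale).reshape(-1)
err = np.abs(w - w_hat).max()               # bounded by scale / 14 per block
```

During QLoRA training the 4-bit weights are dequantized on the fly for the forward pass while gradients flow only into the LoRA adapter.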

SFT

Supervised Fine-Tuning. Standard instruction tuning on labeled data to teach models specific behaviors and formats.

  • Instruction/response pairs
  • Alpaca, ShareGPT, OpenAI formats
  • Multi-turn conversation support
  • Template customization
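For concreteness, here is one Alpaca-format JSONL record and a rendering template (field names follow the public Alpaca convention; the exact template a given backend applies may differ):

```python
import json

# one Alpaca-format training record
record = {
    "instruction": "Summarize the following text.",
    "input": "Eldric is a distributed AI platform.",
    "output": "Eldric: a distributed AI platform.",
}

# illustrative instruction template in the Alpaca style
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

line = json.dumps(record)               # one line of the JSONL file
text = ALPACA_TEMPLATE.format(**record) # what the model is trained on
```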

DPO

Direct Preference Optimization. Aligns models using human preference data without a separate reward model.

  • No reward model needed
  • Chosen/rejected pairs
  • Beta parameter tuning
  • Simpler than RLHF
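The DPO objective itself is compact enough to sketch directly. Inputs are summed log-probabilities of the chosen and rejected completions under the trained policy and the frozen reference model (the numbers below are illustrative):

```python
import numpy as np

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # -log sigmoid(beta * ((log pi_c - log ref_c) - (log pi_r - log ref_r)))
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-logits)))

# policy already prefers the chosen answer more than the reference does,
# so the loss falls below -log(0.5) ~= 0.693
loss = dpo_loss(pi_chosen=-5.0, pi_rejected=-9.0,
                ref_chosen=-6.0, ref_rejected=-8.0)
```

Beta controls how hard the policy is pushed away from the reference model, playing the role of the KL coefficient in RLHF.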

RLHF

Reinforcement Learning from Human Feedback. Full alignment pipeline with reward modeling and policy optimization.

  • Reward model training
  • PPO policy optimization
  • KL divergence constraint
  • Human preference alignment

PPO

Proximal Policy Optimization. The RL algorithm used in RLHF for stable policy updates during alignment training.

  • Clipped objective function
  • Value function estimation
  • GAE advantage estimation
  • Multiple training epochs per batch
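The clipped surrogate at the heart of PPO is a one-liner; a sketch with illustrative numbers:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    # min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = pi_new / pi_old
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)

# ratio e^0.5 ~= 1.65 exceeds 1 + eps, so with positive advantage the
# objective is capped at 1.2 * A, limiting the size of the policy update
obj = ppo_clip_objective(logp_new=0.5, logp_old=0.0, advantage=1.0)  # -> 1.2
```

The clipping is what makes it safe to run multiple optimization epochs over the same batch of rollouts.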

Full Fine-Tune

Update all model parameters. Maximum capability but requires substantial compute and memory resources.

  • All parameters trainable
  • Maximum model adaptation
  • Requires multi-GPU / DeepSpeed
  • Gradient checkpointing support

Latent Reasoning Techniques

Beyond Chain-of-Thought

Latent reasoning techniques train models to reason in continuous latent space rather than producing verbose text-based reasoning chains. This enables faster, more efficient inference while maintaining or improving reasoning quality.

COCONUT

Chain of Continuous Thought. Replaces discrete token reasoning with continuous latent representations. The model learns to reason entirely in embedding space.

  • Reasoning in latent space
  • No intermediate text tokens
  • Faster inference throughput
  • Breadth-first reasoning patterns
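As a toy illustration of the idea (numpy only, not the COCONUT training recipe): instead of decoding a token and re-embedding it at every reasoning step, the last hidden state is fed straight back in as the next input for a number of latent steps:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d)) / np.sqrt(d)  # toy stand-in for a model step
E = rng.standard_normal((100, d))             # toy token embedding table

def step(h):
    return np.tanh(h @ W)

def coconut_reason(h0, latent_steps=4):
    # continuous thought: the hidden state loops back as the next input,
    # skipping the decode-to-token / re-embed round trip of textual CoT
    h = h0
    for _ in range(latent_steps):
        h = step(h)
    return h

h = coconut_reason(E[3])  # final latent state after 4 silent steps
```

Because no token is committed at each step, the latent state can in principle keep several candidate continuations "in superposition", which is the source of the breadth-first behavior noted above.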

Quiet-STaR

A generalization of STaR (Self-Taught Reasoner). Trains models to generate internal thought tokens before each response token, creating implicit chain-of-thought reasoning.

  • Internal thought tokens
  • Self-teaching mechanism
  • REINFORCE with baselines
  • Generalized from STaR

Pause Tokens

Inserts learnable <pause> tokens before the model generates its answer, giving the transformer extra computation steps for complex reasoning.

  • Thinking pause before answers
  • Learnable delay tokens
  • Configurable pause length
  • Minimal architecture changes

Hidden CoT

Distills explicit chain-of-thought reasoning into hidden representations. Models learn to compress verbose reasoning into dense internal states.

  • Distill CoT into hidden layers
  • Compressed reasoning traces
  • No output overhead
  • Teacher-student distillation

DeepSeek DSA

Dynamic Sparse Attention. Enables efficient reasoning by dynamically selecting which attention heads participate in each computation step.

  • Dynamic attention selection
  • Sparse activation patterns
  • Reduced computation cost
  • Efficient long-context reasoning

Training Chains & Pipelines

Visual Node-Based Configuration

Training chains allow you to compose multi-step pipelines that connect data sources, AI-powered data generators, quality validators, and training backends into automated workflows. Create chains via the API or the web dashboard.

Example Training Chain Pipeline
Data Source (/data/documents) → AI Generator (llama3.2, 1000 QA) → Validator (quality filter) → Trainer (Unsloth, LoRA) → Model Output (LoRA adapter)

Chain Node Types

data_source

Load training data from local files, Data Worker storage, or remote URLs. Supports JSONL, Alpaca, ShareGPT, and OpenAI formats.

ai_generator

Use an LLM to generate synthetic training data from source documents. Configurable count, model, and generation template.

validator

Quality filters and deduplication. Remove low-quality samples, check format compliance, and validate against schemas.

trainer

Execute training with any supported backend and method. Full hyperparameter configuration and checkpoint management.

evaluator

Run benchmark evaluations on trained models. Compare against baselines and generate quality reports.

deployer

Automatically deploy trained models or adapters to inference workers in the Eldric cluster.
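The execution model behind a chain can be sketched as a fold over the node list, each node consuming the previous node's output. The handler bodies below are stand-ins for illustration, not the Training Worker's actual implementation:

```python
def data_source(cfg, _):
    # stand-in: yield documents from configured texts instead of files
    return [{"text": t} for t in cfg["texts"]]

def ai_generator(cfg, docs):
    # stand-in for LLM-generated synthetic Q&A pairs
    return [{"q": f"What is {d['text']}?", "a": d["text"]} for d in docs]

def validator(cfg, samples):
    # stand-in quality filter: drop answers shorter than min_len
    return [s for s in samples if len(s["a"]) >= cfg.get("min_len", 1)]

def trainer(cfg, samples):
    # stand-in: report what a real training run would consume
    return {"backend": cfg["backend"], "trained_on": len(samples)}

HANDLERS = {"data_source": data_source, "ai_generator": ai_generator,
            "validator": validator, "trainer": trainer}

def run_chain(nodes):
    out = None
    for node in nodes:
        out = HANDLERS[node["type"]](node.get("config", {}), out)
    return out

result = run_chain([
    {"type": "data_source", "config": {"texts": ["Eldric", ""]}},
    {"type": "ai_generator"},
    {"type": "validator", "config": {"min_len": 1}},
    {"type": "trainer", "config": {"backend": "unsloth"}},
])
# the empty document's Q&A pair is filtered out before training
```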

Chain Templates

Pre-built chain templates for common training workflows, available via /api/v1/chains/templates.

Template           | Description                                         | Nodes
qa-pipeline        | Generate Q&A pairs from documents and train a model | data_source → ai_generator → trainer
alignment-pipeline | SFT followed by DPO alignment                       | data_source → trainer(SFT) → trainer(DPO)
rag-tuning         | Fine-tune on knowledge base with RAG validation     | data_source → ai_generator → validator → trainer
code-tuning        | Train coding capabilities from repository data      | data_source → ai_generator → validator → trainer → evaluator

API Endpoints

Dashboard

GET /health

Health check and worker status

GET /dashboard

Web dashboard overview

GET /dashboard/jobs

Training jobs dashboard page

GET /dashboard/chains

Training chains dashboard page

GET /dashboard/backends

Backend status dashboard page

Training Jobs

GET /api/v1/jobs

List all training jobs

POST /api/v1/jobs

Create a new training job

GET /api/v1/jobs/{id}

Get training job details and progress

POST /api/v1/jobs/{id}/cancel

Cancel a running training job

POST /api/v1/jobs/{id}/pause

Pause a running training job

POST /api/v1/jobs/{id}/resume

Resume a paused training job

GET /api/v1/jobs/{id}/logs

Get training job log output

GET /api/v1/jobs/{id}/metrics

Get training metrics (loss, learning rate, etc.)

Training Chains

GET /api/v1/chains

List all training chains

POST /api/v1/chains

Create a new training chain

GET /api/v1/chains/{id}

Get chain configuration and status

DELETE /api/v1/chains/{id}

Delete a training chain

POST /api/v1/chains/{id}/run

Execute a training chain

GET /api/v1/chains/templates

Get available chain templates

System

GET /api/v1/backends

List available training backends and their status

GET /api/v1/gpus

Get GPU information, memory, and utilization

Quick Start

Start the Training Worker

# Start training worker with Data Worker integration
./eldric-traind --port 8898 \
  --controller http://localhost:8880 \
  --data-workers http://localhost:8892

Create a LoRA Training Job (MLX on macOS)

curl -X POST http://localhost:8898/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "finetune-llama",
    "base_model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "method": "lora",
    "backend": "mlx",
    "dataset": {"path": "/data/train.jsonl", "format": "alpaca"},
    "hyperparams": {"epochs": 3, "batch_size": 4, "learning_rate": 2e-4},
    "lora": {"rank": 16, "alpha": 32}
  }'

Create a DPO Alignment Job (TRL)

curl -X POST http://localhost:8898/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "dpo-alignment",
    "base_model": "meta-llama/Llama-3.2-3B",
    "method": "dpo",
    "backend": "trl",
    "dataset": {"path": "/data/preferences.jsonl", "format": "openai"},
    "hyperparams": {"epochs": 1, "batch_size": 2, "learning_rate": 5e-5}
  }'

Create a Training Chain

curl -X POST http://localhost:8898/api/v1/chains \
  -H "Content-Type: application/json" \
  -d '{
    "name": "qa-pipeline",
    "nodes": [
      {"type": "data_source", "config": {"path": "/data/docs"}},
      {"type": "ai_generator", "config": {"model": "llama3.2", "count": 1000}},
      {"type": "trainer", "config": {"backend": "unsloth", "method": "lora"}}
    ]
  }'

Monitor Training

# List all jobs
curl http://localhost:8898/api/v1/jobs

# Get job status and progress
curl http://localhost:8898/api/v1/jobs/job-abc123

# Get training metrics (loss, learning rate)
curl http://localhost:8898/api/v1/jobs/job-abc123/metrics

# Get training logs
curl http://localhost:8898/api/v1/jobs/job-abc123/logs

# Cancel a job
curl -X POST http://localhost:8898/api/v1/jobs/job-abc123/cancel

# List available backends
curl http://localhost:8898/api/v1/backends

# Get GPU information
curl http://localhost:8898/api/v1/gpus
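The same monitoring calls can be scripted. A minimal Python client sketch using only the standard library (the class and method names are illustrative, not a shipped SDK; it assumes the default port and no authentication):

```python
import json
import urllib.request

class TrainingClient:
    """Thin wrapper over the Training Worker's /api/v1 endpoints."""

    def __init__(self, base="http://localhost:8898"):
        self.base = base.rstrip("/")

    def url(self, path):
        # build a full endpoint URL from a relative path
        return f"{self.base}/api/v1/{path.lstrip('/')}"

    def get(self, path):
        # e.g. get("jobs/job-abc123/metrics")
        with urllib.request.urlopen(self.url(path)) as r:
            return json.load(r)

    def create_job(self, spec):
        # POST a job spec dict to /api/v1/jobs
        req = urllib.request.Request(
            self.url("jobs"),
            data=json.dumps(spec).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as r:
            return json.load(r)

client = TrainingClient()
```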

Python Environment Setup

The Training Worker uses a Python virtual environment for all training backends. The venv is auto-detected at ~/.config/eldric/training-venv. Backend availability depends on which packages are installed.

# Create training venv (one-time setup)
python3 -m venv ~/.config/eldric/training-venv
source ~/.config/eldric/training-venv/bin/activate

# Install TRL (HuggingFace training library - SFT, DPO, RLHF, PPO)
pip install trl transformers peft accelerate datasets bitsandbytes

# Install MLX for Apple Silicon (macOS only)
pip install mlx mlx-lm

# Install Axolotl (optional, for YAML-based training)
pip install axolotl

# Install Unsloth (CUDA required)
pip install unsloth

# Install DeepSpeed (CUDA required, multi-GPU)
pip install deepspeed

# Deactivate venv
deactivate

Backend Package Requirements

Backend   | Required Packages                   | Hardware
TRL       | trl, transformers, peft, accelerate | CUDA
MLX       | mlx, mlx-lm                         | macOS (Apple Silicon only)
Axolotl   | axolotl                             | CUDA, ROCm
Unsloth   | unsloth                             | CUDA (required)
DeepSpeed | deepspeed                           | CUDA (required)
llama.cpp | system binary (llama-finetune)      | CUDA, CPU, MPS
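Since backend availability is keyed to which packages import cleanly from the venv, the auto-detection can be sketched as a spec lookup (an illustrative mechanism; the worker's actual probing may differ, and the package lists mirror the table above):

```python
import importlib.util

# packages each Python-based backend needs importable in the training venv
BACKEND_PACKAGES = {
    "trl": ["trl", "transformers", "peft", "accelerate"],
    "mlx": ["mlx", "mlx_lm"],
    "axolotl": ["axolotl"],
    "unsloth": ["unsloth"],
    "deepspeed": ["deepspeed"],
}

def detect_backends():
    # find_spec returns None for missing top-level packages, so this
    # never actually imports (or initializes) the heavy libraries
    return {
        name: all(importlib.util.find_spec(p) is not None for p in pkgs)
        for name, pkgs in BACKEND_PACKAGES.items()
    }

available = detect_backends()
```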

Data Worker Integration

The Training Worker integrates with Eldric Data Workers for centralized dataset storage, model artifact management, and cluster-wide data access.

Dataset Storage

Store and retrieve training datasets from Data Workers. Supports multi-tenant isolation, versioning, and quota management.

  • Centralized dataset repository
  • Multi-tenant isolation
  • Dataset versioning
  • JSONL, Alpaca, ShareGPT, OpenAI formats
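Records often need converting between the listed formats. A minimal Alpaca-to-OpenAI converter sketch (the message layout without a system prompt is an illustrative choice):

```python
def alpaca_to_openai(rec):
    # fold the optional Alpaca "input" field into the user turn,
    # then emit an OpenAI-style chat record
    user = rec["instruction"]
    if rec.get("input"):
        user += "\n\n" + rec["input"]
    return {"messages": [
        {"role": "user", "content": user},
        {"role": "assistant", "content": rec["output"]},
    ]}

sample = alpaca_to_openai(
    {"instruction": "Say hi.", "input": "", "output": "Hi!"}
)
```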

Model Artifact Storage

Save trained models, LoRA adapters, and checkpoints to Data Workers for cluster-wide distribution and deployment.

  • Automatic checkpoint upload
  • LoRA adapter storage
  • Model registry integration
  • Cluster-wide model sharing

# Start training worker with Data Worker integration
./eldric-traind --port 8898 \
  --controller http://localhost:8880 \
  --data-workers http://dataworker1:8892,http://dataworker2:8892

# Training jobs can reference Data Worker paths for datasets
curl -X POST http://localhost:8898/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "finetune-from-datad",
    "base_model": "meta-llama/Llama-3.2-3B",
    "method": "lora",
    "backend": "trl",
    "dataset": {
      "data_worker": "http://dataworker1:8892",
      "tenant_id": "default",
      "path": "/datasets/training/qa-pairs.jsonl",
      "format": "openai"
    }
  }'

GPU Monitoring

Real-time GPU monitoring for training job resource management. Supports NVIDIA CUDA GPUs and Apple Silicon unified memory.

Memory Tracking

Real-time VRAM usage monitoring per GPU and per training job. Alerts on memory pressure.

Utilization Metrics

GPU compute utilization, temperature, power draw, and clock speeds via nvidia-smi or system profiler.

Multi-GPU Support

Distribute training across multiple GPUs with DeepSpeed ZeRO stages or data parallelism.
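On NVIDIA systems these metrics can come from nvidia-smi's CSV query mode. A minimal parser sketch (the query fields are real nvidia-smi fields; the sample line is illustrative):

```python
import csv
import io

# field order matches:
# nvidia-smi --query-gpu=index,name,memory.total,memory.used,\
#   utilization.gpu,temperature.gpu --format=csv,noheader,nounits
FIELDS = ["index", "name", "memory.total", "memory.used",
          "utilization.gpu", "temperature.gpu"]

def parse_smi(text):
    # nvidia-smi pads fields after commas, so skipinitialspace matters
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in rows]

sample = "0, NVIDIA A100 80GB PCIe, 81920, 24576, 78, 62\n"
gpus = parse_smi(sample)
```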

# Query GPU information
curl http://localhost:8898/api/v1/gpus

# Response example (NVIDIA)
{
  "gpus": [
    {
      "index": 0,
      "name": "NVIDIA A100 80GB",
      "memory_total_mb": 81920,
      "memory_used_mb": 24576,
      "memory_free_mb": 57344,
      "utilization_gpu": 78,
      "utilization_memory": 30,
      "temperature_c": 62,
      "power_draw_w": 245
    }
  ],
  "driver_version": "535.129.03",
  "cuda_version": "12.2"
}

Web Dashboard

The Training Worker includes a built-in web dashboard at http://localhost:8898/dashboard for monitoring and managing training jobs, chains, and backends.

Jobs Dashboard

View all training jobs with real-time progress, loss curves, and resource utilization. Start, pause, resume, and cancel jobs from the browser.

/dashboard/jobs

Chains Dashboard

Visual pipeline editor for creating and managing training chains. View execution history and node-level status for each chain run.

/dashboard/chains

Backends Dashboard

Monitor installed training backends, GPU status, Python environment health, and backend-specific configuration options.

/dashboard/backends

License Limits

Training Worker capabilities are gated by license tier. The free tier provides MLX-based training for Apple Silicon users.

Feature                 | Free       | Standard     | Professional | Enterprise
Training backends       | MLX only   | Unsloth, TRL | All          | All
Max epochs              | 3          | 10           | Unlimited    | Unlimited
Max dataset size        | 1K samples | 10K samples  | 100K samples | Unlimited
Training chains         |            |              |              |
Latent reasoning        |            |              |              |
Multi-GPU               |            |              |              |
Distributed training    |            |              |              |
Data Worker integration |            |              |              |
Training workers        | 1          | 2            | 5            | Unlimited

Need More Capacity?

Contact license@core.at for custom licensing with unlimited training workers, backends, and dataset sizes. Enterprise licenses include priority support and distributed multi-node training.

Get Started

Download the Eldric distributed package and start fine-tuning models on your own infrastructure.

Download Eldric View Licensing