v4.2.0 Feature

Storage Clustering

Distributed storage across multiple data workers with tiered replication, capacity-weighted placement, SSD/HDD awareness, and automatic master failover — enterprise-grade data resilience for your AI infrastructure.

- 3 replication tiers: Critical / Important / Rebuildable
- Capacity-weighted placement across heterogeneous disks
- Automatic master failover and promotion
- SSD + HDD-aware tiered placement

Storage Cluster Architecture

Data workers report capacity via heartbeats. The master coordinates placement and replication across heterogeneous nodes.

[Diagram: storage cluster with capacity-weighted placement and tiered replication. Data Worker 1 (★ master): 2.8 TB / 4 TB (70%), NVMe SSD, Tier 1 sync + Tier 2 async replicas, last heartbeat 2 s ago. Data Worker 2: 6.4 TB / 16 TB (40%), SATA HDD (RAID-6), Tier 2 async + Tier 3 rebuildable, last heartbeat 5 s ago. Data Worker 3: 4.8 TB / 8 TB (60%), NVMe SSD, Tier 1 sync + Tier 2 async replicas, last heartbeat 1 s ago. Tier 1 sync and Tier 2 async replication run cross-node; all workers send 30 s heartbeats with capacity reports to the controller on port 8880.]

Master Storage Designation

One data worker is elected master. It coordinates placement decisions and replication, and serves as the authoritative source for metadata.

Master Election

The controller designates the master data worker based on availability, capacity, and disk type. SSD-backed workers are preferred for master status due to metadata I/O requirements. If the master becomes unreachable for three consecutive heartbeat intervals, the controller automatically promotes the next-best candidate.
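The candidate ranking described above can be sketched in a few lines. This is an illustrative sketch, not the shipped election logic; the `Worker` fields and `elect_master` helper are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Worker:
    worker_id: str
    reachable: bool
    free_bytes: int
    disk_type: str  # "ssd" or "hdd"

def elect_master(workers):
    """Pick the next-best master candidate: reachable workers only,
    SSD-backed preferred (metadata I/O), then most free capacity."""
    candidates = [w for w in workers if w.reachable]
    if not candidates:
        raise RuntimeError("no reachable data workers to promote")
    return max(candidates,
               key=lambda w: (w.disk_type == "ssd", w.free_bytes))
```

Under this ranking, an SSD-backed worker beats a larger HDD-backed one, matching the stated preference for SSD masters.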

Failover behavior: During master promotion, write operations are temporarily queued (up to 30 seconds). Read operations continue from any replica. Once the new master is confirmed, queued writes are flushed in order. No data is lost during failover.
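The queue-and-flush behavior during promotion can be sketched as follows. The class name, explicit clock parameters, and return conventions are illustrative assumptions, not the actual implementation:

```python
import collections
import time

class FailoverWriteQueue:
    """Buffers writes while a new master is being promoted.
    Writes queue for up to `max_wait_s` (30 s by default, per the
    failover behavior above); once the new master is confirmed,
    queued writes flush in arrival order."""

    def __init__(self, max_wait_s=30.0):
        self.max_wait_s = max_wait_s
        self._queue = collections.deque()
        self._failover_started = None

    def begin_failover(self, now=None):
        self._failover_started = now if now is not None else time.monotonic()

    def write(self, op, now=None):
        if self._failover_started is None:
            return ("applied", op)       # normal path: apply directly
        now = now if now is not None else time.monotonic()
        if now - self._failover_started > self.max_wait_s:
            return ("rejected", op)      # promotion exceeded the window
        self._queue.append(op)           # buffer until master confirmed
        return ("queued", op)

    def confirm_master(self):
        """New master confirmed: flush queued writes in order."""
        flushed = list(self._queue)
        self._queue.clear()
        self._failover_started = None
        return flushed
```

Reads are unaffected by this queue because they can be served from any replica.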

Master Responsibilities

Metadata Authority

Maintains the authoritative file index, tenant quotas, replication factor map, and placement decisions. Replicas sync metadata from the master on each heartbeat cycle.

Placement Coordinator

Decides where new data lands based on capacity, disk type, replication tier, and network proximity. Uses weighted scoring to balance load across heterogeneous nodes.

Capacity-Weighted Placement

Storage nodes with different disk sizes and types are balanced using a weighted placement algorithm.

Weighted Scoring

Each data worker receives a placement score based on: free capacity (40% weight), disk type match (30% weight), current I/O load (20% weight), and network latency to the requesting node (10% weight). New data is placed on the highest-scoring worker.
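The weighted score can be sketched as below. The weights come from the description above; the normalization of each factor into [0, 1] (and the 100 ms latency cap) is an illustrative assumption:

```python
def placement_score(free_frac, disk_match, io_load, latency_ms,
                    max_latency_ms=100.0):
    """Weighted placement score in [0, 1]: free capacity 40%,
    disk-type match 30%, inverse I/O load 20%, inverse latency 10%.
    Factor normalization here is an assumption for illustration."""
    latency_frac = min(latency_ms / max_latency_ms, 1.0)
    return (0.40 * free_frac
            + 0.30 * (1.0 if disk_match else 0.0)
            + 0.20 * (1.0 - io_load)
            + 0.10 * (1.0 - latency_frac))

def pick_worker(workers):
    """workers: (worker_id, free_frac, disk_match, io_load, latency_ms)
    tuples. Returns the id of the highest-scoring worker."""
    return max(workers, key=lambda w: placement_score(*w[1:]))[0]
```

With these weights, a worker matching the requested disk type can win even against a worker with substantially more free capacity.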

💾

SSD / HDD Awareness

Data workers report their disk type in heartbeats. Tier 1 critical data is preferentially placed on SSD-backed workers. Tier 3 rebuildable data (caches, intermediate results) is directed to HDD workers to preserve SSD capacity for hot data.
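The tier-to-disk preference can be expressed as a small lookup. This is a sketch of the stated policy; the helper names, the Tier 2 "either disk, SSD first" ordering, and the fallback to any worker when no preferred disk is available are assumptions:

```python
def preferred_disks(tier):
    """Disk-type preference per replication tier: Tier 1 (critical)
    targets SSD; Tier 3 (rebuildable) goes to HDD to preserve SSD
    capacity; Tier 2 takes either (SSD first is an assumption)."""
    return {1: ["ssd"], 2: ["ssd", "hdd"], 3: ["hdd"]}[tier]

def filter_candidates(tier, workers):
    """workers: (worker_id, disk_type) tuples. Returns candidates in
    preference order; falls back to all workers if none match."""
    order = preferred_disks(tier)
    ranked = [w for d in order for w in workers if w[1] == d]
    return ranked or list(workers)
```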

Rebalancing

When a new data worker joins the cluster or an existing one reports significantly changed capacity, the master triggers a background rebalance. Data migrates gradually to avoid I/O spikes, respecting a configurable bandwidth limit.
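One simple way to respect the bandwidth limit is to schedule migrations serially at the cap. A minimal sketch, assuming chunk-at-a-time transfers (the helper name and scheduling strategy are illustrative, not the actual rebalancer):

```python
def plan_rebalance(chunks, bandwidth_limit_bps):
    """Schedule chunk migrations under a bandwidth cap.
    chunks: (chunk_id, size_bytes) tuples.
    Returns (chunk_id, start_offset_seconds) pairs such that
    transfers run back-to-back at the configured limit, avoiding
    I/O spikes from parallel copies."""
    t = 0.0
    schedule = []
    for chunk_id, size_bytes in chunks:
        schedule.append((chunk_id, t))
        t += size_bytes / bandwidth_limit_bps  # serialize at the cap
    return schedule
```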

📊

Quota Enforcement

Per-tenant storage quotas are enforced at the master level. Write requests that would exceed a tenant's quota are rejected before placement occurs. Quotas are checked against the sum across all data workers, not per-node.
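The cluster-wide check reduces to summing a tenant's usage across workers before accepting a write. A sketch (function and parameter names are hypothetical):

```python
def check_quota(tenant_usage_by_worker, quota_bytes, incoming_bytes):
    """Quota is enforced cluster-wide: sum the tenant's usage across
    all data workers, then reject the write before placement if it
    would exceed the quota."""
    used = sum(tenant_usage_by_worker.values())
    return used + incoming_bytes <= quota_bytes
```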

Tiered Replication

Not all data is equally important. Three tiers let you match replication cost to data criticality.

Tier | Replication | Consistency | Data Types | Write Latency
Tier 1 — Critical | Synchronous, N+1 replicas | Strong (all replicas ACK) | User data, project files, model weights, training datasets, .emm matrices | Higher (waits for all replicas)
Tier 2 — Important | Asynchronous, N replicas | Eventual (master ACK, replicas follow) | Vector embeddings, conversation logs, session history, audit trails | Low (master ACK only)
Tier 3 — Rebuildable | No replication (single copy) | N/A | Caches, thumbnails, temporary files, intermediate computation results | Lowest
Tier assignment is automatic. The storage cluster assigns tiers based on file path conventions: files under /users/ and /models/ are Tier 1, /vectors/ and /sessions/ are Tier 2, and /cache/ is Tier 3. Override per-file via the X-Storage-Tier header on upload.
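The path-convention rule above can be sketched as a prefix lookup. The function name and the Tier 1 default for paths outside the listed prefixes are assumptions for illustration:

```python
def tier_for_path(path, header_tier=None):
    """Tier from path conventions: /users/ and /models/ -> Tier 1,
    /vectors/ and /sessions/ -> Tier 2, /cache/ -> Tier 3.
    A value from the X-Storage-Tier upload header overrides the
    convention."""
    if header_tier is not None:
        return int(header_tier)
    prefixes = {
        "/users/": 1, "/models/": 1,
        "/vectors/": 2, "/sessions/": 2,
        "/cache/": 3,
    }
    for prefix, tier in prefixes.items():
        if path.startswith(prefix):
            return tier
    return 1  # assumed default: treat unlisted paths as critical
```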

Data Worker Heartbeats

Every data worker reports its full capacity status to the controller at a configurable interval (default: 30 seconds).

Capacity Report Payload

{
  "worker_id": "dataw-abc123",
  "storage": {
    "total_bytes": 4000000000000,
    "used_bytes": 2800000000000,
    "free_bytes": 1200000000000,
    "disk_type": "ssd",
    "disk_model": "Samsung 990 Pro 4TB",
    "io_utilization": 0.23,
    "iops_current": 12400
  },
  "replication": {
    "tier1_files": 847,
    "tier2_files": 14230,
    "tier3_files": 5891,
    "pending_replications": 3
  },
  "is_master": true,
  "uptime_seconds": 864200
}

Health Monitoring

The controller tracks heartbeat timestamps for all data workers. If a worker misses three consecutive heartbeats (default: 90 seconds), it is marked as degraded. After five missed heartbeats, the controller initiates re-replication of Tier 1 data from surviving replicas.

Status | Missed Heartbeats | Action
Healthy | 0 | Normal operation
Degraded | 3 | Warnings, no new placements
Unreachable | 5 | Re-replicate Tier 1 data
Decommissioned | 10+ | Full data evacuation
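The threshold mapping above is a straightforward cascade. A sketch (the function name is hypothetical; thresholds are taken from the table):

```python
def worker_status(missed_heartbeats):
    """Map consecutive missed heartbeats to a health status,
    using the thresholds from the health-monitoring table."""
    if missed_heartbeats >= 10:
        return "decommissioned"   # full data evacuation
    if missed_heartbeats >= 5:
        return "unreachable"      # re-replicate Tier 1 data
    if missed_heartbeats >= 3:
        return "degraded"         # warnings, no new placements
    return "healthy"              # normal operation
```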

Storage Cluster API

REST endpoints for monitoring and managing the storage cluster.

Endpoint | Method | Description
/api/v1/storage/cluster/status | GET | Full cluster status: workers, capacity, master, replication health
/api/v1/storage/cluster/workers | GET | List all data workers with capacity details
/api/v1/storage/cluster/master | GET | Current master worker info
/api/v1/storage/cluster/master/promote | POST | Manually promote a worker to master
/api/v1/storage/cluster/rebalance | POST | Trigger cluster rebalance
/api/v1/storage/cluster/replication | GET | Replication status and pending operations
/api/v1/storage/cluster/config | GET / PUT | Storage cluster configuration
/api/v1/storage/tiers | GET | Tier configuration and file counts
/api/v1/storage/tiers/{id}/config | PUT | Update tier replication settings
/api/v1/storage/quotas | GET / POST | List / set tenant storage quotas
# Get storage cluster status
curl http://controller:8880/api/v1/storage/cluster/status \
  -H "X-API-Key: YOUR_KEY"

# Response:
{
  "cluster_healthy": true,
  "total_capacity_bytes": 28000000000000,
  "used_bytes": 14000000000000,
  "free_bytes": 14000000000000,
  "utilization_pct": 50.0,
  "master_id": "dataw-abc123",
  "worker_count": 3,
  "workers_healthy": 3,
  "workers_degraded": 0,
  "pending_replications": 0
}

Master Storage Directory Structure

The canonical directory layout on the master data worker. All paths are relative to the base data directory.

/data/eldric/                      # Base data directory
├── users/                         # Tier 1 - Per-user storage
│   ├── {user_id}/                 # User home directory
│   │   ├── files/                 # User-uploaded files
│   │   ├── sessions/              # Conversation exports
│   │   └── preferences.json       # User settings
│   └── ...
├── projects/                      # Tier 1 - Project storage
│   ├── {project_id}/              # Project root
│   │   ├── files/                 # Project documents and data
│   │   ├── conversations/         # Project-scoped chat history
│   │   ├── knowledge-bases/       # RAG data sources
│   │   └── config.json            # Project configuration
│   └── ...
├── models/                        # Tier 1 - Model storage
│   ├── registry/                  # Model registry metadata
│   ├── weights/                   # GGUF, LoRA adapters, checkpoints
│   └── training-outputs/          # Training worker outputs
├── agents/                        # Tier 1 - Agent configurations
│   ├── registry/                  # Agent registry metadata
│   ├── custom/                    # User-created agents
│   └── generated/                 # Agent Builder/Generator output
├── vectors/                       # Tier 2 - Vector embeddings
│   ├── {tenant_id}/               # Per-tenant namespaces
│   │   └── {namespace_id}/        # Vector database files
│   └── ...
├── memory/                        # Tier 1 - Matrix memory (.emm)
│   ├── personal/                  # Per-user memory matrices
│   ├── experiment/                # Project-level matrices
│   └── collective/                # Cluster-wide matrices
├── sessions/                      # Tier 2 - Conversation logs
├── media/                         # Tier 2 - Media worker files
├── science/                       # Tier 2 - Science worker data
├── cache/                         # Tier 3 - Rebuildable caches
│   ├── embeddings/                # Cached embedding results
│   ├── thumbnails/                # Generated thumbnails
│   └── tmp/                       # Temporary computation files
└── replication/                   # Internal - replication metadata
    ├── journal/                   # Replication WAL
    └── manifest.json              # File-to-tier mapping

License Tiers

Storage clustering feature availability by license level.

Feature | Free | Standard | Professional | Enterprise
Data workers | 1 | 2 | 5 | Unlimited
Storage clustering
Tiered replication | Tier 2 + 3 only | All tiers | All tiers
Synchronous replication (Tier 1)
Master failover
SSD/HDD-aware placement
Auto rebalancing
Cross-datacenter replication
Per-tenant quotas