Distributed storage across multiple data workers with tiered replication, capacity-weighted placement, SSD/HDD awareness, and automatic master failover — enterprise-grade data resilience for your AI infrastructure.
Data workers report capacity via heartbeats. The master coordinates placement and replication across heterogeneous nodes.
One data worker is designated as master. It coordinates placement decisions and replication, and serves as the authoritative source for metadata.
The controller designates the master data worker based on availability, capacity, and disk type. SSD-backed workers are preferred for master status due to metadata I/O requirements. If the master becomes unreachable for three consecutive heartbeat intervals, the controller automatically promotes the next-best candidate.
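The selection logic described above can be sketched as follows. This is a minimal illustration; the `Worker` fields and the exact tie-breaking order are assumptions, not the product's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    healthy: bool
    free_bytes: int
    disk_type: str  # "ssd" or "hdd"

def pick_master(workers):
    """Prefer healthy SSD-backed workers (metadata I/O), then the
    candidate with the most free capacity."""
    candidates = [w for w in workers if w.healthy]
    if not candidates:
        return None
    # Tuple key: SSD beats HDD first, then larger free capacity wins.
    return max(candidates, key=lambda w: (w.disk_type == "ssd", w.free_bytes))

workers = [
    Worker("dw-1", True, 2_000_000_000, "hdd"),
    Worker("dw-2", True, 500_000_000, "ssd"),
    Worker("dw-3", False, 9_000_000_000, "ssd"),  # unhealthy: excluded
]
print(pick_master(workers).name)  # dw-2: a healthy SSD beats a larger HDD
```

On failover, the controller would run the same selection over the surviving workers to promote the next-best candidate.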
Maintains the authoritative file index, tenant quotas, replication factor map, and placement decisions. Replicas sync metadata from the master on each heartbeat cycle.
Decides where new data lands based on capacity, disk type, replication tier, and network proximity. Uses weighted scoring to balance load across heterogeneous nodes.
Heterogeneous storage nodes with different disk sizes and types are balanced using weighted placement algorithms.
Each data worker receives a placement score based on: free capacity (40% weight), disk type match (30% weight), current I/O load (20% weight), and network latency to the requesting node (10% weight). New data is placed on the highest-scoring worker.
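The weighted scoring above can be sketched directly from the documented weights. The input normalization to [0, 1] and the field names are assumptions for illustration:

```python
def placement_score(worker, request):
    """Weighted placement score: free capacity 40%, disk-type match 30%,
    I/O load 20%, network latency 10%. All inputs normalized to [0, 1]."""
    capacity = worker["free_frac"]                  # fraction of disk free
    disk_match = 1.0 if worker["disk_type"] == request["preferred_disk"] else 0.0
    io_headroom = 1.0 - worker["io_load"]           # lower load scores higher
    proximity = 1.0 - worker["latency_norm"]        # lower latency scores higher
    return 0.4 * capacity + 0.3 * disk_match + 0.2 * io_headroom + 0.1 * proximity

workers = [
    {"name": "dw-1", "free_frac": 0.9, "disk_type": "hdd", "io_load": 0.2, "latency_norm": 0.1},
    {"name": "dw-2", "free_frac": 0.5, "disk_type": "ssd", "io_load": 0.1, "latency_norm": 0.3},
]
req = {"preferred_disk": "ssd"}  # e.g. a Tier 1 write preferring SSD
best = max(workers, key=lambda w: placement_score(w, req))
print(best["name"])  # dw-2: the disk-type match outweighs dw-1's extra capacity
```

Note how the 30% disk-type weight lets a half-full SSD worker win a Tier 1 placement over an emptier HDD worker, which is exactly the behavior the tier-aware placement section describes.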
Data workers report their disk type in heartbeats. Tier 1 critical data is preferentially placed on SSD-backed workers. Tier 3 rebuildable data (caches, intermediate results) is directed to HDD workers to preserve SSD capacity for hot data.
When a new data worker joins the cluster or an existing one reports significantly changed capacity, the master triggers a background rebalance. Data migrates gradually to avoid I/O spikes, respecting a configurable bandwidth limit.
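One way to picture the rebalance planning step is a greedy pass that moves utilization from the fullest worker toward the emptiest until all are near the mean. This is a sketch of the planning only; the actual migration is gradual and bandwidth-limited as described above, and the tolerance value is an assumption:

```python
def rebalance_moves(utilization, tolerance=0.05):
    """Plan (source, destination, amount) moves until every worker is
    within `tolerance` of the mean utilization. Greedy sketch only."""
    mean = sum(utilization.values()) / len(utilization)
    moves = []
    while True:
        hi = max(utilization, key=utilization.get)
        lo = min(utilization, key=utilization.get)
        step = min(utilization[hi] - mean, mean - utilization[lo])
        if step <= tolerance:
            break
        utilization[hi] -= step
        utilization[lo] += step
        moves.append((hi, lo, round(step, 3)))
    return moves

# A new, mostly empty worker (dw-3) joins a loaded cluster:
plan = rebalance_moves({"dw-1": 0.9, "dw-2": 0.5, "dw-3": 0.4})
print(plan)  # data drains from dw-1 toward dw-3, then dw-2
```

In the real system each planned move would be executed as a background stream capped at the configured bandwidth limit to avoid I/O spikes.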
Per-tenant storage quotas are enforced at the master level. Write requests that would exceed a tenant's quota are rejected before placement occurs. Quotas are checked against the sum across all data workers, not per-node.
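The key point, that quotas are checked against the tenant's cluster-wide total before placement, can be shown in a few lines (names are illustrative):

```python
def quota_allows(tenant_usage_per_worker, quota_bytes, write_size):
    """Admit a write only if the tenant's total usage across ALL data
    workers, plus the new write, stays within the quota."""
    total = sum(tenant_usage_per_worker.values())
    return total + write_size <= quota_bytes

GiB = 1024 ** 3
usage = {"dw-1": 40 * GiB, "dw-2": 55 * GiB}  # 95 GiB cluster-wide

print(quota_allows(usage, quota_bytes=100 * GiB, write_size=10 * GiB))  # False
print(quota_allows(usage, quota_bytes=100 * GiB, write_size=4 * GiB))   # True
```

A per-node check would wrongly admit the first write (each node is under 100 GiB individually); summing across workers is what makes the quota meaningful.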
Not all data is equally important. Three tiers let you match replication cost to data criticality.
| Tier | Replication | Consistency | Data Types | Write Latency |
|---|---|---|---|---|
| Tier 1 — Critical | Synchronous, N+1 replicas | Strong (all replicas ACK) | User data, project files, model weights, training datasets, .emm matrices | Higher (waits for all replicas) |
| Tier 2 — Important | Asynchronous, N replicas | Eventual (master ACK, replicas follow) | Vector embeddings, conversation logs, session history, audit trails | Low (master ACK only) |
| Tier 3 — Rebuildable | No replication (single copy) | N/A | Caches, thumbnails, temporary files, intermediate computation results | Lowest |
By default, /users/ and /models/ are Tier 1, /vectors/ and /sessions/ are Tier 2, and /cache/ is Tier 3. Override per-file via the X-Storage-Tier header on upload.

Every data worker reports its full capacity status to the controller at a configurable interval (default: 30 seconds).
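Tier resolution for an incoming upload follows from the path defaults and the header override. A minimal sketch; the fallback tier for unmapped paths is an assumption:

```python
# Documented default mapping from path prefix to tier.
DEFAULT_TIERS = {"/users/": 1, "/models/": 1, "/vectors/": 2, "/sessions/": 2, "/cache/": 3}

def resolve_tier(path, headers):
    """X-Storage-Tier, when present, overrides the path-based default."""
    if "X-Storage-Tier" in headers:
        return int(headers["X-Storage-Tier"])
    for prefix, tier in DEFAULT_TIERS.items():
        if path.startswith(prefix):
            return tier
    return 2  # assumption: unmapped paths fall back to Tier 2

print(resolve_tier("/models/weights.emm", {}))                  # 1 (path default)
print(resolve_tier("/cache/report.tmp", {"X-Storage-Tier": "1"}))  # 1 (header override)
```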
The controller tracks heartbeat timestamps for all data workers. If a worker misses three consecutive heartbeats (default: 90 seconds), it is marked as degraded. After five missed heartbeats, the controller initiates re-replication of Tier 1 data from surviving replicas.
| Status | Missed Heartbeats | Action |
|---|---|---|
| Healthy | 0 | Normal operation |
| Degraded | 3 | Warnings, no new placements |
| Unreachable | 5 | Re-replicate Tier 1 data |
| Decommissioned | 10+ | Full data evacuation |
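The state table above maps directly onto a threshold function over missed heartbeats. A sketch, assuming missed intervals are counted as whole multiples of the 30-second default:

```python
def worker_status(missed_heartbeats):
    """Map missed-heartbeat count to the documented health states."""
    if missed_heartbeats >= 10:
        return "decommissioned"
    if missed_heartbeats >= 5:
        return "unreachable"   # triggers Tier 1 re-replication
    if missed_heartbeats >= 3:
        return "degraded"      # warnings; excluded from new placements
    return "healthy"

def missed_count(last_seen, now, interval=30.0):
    """Whole heartbeat intervals elapsed since the last report."""
    return int((now - last_seen) // interval)

# A worker last heard from 100 s ago has missed 3 intervals:
print(worker_status(missed_count(last_seen=0, now=100)))  # degraded
```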
REST endpoints for monitoring and managing the storage cluster.
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/storage/cluster/status | GET | Full cluster status: workers, capacity, master, replication health |
| /api/v1/storage/cluster/workers | GET | List all data workers with capacity details |
| /api/v1/storage/cluster/master | GET | Current master worker info |
| /api/v1/storage/cluster/master/promote | POST | Manually promote a worker to master |
| /api/v1/storage/cluster/rebalance | POST | Trigger cluster rebalance |
| /api/v1/storage/cluster/replication | GET | Replication status and pending operations |
| /api/v1/storage/cluster/config | GET / PUT | Storage cluster configuration |
| /api/v1/storage/tiers | GET | Tier configuration and file counts |
| /api/v1/storage/tiers/{id}/config | PUT | Update tier replication settings |
| /api/v1/storage/quotas | GET / POST | List / set tenant storage quotas |
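Querying these endpoints requires nothing beyond a plain HTTP client. A stdlib sketch for the cluster-status endpoint; the controller address, port, and bearer-token auth scheme are assumptions:

```python
from urllib import request

BASE = "http://controller:8080"  # assumption: controller address and port

def cluster_status_request(token):
    """Build a GET request for the full cluster status endpoint."""
    req = request.Request(BASE + "/api/v1/storage/cluster/status")
    req.add_header("Authorization", "Bearer " + token)  # assumed auth scheme
    return req
    # Send with: request.urlopen(req) and parse the JSON body.

req = cluster_status_request("demo-token")
print(req.full_url, req.get_method())
```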
The canonical directory layout on the master data worker. All paths are relative to the base data directory.
Storage clustering feature availability by license level.
| Feature | Free | Standard | Professional | Enterprise |
|---|---|---|---|---|
| Data workers | 1 | 2 | 5 | Unlimited |
| Storage clustering | — | ✓ | ✓ | ✓ |
| Tiered replication | — | Tiers 2 and 3 only | All tiers | All tiers |
| Synchronous replication (Tier 1) | — | — | ✓ | ✓ |
| Master failover | — | — | ✓ | ✓ |
| SSD/HDD-aware placement | — | — | ✓ | ✓ |
| Auto rebalancing | — | — | — | ✓ |
| Cross-datacenter replication | — | — | — | ✓ |
| Per-tenant quotas | — | ✓ | ✓ | ✓ |