Distributed storage across multiple data workers with tiered replication, capacity-weighted placement, SSD/HDD awareness, and automatic master failover — enterprise-grade data resilience for your AI infrastructure.
Data workers report capacity via heartbeats. The master coordinates placement and replication across heterogeneous nodes.
One data worker is designated as master. It coordinates placement decisions and replication, and serves as the authoritative source for metadata.
The controller designates the master data worker based on availability, capacity, and disk type. SSD-backed workers are preferred for master status due to metadata I/O requirements. If the master becomes unreachable for three consecutive heartbeat intervals, the controller automatically promotes the next-best candidate.
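The selection logic described above can be sketched as follows. This is a minimal illustration; the `Worker` fields and the exact tie-breaking order are assumptions, not the product's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    healthy: bool
    free_bytes: int
    disk_type: str  # "ssd" or "hdd"

def pick_master(workers):
    """Prefer healthy SSD-backed workers (metadata I/O), then the
    candidate with the most free capacity."""
    candidates = [w for w in workers if w.healthy]
    if not candidates:
        return None
    # Tuple key: SSD beats HDD first, then larger free capacity wins.
    return max(candidates, key=lambda w: (w.disk_type == "ssd", w.free_bytes))

workers = [
    Worker("dw-1", True, 2_000_000_000, "hdd"),
    Worker("dw-2", True, 500_000_000, "ssd"),
    Worker("dw-3", False, 9_000_000_000, "ssd"),  # unhealthy: excluded
]
print(pick_master(workers).name)  # dw-2: a healthy SSD beats a larger HDD
```

On failover, the controller would run the same selection over the surviving workers to promote the next-best candidate.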
Maintains the authoritative file index, tenant quotas, replication factor map, and placement decisions. Replicas sync metadata from the master on each heartbeat cycle.
Decides where new data lands based on capacity, disk type, replication tier, and network proximity. Uses weighted scoring to balance load across heterogeneous nodes.
Heterogeneous storage nodes with different disk sizes and types are balanced using weighted placement algorithms.
Each data worker receives a placement score based on: free capacity (40% weight), disk type match (30% weight), current I/O load (20% weight), and network latency to the requesting node (10% weight). New data is placed on the highest-scoring worker.
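The weighted scoring above can be sketched directly from the documented weights. The input normalization to [0, 1] and the field names are assumptions for illustration:

```python
def placement_score(worker, request):
    """Weighted placement score: free capacity 40%, disk-type match 30%,
    I/O load 20%, network latency 10%. All inputs normalized to [0, 1]."""
    capacity = worker["free_frac"]                  # fraction of disk free
    disk_match = 1.0 if worker["disk_type"] == request["preferred_disk"] else 0.0
    io_headroom = 1.0 - worker["io_load"]           # lower load scores higher
    proximity = 1.0 - worker["latency_norm"]        # lower latency scores higher
    return 0.4 * capacity + 0.3 * disk_match + 0.2 * io_headroom + 0.1 * proximity

workers = [
    {"name": "dw-1", "free_frac": 0.9, "disk_type": "hdd", "io_load": 0.2, "latency_norm": 0.1},
    {"name": "dw-2", "free_frac": 0.5, "disk_type": "ssd", "io_load": 0.1, "latency_norm": 0.3},
]
req = {"preferred_disk": "ssd"}  # e.g. a Tier 1 write preferring SSD
best = max(workers, key=lambda w: placement_score(w, req))
print(best["name"])  # dw-2: the disk-type match outweighs dw-1's extra capacity
```

Note how the 30% disk-type weight lets a half-full SSD worker win a Tier 1 placement over an emptier HDD worker, which is exactly the behavior the tier-aware placement section describes.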
Data workers report their disk type in heartbeats. Tier 1 critical data is preferentially placed on SSD-backed workers. Tier 3 rebuildable data (caches, intermediate results) is directed to HDD workers to preserve SSD capacity for hot data.
When a new data worker joins the cluster or an existing one reports significantly changed capacity, the master triggers a background rebalance. Data migrates gradually to avoid I/O spikes, respecting a configurable bandwidth limit.
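One way to picture the rebalance planning step is a greedy pass that moves utilization from the fullest worker toward the emptiest until all are near the mean. This is a sketch of the planning only; the actual migration is gradual and bandwidth-limited as described above, and the tolerance value is an assumption:

```python
def rebalance_moves(utilization, tolerance=0.05):
    """Plan (source, destination, amount) moves until every worker is
    within `tolerance` of the mean utilization. Greedy sketch only."""
    mean = sum(utilization.values()) / len(utilization)
    moves = []
    while True:
        hi = max(utilization, key=utilization.get)
        lo = min(utilization, key=utilization.get)
        step = min(utilization[hi] - mean, mean - utilization[lo])
        if step <= tolerance:
            break
        utilization[hi] -= step
        utilization[lo] += step
        moves.append((hi, lo, round(step, 3)))
    return moves

# A new, mostly empty worker (dw-3) joins a loaded cluster:
plan = rebalance_moves({"dw-1": 0.9, "dw-2": 0.5, "dw-3": 0.4})
print(plan)  # data drains from dw-1 toward dw-3, then dw-2
```

In the real system each planned move would be executed as a background stream capped at the configured bandwidth limit to avoid I/O spikes.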
Per-tenant storage quotas are enforced at the master level. Write requests that would exceed a tenant's quota are rejected before placement occurs. Quotas are checked against the sum across all data workers, not per-node.
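The key point, that quotas are checked against the tenant's cluster-wide total before placement, can be shown in a few lines (names are illustrative):

```python
def quota_allows(tenant_usage_per_worker, quota_bytes, write_size):
    """Admit a write only if the tenant's total usage across ALL data
    workers, plus the new write, stays within the quota."""
    total = sum(tenant_usage_per_worker.values())
    return total + write_size <= quota_bytes

GiB = 1024 ** 3
usage = {"dw-1": 40 * GiB, "dw-2": 55 * GiB}  # 95 GiB cluster-wide

print(quota_allows(usage, quota_bytes=100 * GiB, write_size=10 * GiB))  # False
print(quota_allows(usage, quota_bytes=100 * GiB, write_size=4 * GiB))   # True
```

A per-node check would wrongly admit the first write (each node is under 100 GiB individually); summing across workers is what makes the quota meaningful.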
Not all data is equally important. Three tiers let you match replication cost to data criticality.
| Tier | Replication | Consistency | Data Types | Write Latency |
|---|---|---|---|---|
| Tier 1 — Critical | Synchronous, N+1 replicas | Strong (all replicas ACK) | User data, project files, model weights, training datasets, .emm matrices | Higher (waits for all replicas) |
| Tier 2 — Important | Asynchronous, N replicas | Eventual (master ACK, replicas follow) | Vector embeddings, conversation logs, session history, audit trails | Low (master ACK only) |
| Tier 3 — Rebuildable | No replication (single copy) | N/A | Caches, thumbnails, temporary files, intermediate computation results | Lowest |
By default, /users/ and /models/ are Tier 1, /vectors/ and /sessions/ are Tier 2, and /cache/ is Tier 3. Override per-file via the X-Storage-Tier header on upload.

Every data worker reports its full capacity status to the controller at a configurable interval (default: 30 seconds).
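Tier resolution for an incoming upload follows from the path defaults and the header override. A minimal sketch; the fallback tier for unmapped paths is an assumption:

```python
# Documented default mapping from path prefix to tier.
DEFAULT_TIERS = {"/users/": 1, "/models/": 1, "/vectors/": 2, "/sessions/": 2, "/cache/": 3}

def resolve_tier(path, headers):
    """X-Storage-Tier, when present, overrides the path-based default."""
    if "X-Storage-Tier" in headers:
        return int(headers["X-Storage-Tier"])
    for prefix, tier in DEFAULT_TIERS.items():
        if path.startswith(prefix):
            return tier
    return 2  # assumption: unmapped paths fall back to Tier 2

print(resolve_tier("/models/weights.emm", {}))                  # 1 (path default)
print(resolve_tier("/cache/report.tmp", {"X-Storage-Tier": "1"}))  # 1 (header override)
```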
The controller tracks heartbeat timestamps for all data workers. If a worker misses three consecutive heartbeats (default: 90 seconds), it is marked as degraded. After five missed heartbeats, the controller initiates re-replication of Tier 1 data from surviving replicas.
| Status | Missed Heartbeats | Action |
|---|---|---|
| Healthy | 0 | Normal operation |
| Degraded | 3 | Warnings, no new placements |
| Unreachable | 5 | Re-replicate Tier 1 data |
| Decommissioned | 10+ | Full data evacuation |
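The state table above maps directly onto a threshold function over missed heartbeats. A sketch, assuming missed intervals are counted as whole multiples of the 30-second default:

```python
def worker_status(missed_heartbeats):
    """Map missed-heartbeat count to the documented health states."""
    if missed_heartbeats >= 10:
        return "decommissioned"
    if missed_heartbeats >= 5:
        return "unreachable"   # triggers Tier 1 re-replication
    if missed_heartbeats >= 3:
        return "degraded"      # warnings; excluded from new placements
    return "healthy"

def missed_count(last_seen, now, interval=30.0):
    """Whole heartbeat intervals elapsed since the last report."""
    return int((now - last_seen) // interval)

# A worker last heard from 100 s ago has missed 3 intervals:
print(worker_status(missed_count(last_seen=0, now=100)))  # degraded
```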
REST endpoints for monitoring and managing the storage cluster.
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/storage/cluster/status | GET | Full cluster status: workers, capacity, master, replication health |
| /api/v1/storage/cluster/workers | GET | List all data workers with capacity details |
| /api/v1/storage/cluster/master | GET | Current master worker info |
| /api/v1/storage/cluster/master/promote | POST | Manually promote a worker to master |
| /api/v1/storage/cluster/rebalance | POST | Trigger cluster rebalance |
| /api/v1/storage/cluster/replication | GET | Replication status and pending operations |
| /api/v1/storage/cluster/config | GET / PUT | Storage cluster configuration |
| /api/v1/storage/tiers | GET | Tier configuration and file counts |
| /api/v1/storage/tiers/{id}/config | PUT | Update tier replication settings |
| /api/v1/storage/quotas | GET / POST | List / set tenant storage quotas |
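Querying these endpoints requires nothing beyond a plain HTTP client. A stdlib sketch for the cluster-status endpoint; the controller address, port, and bearer-token auth scheme are assumptions:

```python
from urllib import request

BASE = "http://controller:8080"  # assumption: controller address and port

def cluster_status_request(token):
    """Build a GET request for the full cluster status endpoint."""
    req = request.Request(BASE + "/api/v1/storage/cluster/status")
    req.add_header("Authorization", "Bearer " + token)  # assumed auth scheme
    return req
    # Send with: request.urlopen(req) and parse the JSON body.

req = cluster_status_request("demo-token")
print(req.full_url, req.get_method())
```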
The canonical directory layout on the master data worker. All paths are relative to the base data directory.
Storage clustering feature availability by license level.
| Feature | Free | Standard | Professional | Enterprise |
|---|---|---|---|---|
| Data workers | 1 | 2 | 5 | Unlimited |
| Storage clustering | — | ✓ | ✓ | ✓ |
| Tiered replication | — | Tiers 2 and 3 only | All tiers | All tiers |
| Synchronous replication (Tier 1) | — | — | ✓ | ✓ |
| Master failover | — | — | ✓ | ✓ |
| SSD/HDD-aware placement | — | — | ✓ | ✓ |
| Auto rebalancing | — | — | — | ✓ |
| Cross-datacenter replication | — | — | — | ✓ |
| Per-tenant quotas | — | ✓ | ✓ | ✓ |