HPC · use case
Put the AI where the compute already is
“Europe has the supercomputers. What’s missing is the last-mile operating layer that lets a PhD student actually talk to the cluster without writing a SLURM job.”
Europe’s supercomputers aren’t short on FLOPS. Leonardo at CINECA (>250 PFLOPS peak), LUMI at CSC, MareNostrum 5 at BSC, Jean Zay at IDRIS — all sit at the tier the research community dreams about. What they’re short on is the operating layer that turns a pile of Grace Hopper nodes into an assistant a chemist can actually use at 10pm on a Tuesday.
Eldric AI OS is that operating layer. It maps cleanly onto the shape of an HPC facility: login-node roles, data-pod roles, compute-partition roles. The same single binary (eldric-aios) runs everywhere; the role a node plays is a startup flag.
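In practice that looks something like the following; the exact flag names are illustrative rather than confirmed:

```bash
# Hypothetical role assignment; the --roles flag is illustrative.
eldric-aios --roles edge,router,controller   # login node: user-facing entry point
eldric-aios --roles data                     # storage pod: Matrix Memory + vector store
eldric-aios --roles inference                # compute partition: GPU inference
```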
Value propositions
Topology-aware deployment
15 role modules. edge+router+controller on login nodes; data on the storage pods; inference on the compute partitions. Same binary, different flag.
Matrix Memory on the campaign filesystem
.emm v4 files live next to the project’s data. Per-domain sizing (particle_physics 512/1024, genomics 256/1024). Researchers get institutional recall that survives cohort turnover.
Distillation-pipeline native
Training Worker ships a 3-stage Transformer → xLSTM distillation pipeline. SLURM-friendly launch, MLX / Unsloth / TRL backends. Ideal for long-context research.
SSO friendly
Identity service accepts OAuth flows; Phase-4 adds SAML/OIDC for eduGAIN. Tenants map cleanly to HPC project IDs.
Sovereign posture by construction
Open-source kernel, signed EU-hosted repo, GDPR-shaped defaults, no outbound telemetry. Aligns with EuroHPC’s sovereign-AI agenda out of the box.
Chat-to-sbatch
Cluster-aware agents let researchers submit training jobs from the same chat they asked the literature question in. One mental model for the whole workflow.
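Under the hood, the job the chat hands to the scheduler is ordinary SLURM. A sketch of what a generated submission could look like; the #SBATCH directives are standard, while the eldric-aios subcommand and flags are assumptions:

```bash
#!/bin/bash
#SBATCH --job-name=xlstm-distill
#SBATCH --partition=gpu            # site-specific partition name
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00

# Hypothetical Training Worker invocation; subcommand and flags are illustrative.
eldric-aios train --pipeline transformer-to-xlstm \
  --config "$SLURM_SUBMIT_DIR/distill.yaml"
```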
AI-driven differentiator
The research community’s mental model of AI is still mostly “talk to an API”. Eldric offers a different frame: the AI lives where your data lives, on the compute you already have. That’s a better fit for HPC because HPC already won that architectural argument — data gravity wins against egress every time. Add xLSTM distillation (a Hochreiter-lineage architecture out of JKU Linz) and the sovereign-AI story writes itself.
Scalable use cases
- Researcher self-service. A PhD student chats with the cluster about her data; reads prior proposals; submits the training job; monitors it from the same UI.
- Grant-proposal assistance. Chat reads the centre’s historical proposals archive, the PI’s prior work, the funder’s guidelines. Drafts sections with citations.
- Cross-domain memory. A condensed-matter theorist’s chat inherits the condensed-matter domain Matrix Memory. Institutional knowledge persists across hires.
- Teaching + training. Postdocs run courses using the same cluster their students’ coursework runs on. Per-course tenants; per-student scope within.
- Industrial-access contracts. EuroHPC’s industrial-access programmes get a turnkey confidential-AI surface to offer partner companies.
Runs on commodity hardware
Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:
3× RTX 4090 — sweet spot
72 GB total VRAM with tensor-split. Llama 3.3 70B Q4 at 60–80 tok/s, a parallel 8B routing model, and an embedding server concurrently. One-time hardware cost ~€5–7k.
Single RTX 4090 / 4080 — team scale
24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department chat with fan-out retrieval.
CPU-only — pilot scale
llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.
Scale up
Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.
Dev-cluster footprint
A single 3×4090 dev workstation runs the full stack end-to-end for local testing before submitting a SLURM-scale job on Leonardo. Same binary, same roles.
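The 70B-on-3×4090 figure rests on llama.cpp’s tensor-split. Eldric embeds llama.cpp, so the equivalent settings live in its configuration, but a standalone llama-server invocation shows the idea (the model path is illustrative):

```bash
# Split a 70B Q4 model across three 24 GB GPUs and offload all layers to GPU.
llama-server \
  -m /data/eldric/models/llama-3.3-70b-q4_k_m.gguf \
  --n-gpu-layers 99 \
  --tensor-split 1,1,1 \
  --ctx-size 16384 \
  --port 8883
```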
The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.
What the disk bill looks like
| Artefact | Size | Notes |
|---|---|---|
| eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm | ~1.4 MB | CPU baseline binary; one RPM, one systemd unit. |
| eldric-aios-cuda add-on | ~512 MB | Pulled in automatically via Supplements: cuda-drivers on GPU hosts. Contains GGML_CUDA llama.cpp. |
| Llama 3.1 8B Q4_K_M GGUF | ~4.9 GB | Good default for team-scale chat on a single 4090. |
| Llama 3.3 70B Q4_K_M GGUF | ~40 GB | The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably. |
| Mixtral 8x22B Q4 GGUF | ~80 GB | Tight on 3×4090; comfortable on 4×4090 or 2×H100. |
| nomic-embed-text (embedding) | ~700 MB | CPU or GPU. One per cluster; handles vector indexing. |
| Matrix Memory .emm per domain | 50–500 MB | Depends on rank × dim (see memory article). chat 64/768 ~200 kB; particle_physics 512/1024 ~500 MB. |
| Vector store per 1M chunks | ~6–10 GB | Depends on embedding dim. SQLite backend; FAISS optional. |
| Hash-chained audit log | ~200 MB / 1M calls | JSONL, append-only, rotation at 500 MB files by default. |
Three reference hardware setups
| | Pilot / team | Department / BU | Production / enterprise |
|---|---|---|---|
| CPU | 1× EPYC 7313 (16c) or i9-14900K | 2× EPYC 9354 (32c each) | 2× EPYC 9654 (96c) per node |
| GPU | 1× RTX 4090 (24 GB) | 3× RTX 4090 (72 GB) | 4× H100 (320 GB) or 8× H200 |
| RAM | 128 GB DDR5 | 256 GB DDR5 ECC | 1 TB DDR5 ECC per node |
| Storage | 2× 4 TB NVMe (RAID-1) | 6× 8 TB NVMe (RAID-10) + SSD cache | Tiered: NVMe hot + TB-scale HDD / Lustre |
| Network | 1 GbE OK | 10 GbE with link agg | 25/100 GbE or IB-HDR for multi-node |
| Power | ~1 kW typical / 1.5 kW peak | ~2 kW typical / 3 kW peak | 4–6 kW per node |
| Hardware cost | ~€4–5k | ~€12–15k | €80–250k per node |
| Serves | 8B model, 10–30 concurrent chat users | 70B Q4 at 60–80 tok/s, 200–500 users | Mixtral / Llama-405B, 2k+ users per node |
Network + ops footprint
- Ports. One outward port (443 at the edge). Internally: controller on 8880, data on 8892, inference on 8883, science on 8897, etc. — all behind the edge.
- Storage layout. ${ELDRIC_DATA_DIR} defaults to /data/eldric if writable, else /var/lib/eldric. Subdirs: models/, vectors/, memory/ (matrix memory), storage/ (file storage), agent/, edge/, and per-module dirs.
- Backup. The audit log and .emm files are the two artefacts that matter. Everything else regenerates. Snapshot the data dir nightly; off-site every week (see the sketch after this list).
- Updates. dnf upgrade eldric-aios. Rollback is dnf downgrade. Zero vendor dance.
- Ops team. A single systems engineer can run a pilot install. A team of two runs a department deployment. Production enterprise uses your existing Linux sysadmin rota.
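A minimal nightly-snapshot sketch with standard tools, assuming the default data dir; paths, retention, and the backup host are placeholders:

```bash
#!/bin/bash
# Nightly: snapshot the data dir (the .emm files and the audit log are the
# parts that matter; everything else regenerates). Paths are illustrative.
STAMP=$(date +%F)
tar -czf "/backups/eldric-${STAMP}.tar.gz" /data/eldric

# Weekly: push snapshots off-site (backup-host is a placeholder).
rsync -av /backups/ backup-host:/srv/eldric-offsite/
```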
SWOT — an honest read
Strengths
- Role-modular architecture maps cleanly onto HPC topology
- GGML_CUDA via embedded llama.cpp, tensor-split and pipeline parallelism ready
- EU-hosted signed repo — satisfies sovereignty checkboxes
- xLSTM distillation pipeline native to the Training Worker (Hochreiter lineage)
Weaknesses
- Native SLURM / PBS integration still manual (users submit jobs) — Phase-4 roadmap
- Grace Hopper-specific tuning profiles not shipped yet
- Inter-node RDMA fabric hasn’t been tested across every EuroHPC site
- Eldric operations team small vs. traditional HPC operators
Opportunities
- EuroHPC Federation formally launched — sovereign-AI posture is in the ask
- EU AI Act pushing research consortia toward on-prem model execution
- EuroHPC JU funding instruments for AI infrastructure specifically
- Leonardo + LUMI upgrade cycles opening windows for platform reconsideration
Threats
- NVIDIA AI Enterprise bundled with hardware orders
- Proprietary stacks (Slurm-as-a-Service, Weights&Biases enterprise) capturing mindshare
- Hyperscaler HPC offerings (AWS HPC, GCP HPC) selling the managed-service angle
- Centre-specific in-house AI platforms already launched
First entry points — concrete value in 30 / 90 / 180 days
Login-node pilot
Install alpha.3 on one login node. Wire it to an NFS mount with the centre’s documentation. Demo to an advisory committee.
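A sketch of that first install, assuming the signed repo is already configured and the systemd unit shares the package name (both assumptions):

```bash
# Assumes the signed repo is already set up under /etc/yum.repos.d/ and that
# the systemd unit is named after the package (an assumption).
sudo dnf install eldric-aios
sudo systemctl enable --now eldric-aios
# Point ELDRIC_DATA_DIR at storage the login node can reach, e.g. the NFS
# mount carrying the centre's documentation (path is site-specific).
```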
Multi-role deployment
Split roles across login + data + one compute partition. One research group onboarded as tenant. Distillation demo on a small model.
Site-wide rollout
All active projects onboarded; allocation quota integrated with the existing accounting system. Sovereign-AI report published by the centre.