HPC · use case

Put the AI where the compute already is

by Juergen Paulhart · 2026-04-24 · ~8 min read

“Europe has the supercomputers. What’s missing is the last-mile operating layer that lets a PhD student actually talk to the cluster without writing a SLURM job.”
[Architecture diagram: researcher → chat.your-hpc.eu (SSO via eduGAIN) → login nodes running eldric-aios edge + router + controller and identity (tenant = project ID, HPC allocation quota) → data pods running the data role over Lustre / GPFS / NFS, with Matrix Memory .emm files on the campaign FS and the vector store on /scratch → compute partition running the inference role on Grace Hopper / H100 / A100 (GGML_CUDA, tensor split, pipeline, srun --gres=gpu:4). Call-outs: grant and paper drafting over prior proposals with per-project Matrix Memory; Transformer → xLSTM distillation pipelines (Training Worker + SLURM); domain-science chat over 140+ scientific APIs with per-domain memory sizes; cluster-aware agents (sbatch from chat, monitor + fetch results); EuroHPC-shaped posture: EU-hosted repo, GDPR-shaped defaults, open-source kernel.]

Europe’s supercomputers aren’t short on FLOPS. Leonardo at CINECA (>250 PFLOPS peak), LUMI at CSC, MareNostrum 5 at BSC, Jean Zay at IDRIS — all sit at the tier the research community dreams about. What they’re short on is the operating layer that turns a pile of Grace Hopper nodes into an assistant a chemist can actually use at 10pm on a Tuesday.

Eldric AI OS is that operating layer. It maps cleanly onto the shape of an HPC facility: login-node roles, data-pod roles, compute-partition roles. The same single binary (eldric-aios) runs everywhere; the role a node plays is a startup flag.

Value propositions

Topology-aware deployment

15 role modules. edge+router+controller on login nodes; data on the storage pods; inference on the compute partitions. Same binary, different flag.
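In practice that can look like the sketch below. The --role and --data-root flags are assumptions for illustration, not the shipped CLI; the srun line mirrors the diagram above.

  # Illustrative only: --role and --data-root are assumed flag names, not documented CLI.
  # Login node: user-facing roles behind the campus SSO.
  eldric-aios --role edge,router,controller,identity

  # Storage pod: data role, pointed at the parallel filesystem.
  eldric-aios --role data --data-root /lustre/projects

  # Compute partition: inference role, launched per GPU node (as in the diagram).
  srun --gres=gpu:4 eldric-aios --role inference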

Matrix Memory on the campaign filesystem

.emm v4 files live next to the project’s data. Per-domain sizing (particle_physics 512/1024, genomics 256/1024). Researchers get institutional recall that survives cohort turnover.
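A concrete way to picture the layout, with paths that are purely illustrative (every centre mounts its filesystems differently):

  # Illustrative paths only; nothing here is a shipped default.
  PROJECT=/lustre/projects/prj_4711      # campaign filesystem, next to the project data
  SCRATCH=/scratch/prj_4711

  ls "$PROJECT/eldric/memory/"
  #   particle_physics.emm    (512/1024 sizing)
  #   genomics.emm            (256/1024 sizing)

  ls "$SCRATCH/eldric/vectors/"          # vector store on /scratch, rebuildable from source data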

Distillation-pipeline native

Training Worker ships a 3-stage Transformer → xLSTM distillation pipeline. SLURM-friendly launch, MLX / Unsloth / TRL backends. Ideal for long-context research.
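What a SLURM-friendly launch could look like, sketched with a hypothetical eldric-aios train subcommand (the subcommand and flag names are assumptions; the #SBATCH directives are standard SLURM):

  #!/bin/bash
  #SBATCH --job-name=xlstm-distill
  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:4
  #SBATCH --time=24:00:00
  #SBATCH --output=distill-%j.log

  # Hypothetical subcommand and flags; the shipped Training Worker CLI may differ.
  # The pipeline name mirrors the 3-stage Transformer -> xLSTM distillation above,
  # the backend is one of the MLX / Unsloth / TRL options listed.
  eldric-aios train \
      --pipeline transformer-to-xlstm \
      --backend trl \
      --teacher llama-3.1-8b-instruct \
      --output /lustre/projects/prj_4711/models/xlstm-student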

SSO friendly

Identity service accepts OAuth flows; Phase-4 adds SAML/OIDC for eduGAIN. Tenants map cleanly to HPC project IDs.
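The tenant mapping is easiest to see in a token request. The endpoint path below is an assumption; only the OAuth2 client-credentials grant itself is standard:

  # Hypothetical endpoint; the grant type and parameters are plain OAuth2.
  curl -s https://chat.your-hpc.eu/auth/token \
      -d grant_type=client_credentials \
      -d client_id=prj_4711 \
      -d client_secret="$ELDRIC_CLIENT_SECRET"
  # The returned token carries the tenant, which the identity service maps 1:1
  # onto the HPC project ID (prj_4711 here), so quotas follow the allocation.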

Sovereign posture by construction

Open-source kernel, signed EU-hosted repo, GDPR-shaped defaults, no outbound telemetry. Aligns with EuroHPC’s sovereign-AI agenda out of the box.

Chat-to-sbatch

Cluster-aware agents let researchers submit training jobs from the same chat they asked the literature question in. One mental model for the whole workflow.
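Under the hood, "sbatch from chat" is nothing more exotic than standard SLURM commands issued on the researcher's behalf; the script name below is just an example:

  # The agent submits the script it generated, keeps polling, then reports back.
  JOBID=$(sbatch --parsable finetune.sbatch)        # submit; --parsable prints only the job ID
  squeue -j "$JOBID"                                # poll state while the chat stays open
  sacct -j "$JOBID" --format=JobID,State,Elapsed    # summarise the outcome once it finishes
  # Results are read from the job's output directory and reported back into the chat.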

AI-driven differentiator

The research community’s mental model of AI is still mostly “talk to an API”. Eldric offers a different frame: the AI lives where your data lives, on the compute you already have. That’s a better fit for HPC because HPC already won that architectural argument — data gravity wins against egress every time. Add xLSTM distillation (developed in Europe, in Hochreiter’s group at JKU Linz) and the sovereign-AI story writes itself.

Scalable use cases

Runs on commodity hardware

Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:

3× RTX 4090 — sweet spot

72 GB total VRAM with tensor-split. Llama 3.3 70B Q4 at 60–80 tok/s, a parallel 8B routing model, and an embedding server concurrently. One-time hardware cost ~€5–7k.
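As a point of reference, here is how that split looks with stock llama.cpp (the embedded runtime may expose it differently; the model filename is illustrative):

  # -ngl 99 offloads all layers to GPU, --tensor-split spreads the weights evenly
  # across the three 4090s, -c 16384 matches the 16k-context note in the table below.
  llama-server -m llama-3.3-70b-instruct-Q4_K_M.gguf \
      -ngl 99 --tensor-split 1,1,1 -c 16384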

Single RTX 4090 / 4080 — team scale

24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department chat with fan-out retrieval.

CPU-only — pilot scale

llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.

Scale up

Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.

Dev-cluster footprint

A single 3×4090 dev workstation runs the full stack end-to-end for local testing before submitting a SLURM-scale job on Leonardo. Same binary, same roles.

The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.

What the disk bill looks like

Artefact · Size · Notes
eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm · ~1.4 MB · CPU baseline binary; one RPM, one systemd unit.
eldric-aios-cuda add-on · ~512 MB · Pulled in automatically via Supplements: cuda-drivers on GPU hosts. Contains GGML_CUDA llama.cpp.
Llama 3.1 8B Q4_K_M GGUF · ~4.9 GB · Good default for team-scale chat on a single 4090.
Llama 3.3 70B Q4_K_M GGUF · ~40 GB · The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably.
Mixtral 8x22B Q4 GGUF · ~80 GB · Tight on 3×4090; comfortable on 4×4090 or 2×H100.
nomic-embed-text (embedding) · ~700 MB · CPU or GPU. One per cluster; handles vector indexing.
Matrix Memory .emm per domain · 50–500 MB · Depends on rank × dim (see memory article). chat 64/768 ~200 kB; particle_physics 512/1024 ~500 MB.
Vector store per 1M chunks · ~6–10 GB · Depends on embedding dim. SQLite backend; FAISS optional.
Hash-chained audit log · ~200 MB / 1M calls · JSONL, append-only, rotation at 500 MB files by default.
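Getting that first ~1.4 MB onto a node is a one-liner, assuming the signed EU-hosted repo is already configured (the systemd unit name is assumed to match the package):

  sudo dnf install eldric-aios             # CPU baseline: one RPM, one systemd unit
  # On GPU hosts with cuda-drivers present, the ~512 MB eldric-aios-cuda add-on
  # is pulled in automatically via its Supplements tag, as noted in the table.
  sudo systemctl enable --now eldric-aios  # unit name assumed; check the package docs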

Three reference hardware setups

Pilot / team
CPU: 1× EPYC 7313 (16c) or i9-14900K
GPU: 1× RTX 4090 (24 GB)
RAM: 128 GB DDR5
Storage: 2× 4 TB NVMe (RAID-1)
Network: 1 GbE OK
Power: ~1 kW typical / 1.5 kW peak
Hardware cost: ~€4–5k
Serves: 8B model, 10–30 concurrent chat users

Department / BU
CPU: 2× EPYC 9354 (32c each)
GPU: 3× RTX 4090 (72 GB)
RAM: 256 GB DDR5 ECC
Storage: 6× 8 TB NVMe (RAID-10) + SSD cache
Network: 10 GbE with link agg
Power: ~2 kW typical / 3 kW peak
Hardware cost: ~€12–15k
Serves: 70B Q4 at 60–80 tok/s, 200–500 users

Production / enterprise
CPU: 2× EPYC 9654 (96c) per node
GPU: 4× H100 (320 GB) or 8× H200
RAM: 1 TB DDR5 ECC per node
Storage: Tiered: NVMe hot + TB-scale HDD / Lustre
Network: 25/100 GbE or IB-HDR for multi-node
Power: 4–6 kW per node
Hardware cost: €80–250k per node
Serves: Mixtral / Llama-405B, 2k+ users per node

SWOT — an honest read

Strengths

  • Role-modular architecture maps cleanly onto HPC topology
  • GGML_CUDA via embedded llama.cpp, tensor-split and pipeline parallelism ready
  • EU-hosted signed repo — satisfies sovereignty checkboxes
  • xLSTM distillation pipeline native to the Training Worker (Hochreiter lineage)

Weaknesses

  • Native SLURM / PBS integration still manual (users submit jobs) — Phase-4 roadmap
  • Grace Hopper-specific tuning profiles not shipped yet
  • Inter-node RDMA fabric hasn’t been tested across every EuroHPC site
  • Eldric operations team small vs. traditional HPC operators

Opportunities

  • EuroHPC Federation formally launched — sovereign-AI posture is in the ask
  • EU AI Act pushing research consortia toward on-prem model execution
  • EuroHPC JU funding instruments for AI infrastructure specifically
  • Leonardo + LUMI upgrade cycles opening windows for platform reconsideration

Threats

  • NVIDIA AI Enterprise bundled with hardware orders
  • Proprietary stacks (Slurm-as-a-Service, Weights & Biases enterprise) capturing mindshare
  • Hyperscaler HPC offerings (AWS HPC, GCP HPC) selling the managed-service angle
  • Centre-specific in-house AI platforms already launched

First entry points — concrete value in 30 / 90 / 180 days

30 days

Login-node pilot

Install alpha.3 on one login node. Wire it to an NFS mount with the centre’s documentation. Demo to an advisory committee.

90 days

Multi-role deployment

Split roles across login + data + one compute partition. One research group onboarded as tenant. Distillation demo on a small model.

180 days

Site-wide rollout

All active projects onboarded; allocation quota integrated with the existing accounting system. Sovereign-AI report published by the centre.

Install alpha.3 · Science & experiments · Memory article · Universities use case · office@eldric.ai
#HPC #EuroHPC #Leonardo #CINECA #LUMI #SovereignAI #GraceHopper #xLSTM