Banking · use case

AI that can read the mainframe without ever leaving the vault

by Juergen Paulhart · 2026-04-24 · ~8 min read

“We have forty years of transactions in Db2 on z/OS, a DORA deadline, and a board that reads every AI vendor’s data-handling clause with a pen in hand.”
[Architecture diagram: ELDRIC AI OS, one binary inside the vault. Core systems: Db2 LUW (ODBC, today), Db2 z/OS (DRDA via IBM ODBC, today), policy NFS (nfs-ganesha, today), CRM on PostgreSQL (ODBC, today), IBM MQ (libmqm dlopen, phase-2). Inside the eldric-aios process: 15 role modules on one IntraBus; the data module (retrieval.data.local: ODBC, NFS, SQL, vector, plus Matrix Memory); edge + router (identity, tenant, quota, chinese-wall enforcement, X-Eldric-Fanout header); inference (llama.cpp on 3× RTX 4090, 70B Q4 at 60–80 tok/s, CUDA, on-prem, no egress); hash-chained audit log (every retrieval, prompt, and privacy toggle; DORA / EU AI Act evidence). Users: a customer-service agent (tenant: retail-support), a risk analyst (tenant: investment), and compliance replaying the audit.]

Banking AI isn’t a model problem. The model is the easy part. The hard parts are: can the AI actually reach the systems of record, will the regulator accept what happens to the data, and can the audit committee reconstruct exactly what was retrieved when something goes wrong.

Eldric AI OS was built on the assumption that the answers are yes, yes, and yes: on-prem, deterministic, auditable. The shipped unixODBC layer means alpha.3 already talks to Db2 z/OS over DRDA; the hash-chained audit log is the evidence artefact DORA reviews ask for; and the identity service gives you Chinese walls, so retail-support can't accidentally cross into M&A.

Value propositions

Mainframe reach today

data.local routes to any unixODBC DSN via real SQLDriverConnect / SQLFetch calls. Drop in IBM's DSDriver and Db2 z/OS over DRDA (port 446) shows up as a plugin in the sidebar. No two-year ingestion project.
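
To make that concrete, here is a minimal sketch of the kind of unixODBC round trip data.local performs, written with Python's pyodbc for readability; the DSN name, credentials, schema, and table are illustrative, not part of the product.

    # Minimal sketch of the ODBC round trip data.local performs internally.
    # Assumes a unixODBC DSN "DB2ZOS" defined against the IBM DSDriver
    # (DRDA, port 446); DSN, credentials, schema, and table are illustrative.
    import pyodbc

    conn = pyodbc.connect("DSN=DB2ZOS;UID=rmuser;PWD=********", timeout=10)
    cur = conn.cursor()

    # The same shape of query a "cite that 2019 transaction" retrieval issues.
    cur.execute(
        "SELECT TXN_ID, TXN_DATE, AMOUNT, NARRATIVE "
        "FROM BANK.TRANSACTIONS "
        "WHERE TXN_DATE BETWEEN ? AND ? "
        "FETCH FIRST 20 ROWS ONLY",
        ("2019-01-01", "2019-12-31"),
    )
    for txn_id, txn_date, amount, narrative in cur.fetchall():
        print(txn_id, txn_date, amount, narrative)

    conn.close()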

Chinese walls as code

alpha.3 ships real users, tenants, projects, workgroups. Retail banking and investment banking are different tenants; a retrieval can’t cross boundaries by accident. Admins see the boundary; users don’t have to remember it.
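
What the wall amounts to in code is roughly the following. The eldric-aios identity module is not public, so every name in this sketch (Principal, Document, retrieve) is hypothetical; the shape of the check is the point: retrieval is refused at the boundary, not filtered after the fact.

    # Hypothetical sketch of a Chinese-wall check at retrieval time; the real
    # eldric-aios identity module is not public, so these names are illustrative.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Principal:
        user: str
        tenant: str          # e.g. "retail-support" or "investment"

    @dataclass(frozen=True)
    class Document:
        doc_id: str
        tenant: str          # tenant that owns the record

    class CrossTenantAccess(Exception):
        pass

    def retrieve(principal: Principal, doc: Document) -> Document:
        # The wall: a cross-tenant read is refused outright when the caller's
        # tenant differs from the document's tenant.
        if principal.tenant != doc.tenant:
            raise CrossTenantAccess(
                f"{principal.user} ({principal.tenant}) may not read "
                f"{doc.doc_id} owned by tenant {doc.tenant}"
            )
        return doc

    # retrieve(Principal("anna", "retail-support"), Document("ma-042", "investment"))
    # -> CrossTenantAccess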

DORA-shaped audit trail

Every retrieval, prompt, and privacy toggle lands in a hash-chained, tamper-evident log. Reconstruct the state of any session at any past point. Evidence for Art. 9 operational-resilience reviews by construction.
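
The mechanism is simple enough to sketch: every entry commits to the hash of its predecessor, so any edit, removal, or reordering breaks the chain on replay. Field names below are assumptions; the hash-chain principle is the one described above.

    # Sketch of a hash-chained JSONL audit log and its verification pass.
    # Field names (prev_hash, entry_hash) are illustrative; the principle is
    # that each record commits to its predecessor, so any edit breaks the chain.
    import hashlib
    import json

    GENESIS = "0" * 64

    def append_entry(path: str, event: dict, prev_hash: str) -> str:
        record = {"event": event, "prev_hash": prev_hash}
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["entry_hash"] = digest
        with open(path, "a") as f:
            f.write(json.dumps(record, sort_keys=True) + "\n")
        return digest        # becomes prev_hash of the next entry

    def verify_chain(path: str) -> bool:
        prev = GENESIS
        for line in open(path):
            record = json.loads(line)
            claimed = record.pop("entry_hash")
            recomputed = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
            if record["prev_hash"] != prev or recomputed != claimed:
                return False     # an entry was altered, removed, or reordered
            prev = claimed
        return True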

Zero customer-data egress

On-prem llama.cpp handles inference. Cloud backends are opt-in and admin-visible. PII never crosses the firewall unless policy explicitly allows it.

Single RPM, single systemd unit

Validation, change-control, and DR rehearsal all operate on one artefact. dnf upgrade is the upgrade; dnf downgrade is the rollback. No surprises.

Signed supply chain

4096-bit RSA-signed RPM on repo.eldric.ai (EU-hosted). GPG fingerprint verifiable on the Downloads page. No opaque binary from a vendor datacentre.

AI-driven differentiator

The marketable AI primitives for banking — retrieval from decades of transactions, pattern-level fraud recall, cross-session institutional memory — need a data-access layer that actually speaks DRDA and a memory primitive that isn’t context-window-bounded. Eldric ships both, in one process. That combination does not exist in the vendor SaaS AI market.

Scalable use cases

Runs on commodity hardware

Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:

3× RTX 4090 — sweet spot

72 GB total VRAM with tensor-split. Llama 3.3 70B Q4 at 60–80 tok/s, a parallel 8B routing model, and an embedding server concurrently. One-time hardware cost ~€5–7k.
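
Eldric embeds llama.cpp directly, but the same configuration can be sketched through the llama-cpp-python binding to show what "tensor-split across three 24 GB cards" means in practice; the model path and the even split ratios are assumptions, not the shipped defaults.

    # Illustrative 3x RTX 4090 configuration, expressed through the
    # llama-cpp-python binding (not the eldric-aios internals).
    # Model path and split ratios are assumptions.
    from llama_cpp import Llama

    llm = Llama(
        model_path="/opt/models/llama-3.3-70b-instruct.Q4_K_M.gguf",
        n_gpu_layers=-1,                 # offload every layer to the GPUs
        tensor_split=[1.0, 1.0, 1.0],    # spread the weights evenly over three cards
        n_ctx=16384,                     # 16k context
    )

    out = llm("Summarise the 2019 transaction-dispute policy in two sentences.",
              max_tokens=128)
    print(out["choices"][0]["text"])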

Single RTX 4090 / 4080 — team scale

24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department's chat with fan-out retrieval.

CPU-only — pilot scale

llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.

Scale up

Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.

Regional bank baseline

A two-rack install — 3×4090 GPU node + 2-node data tier with NFS + replicated Matrix Memory — covers a 500-seat retail bank comfortably. No hyperscaler dependency.

The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.
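
The payback arithmetic behind that claim, spelled out with the figures above:

    # Back-of-envelope payback on the figures above (EUR).
    hardware_once = 6_000                 # 3x4090 workstation, one-time
    for saas_per_year in (30_000, 60_000):
        months = hardware_once / saas_per_year * 12
        print(f"SaaS at {saas_per_year} EUR/yr -> hardware paid back in ~{months:.1f} months")
    # -> ~2.4 months at the low end, ~1.2 months at the high end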

What the disk bill looks like

Artefact · Size · Notes
eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm · ~1.4 MB · CPU baseline binary; one RPM, one systemd unit.
eldric-aios-cuda add-on · ~512 MB · Pulled in automatically via Supplements: cuda-drivers on GPU hosts. Contains GGML_CUDA llama.cpp.
Llama 3.1 8B Q4_K_M GGUF · ~4.9 GB · Good default for team-scale chat on a single 4090.
Llama 3.3 70B Q4_K_M GGUF · ~40 GB · The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably.
Mixtral 8x22B Q4 GGUF · ~80 GB · Tight on 3×4090; comfortable on 4×4090 or 2×H100.
nomic-embed-text (embedding) · ~700 MB · CPU or GPU. One per cluster; handles vector indexing.
Matrix Memory .emm per domain · 50–500 MB · Depends on rank × dim (see memory article). chat 64/768 ~200 kB; particle_physics 512/1024 ~500 MB.
Vector store per 1M chunks · ~6–10 GB · Depends on embedding dim. SQLite backend; FAISS optional.
Hash-chained audit log · ~200 MB / 1M calls · JSONL, append-only, rotation at 500 MB files by default.
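
Turning those per-unit figures into a first capacity plan is one multiplication per row; a sketch, with the daily call volume and corpus size as assumptions:

    # Rough capacity plan from the per-unit figures above.
    # Daily call volume and corpus size are assumptions.
    calls_per_year = 50_000 * 365          # assumed chat/retrieval calls per day
    chunks_indexed = 5_000_000             # assumed corpus size in chunks

    audit_gb  = calls_per_year / 1_000_000 * 200 / 1024   # ~200 MB per 1M calls
    vector_gb = chunks_indexed / 1_000_000 * 8             # midpoint of 6-10 GB per 1M chunks
    model_gb  = 40                                         # Llama 3.3 70B Q4_K_M GGUF

    print(f"audit log per year : ~{audit_gb:.1f} GB")
    print(f"vector store       : ~{vector_gb:.0f} GB")
    print(f"model weights      : ~{model_gb} GB")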

Three reference hardware setups

Columns: Pilot / team · Department / BU · Production / enterprise
CPU: 1× EPYC 7313 (16c) or i9-14900K · 2× EPYC 9354 (32c each) · 2× EPYC 9654 (96c) per node
GPU: 1× RTX 4090 (24 GB) · 3× RTX 4090 (72 GB) · 4× H100 (320 GB) or 8× H200
RAM: 128 GB DDR5 · 256 GB DDR5 ECC · 1 TB DDR5 ECC per node
Storage: 2× 4 TB NVMe (RAID-1) · 6× 8 TB NVMe (RAID-10) + SSD cache · Tiered: NVMe hot + TB-scale HDD / Lustre
Network: 1 GbE OK · 10 GbE with link agg · 25/100 GbE or IB-HDR for multi-node
Power: ~1 kW typical / 1.5 kW peak · ~2 kW typical / 3 kW peak · 4–6 kW per node
Hardware cost: ~€4–5k · ~€12–15k · €80–250k per node
Serves: 8B model, 10–30 concurrent chat users · 70B Q4 at 60–80 tok/s, 200–500 users · Mixtral / Llama-405B, 2k+ users per node

Network + ops footprint

SWOT — an honest read

Strengths

  • Db2 z/OS reachable today via the shipped ODBC layer — alpha.3 is already mainframe-capable
  • Hash-chained audit log, identity service, multi-tenant all shipped in alpha.3
  • Runs on 3×4090 commodity hardware for most BU deployments
  • Source-available under BSL (moving to OSS); signed, EU-hosted repo

Weaknesses

  • Native Db2 CLI (DRDA without unixODBC indirection) still phase-2
  • IBM MQ, VSAM, and CICS native connectors still phase-2 — ODBC reach covers query, not messaging
  • SOC-2 / ISO-27001 not yet held; working toward them
  • Banking-specific ontology extensions (SEPA, SWIFT MX) are customer-built for now

Opportunities

  • DORA (Jan 2025) and EU AI Act force auditable on-prem AI for Tier-1 systems
  • Basel IV model-risk management requires reconstructable AI decisions
  • Sovereign-cloud mandates from national regulators (BaFin, FMA, AMF, ACPR)
  • Legacy-system AI is an underserved niche — hyperscalers can’t reach Db2 z/OS without bank-built bridges

Threats

  • IBM watsonx positioned for the same banking mainframe niche
  • Hyperscaler “private tenant” offerings blurring the on-prem message
  • Regional banks outsourcing core banking to the same hyperscalers, removing the on-prem requirement
  • Internal “build vs buy” teams underestimating the cost of re-implementing tenant + audit + retrieval

First entry points — concrete value in 30 / 90 / 180 days

30 days

Sandbox pilot

Single RPM on a non-production VM. ODBC DSN pointed at Db2 LUW (free driver, zero licensing). One relationship-manager (RM) power-user onboarded. Demo: chat cites a real 2019 transaction.

90 days

First regulated workload

Stand up a customer-service tenant. Wire Db2 z/OS via IBM DSDriver. The hash-chained audit log feeds into the bank's SIEM. 20 seats in retail support.
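
The SIEM wiring can be as plain as tailing the JSONL audit file and forwarding each record over syslog; a sketch, with the log path, SIEM host, and port as assumptions (the real integration would follow whatever collector the bank already runs):

    # Sketch: tail the hash-chained JSONL audit log and forward each record
    # to the bank's SIEM over syslog. Path, host, and port are assumptions.
    import json
    import logging
    import logging.handlers
    import time

    AUDIT_LOG = "/var/lib/eldric-aios/audit/audit.jsonl"   # assumed location
    SIEM = ("siem.bank.internal", 514)                     # assumed collector

    logger = logging.getLogger("eldric-audit")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.SysLogHandler(address=SIEM))

    def follow(path):
        """Yield new lines appended to the file, starting from its current end."""
        with open(path) as f:
            f.seek(0, 2)
            while True:
                line = f.readline()
                if not line:
                    time.sleep(1.0)
                    continue
                yield line

    for line in follow(AUDIT_LOG):
        entry = json.loads(line)
        logger.info(json.dumps(entry))     # one syslog event per audit record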

180 days

Multi-BU rollout

Retail + CRM + KYC tenants, each with workgroup-level Chinese walls. DORA evidence package generated monthly from the audit log. DR rehearsed.

Install alpha.3 · Privacy-first · Data access article · Insurance use case · office@eldric.ai
#BankingAI #MainframeAI #ODBC #zOS #DORA #EUAIAct #OnPrem #AuditTrail