Banking · use case
AI that can read the mainframe without ever leaving the vault
“We have forty years of transactions in Db2 on z/OS, a DORA deadline, and a board that reads every AI vendor’s data-handling clause with a pen in hand.”
Banking AI isn’t a model problem. The model is the easy part. The hard parts are: can the AI actually reach the systems of record, will the regulator accept what happens to the data, and can the audit committee reconstruct exactly what was retrieved when something goes wrong.
Eldric AI OS was built on the assumption that the answers are yes, yes, and yes — on-prem, deterministic, auditable. The shipped unixODBC layer means alpha.3 already talks to Db2 z/OS over DRDA; the hash-chained audit log is the evidence artefact DORA reviews ask for; and the identity service gives you Chinese walls that retail support can’t accidentally cross into M&A.
Value propositions
Mainframe reach today
data.local routes to any unixODBC DSN via real SQLDriverConnect / SQLFetch calls. Drop IBM DSDriver and Db2 z/OS over DRDA (port 446) is a plugin in the sidebar. No two-year ingestion project.
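For orientation, here is roughly what that path looks like from a client: a minimal sketch using pyodbc, which drives the same unixODBC `SQLDriverConnect` / `SQLFetch` entry points. The driver path, host, location name, credentials, and table are placeholders; the keyword names follow IBM's CLI/ODBC driver conventions.

```python
import pyodbc  # thin Python wrapper over the same unixODBC entry points

# Hypothetical DSN-less connect; every value below is a placeholder.
conn = pyodbc.connect(
    "DRIVER=/opt/ibm/dsdriver/lib/libdb2o.so;"   # assumed DSDriver install path
    "DATABASE=DSNDB0G;"                          # z/OS location name
    "HOSTNAME=zos.bank.internal;PORT=446;"       # DRDA listener
    "PROTOCOL=TCPIP;UID=svc_eldric;PWD=secret;"  # service account, not a personal login
)
for row in conn.execute(
    "SELECT TXN_ID, AMOUNT FROM ARCHIVE.TXNS FETCH FIRST 5 ROWS ONLY"
):
    print(row)
```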
Chinese walls as code
alpha.3 ships real users, tenants, projects, and workgroups. Retail banking and investment banking are different tenants; a retrieval can’t cross the boundary by accident. Admins see the boundary; users don’t have to remember it.
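The rule itself is small. A sketch of the boundary check under an assumed data model (every user and every document carries a tenant ID; none of these names are the shipped API):

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    tenant_id: str   # e.g. "retail" or "m_and_a"

@dataclass
class Document:
    path: str
    tenant_id: str

def can_retrieve(user: User, doc: Document) -> bool:
    # Retrieval never crosses a tenant boundary, whatever the query says.
    return user.tenant_id == doc.tenant_id

rm = User("anna", tenant_id="retail")
memo = Document("/nfs/ma/deal_memo.txt", tenant_id="m_and_a")
assert not can_retrieve(rm, memo)   # retail support never sees M&A material
```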
DORA-shaped audit trail
Every retrieval, prompt, and privacy toggle lands in a hash-chained, tamper-evident log. Reconstruct the state of any session at any past point. Evidence for Art. 9 operational-resilience reviews by construction.
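The tamper-evidence claim is mechanical enough to sketch. Assuming each JSONL record embeds the previous record's hash (field names illustrative, not the shipped schema), editing or deleting any record breaks every hash after it:

```python
import hashlib, json

def append(log: list[dict], event: dict) -> None:
    """Chain a new record to its predecessor's SHA-256."""
    prev = log[-1]["hash"] if log else "0" * 64
    body = {"prev": prev, **event}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append(body)

def verify(log: list[dict]) -> bool:
    """Recompute the whole chain; False on any tampering."""
    prev = "0" * 64
    for rec in log:
        body = {k: v for k, v in rec.items() if k != "hash"}
        ok = body.get("prev") == prev and rec["hash"] == hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if not ok:
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append(log, {"type": "retrieval", "account": "X", "ts": "2025-08-14T10:22:03"})
assert verify(log)
log[0]["account"] = "Y"      # tamper with history...
assert not verify(log)       # ...and the chain refuses to verify
```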
Zero customer-data egress
On-prem llama.cpp handles inference. Cloud backends are opt-in and admin-visible. PII never crosses the firewall unless policy explicitly allows it.
Single RPM, single systemd unit
Validation, change-control, DR rehearsal — all operate on one artefact. `dnf upgrade` is the upgrade; `dnf downgrade` is the rollback. No surprises.
Signed supply chain
4096-bit RSA-signed RPM on repo.eldric.ai (EU-hosted). GPG fingerprint verifiable on the Downloads page. No opaque binary from a vendor datacentre.
AI-driven differentiator
The marketable AI primitives for banking — retrieval from decades of transactions, pattern-level fraud recall, cross-session institutional memory — need a data-access layer that actually speaks DRDA and a memory primitive that isn’t context-window-bounded. Eldric ships both, in one process. That combination does not exist in the vendor SaaS AI market.
Scalable use cases
- Customer-service agents. Grounded chat on the customer’s own transactions (core banking Db2 LUW), historical archive (z/OS Db2 via ODBC), and CRM notes on NFS. First-contact resolution without tab-switching.
- KYC / AML reviews. Agent chat fans out across CRM, sanctions-list extensions, and the historical case archive (Matrix Memory). The citation trail lands in the hash-chain.
- Treasury / ALM. Real-time position queries fan out across Db2 LUW + risk datamart; historical liquidity patterns are recalled from Matrix Memory; output is cited back to tables.
- Internal audit. Read-only tenant with access to the audit log + all system surfaces. “Show me every retrieval that touched account X in Q3” becomes one chat query (see the sketch after this list).
- Relationship management. Per-RM memory of a client’s preferences; Matrix Memory compresses multi-year patterns. The archive that used to retire with the senior RM now survives.
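Mechanically, the internal-audit question above reduces to one pass over the append-only log. A sketch with illustrative field names, not the shipped schema:

```python
import json
from datetime import datetime

def retrievals_touching(path: str, account: str, start: str, end: str):
    """Yield retrieval records that reference one account inside a time window."""
    lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("type") != "retrieval":
                continue
            if account not in rec.get("refs", []):
                continue
            if lo <= datetime.fromisoformat(rec["ts"]) <= hi:
                yield rec

# "Every retrieval that touched account X in Q3":
for rec in retrievals_touching("audit.jsonl", "X", "2025-07-01", "2025-09-30T23:59:59"):
    print(rec["ts"], rec.get("session"), rec.get("source"))
```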
Runs on commodity hardware
Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:
3× RTX 4090 — sweet spot
72 GB total VRAM with tensor-split: Llama 3.3 70B Q4 at 60–80 tok/s, with a parallel 8B routing model and an embedding server running concurrently. One-time hardware cost ~€5–7k.
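For a sense of what the split means in practice, here is how you would drive the same layout by hand with llama-cpp-python; the parameter names are that binding's, the even split assumes three identical cards, and this illustrates the memory layout rather than Eldric's own configuration surface.

```python
from llama_cpp import Llama  # Python binding over the same embedded llama.cpp

# Even tensor split across three identical 24 GB cards.
llm = Llama(
    model_path="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # the ~40 GB GGUF listed below
    n_gpu_layers=-1,                # offload every layer to GPU
    tensor_split=[1.0, 1.0, 1.0],   # ~1/3 of the weights per 4090
    n_ctx=16384,                    # 16k context fits in the remaining VRAM
)
print(llm("Q: What is DRDA? A:", max_tokens=32)["choices"][0]["text"])
```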
Single RTX 4090 / 4080 — team scale
24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department chat with fan-out retrieval.
CPU-only — pilot scale
llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.
Scale up
Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.
Regional bank baseline
A two-rack install — 3×4090 GPU node + 2-node data tier with NFS + replicated Matrix Memory — covers a 500-seat retail bank comfortably. No hyperscaler dependency.
The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.
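Spelled out, taking the midpoint of the SaaS range and ignoring power and ops labour on both sides:

```python
hardware = 6_000          # one-time workstation cost, EUR
saas_per_year = 45_000    # midpoint of the EUR 30-60k/yr contract range
print(f"payback: {hardware / saas_per_year * 12:.1f} months")   # -> 1.6 months
print(f"5-year saving: EUR {saas_per_year * 5 - hardware:,}")   # -> EUR 219,000
```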
What the disk bill looks like
| Artefact | Size | Notes |
|---|---|---|
| `eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm` | ~1.4 MB | CPU baseline binary; one RPM, one systemd unit. |
| `eldric-aios-cuda` add-on | ~512 MB | Pulled in automatically via `Supplements: cuda-drivers` on GPU hosts. Contains GGML_CUDA llama.cpp. |
| Llama 3.1 8B Q4_K_M GGUF | ~4.9 GB | Good default for team-scale chat on a single 4090. |
| Llama 3.3 70B Q4_K_M GGUF | ~40 GB | The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably. |
| Mixtral 8x22B Q4 GGUF | ~80 GB | Tight on 3×4090; comfortable on 4×4090 or 2×H100. |
| nomic-embed-text (embedding) | ~700 MB | CPU or GPU. One per cluster; handles vector indexing. |
| Matrix Memory `.emm` per domain | 50–500 MB | Depends on rank × dim (see the memory article). `chat` at 64/768 ~200 kB; `particle_physics` at 512/1024 ~500 MB. |
| Vector store per 1M chunks | ~6–10 GB | Depends on embedding dim. SQLite backend; FAISS optional. Sizing sketch below the table. |
| Hash-chained audit log | ~200 MB / 1M calls | JSONL, append-only, rotation at 500 MB files by default. |
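A back-of-envelope check on the vector-store row, assuming 768-dim float32 embeddings (nomic-embed-text's dimension):

```python
chunks = 1_000_000
dim = 768                      # nomic-embed-text embedding dimension
raw = chunks * dim * 4         # float32 vectors only
print(f"raw vectors: {raw / 1e9:.1f} GB")   # ~3.1 GB
# Stored chunk text, SQLite row overhead, and indexes roughly double to
# triple that, which is where ~6-10 GB per million chunks comes from.
```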
Three reference hardware setups
| | Pilot / team | Department / BU | Production / enterprise |
|---|---|---|---|
| CPU | 1× EPYC 7313 (16c) or i9-14900K | 2× EPYC 9354 (32c each) | 2× EPYC 9654 (96c) per node |
| GPU | 1× RTX 4090 (24 GB) | 3× RTX 4090 (72 GB) | 4× H100 (320 GB) or 8× H200 |
| RAM | 128 GB DDR5 | 256 GB DDR5 ECC | 1 TB DDR5 ECC per node |
| Storage | 2× 4 TB NVMe (RAID-1) | 6× 8 TB NVMe (RAID-10) + SSD cache | Tiered: NVMe hot + TB-scale HDD / Lustre |
| Network | 1 GbE OK | 10 GbE with link agg | 25/100 GbE or IB-HDR for multi-node |
| Power | ~1 kW typical / 1.5 kW peak | ~2 kW typical / 3 kW peak | 4–6 kW per node |
| Hardware cost | ~€4–5k | ~€12–15k | €80–250k per node |
| Serves | 8B model, 10–30 concurrent chat users | 70B Q4 at 60–80 tok/s, 200–500 users | Mixtral / Llama-405B, 2k+ users per node |
Network + ops footprint
- Ports. One outward port (443 at the edge). Internally: controller on 8880, data on 8892, inference on 8883, science on 8897, etc. — all behind the edge.
- Storage layout. `${ELDRIC_DATA_DIR}` defaults to `/data/eldric` if writable, else `/var/lib/eldric`. Subdirs: `models/`, `vectors/`, `memory/` (matrix memory), `storage/` (file storage), `agent/`, `edge/`, and per-module dirs. The default-resolution logic is sketched after this list.
- Backup. The audit log and `.emm` files are the two artefacts that matter; everything else regenerates. Snapshot the data dir nightly; off-site every week.
- Updates. `dnf upgrade eldric-aios`. Rollback is `dnf downgrade`. Zero vendor dance.
- Ops team. A single systems engineer can run a pilot install. A team of two runs a department deployment. Production enterprise uses your existing Linux sysadmin rota.
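The storage-layout default above, as logic. An illustrative sketch of the documented behaviour, not the shipped implementation:

```python
import os

def eldric_data_dir() -> str:
    """Resolve the documented default: $ELDRIC_DATA_DIR wins if set;
    otherwise /data/eldric if writable, else /var/lib/eldric."""
    override = os.environ.get("ELDRIC_DATA_DIR")
    if override:
        return override
    return "/data/eldric" if os.access("/data/eldric", os.W_OK) else "/var/lib/eldric"

print(eldric_data_dir())  # e.g. /var/lib/eldric on a host without /data
```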
SWOT — an honest read
Strengths
- Db2 z/OS reachable today via the shipped ODBC layer — alpha.3 is already mainframe-capable
- Hash-chained audit log, identity service, multi-tenant all shipped in alpha.3
- Runs on 3×4090 commodity hardware for most BU deployments
- Source-available under BSL (moving to OSS), signed EU-hosted repo
Weaknesses
- Native Db2 CLI (DRDA without unixODBC indirection) still in phase 2
- IBM MQ, VSAM, and CICS native connectors still in phase 2 — ODBC reach covers query, not messaging
- SOC-2 / ISO-27001 not yet held; working toward them
- Banking-specific ontology extensions (SEPA, SWIFT MX) are customer-built for now
Opportunities
- DORA (Jan 2025) and EU AI Act force auditable on-prem AI for Tier-1 systems
- Basel IV model-risk management requires reconstructable AI decisions
- Sovereign-cloud mandates from national regulators (BaFin, FMA, AMF, ACPR)
- Legacy-system AI is an underserved niche — hyperscalers can’t reach Db2 z/OS without bank-built bridges
Threats
- IBM watsonx positioned for the same banking mainframe niche
- Hyperscaler “private tenant” offerings blurring the on-prem message
- Regional banks outsourcing core banking to the same hyperscalers, removing the on-prem requirement
- Internal “build vs buy” teams underestimating the cost of re-implementing tenant + audit + retrieval
First entry points — concrete value in 30 / 90 / 180 days
Sandbox pilot (30 days)
Single RPM on a non-production VM. ODBC DSN pointed at Db2 LUW (free driver, zero licensing). One RM power-user onboarded. Demo: chat cites a real 2019 transaction.
First regulated workload (90 days)
Stand up a customer-service tenant. Wire Db2 z/OS via IBM DSDriver. The hash-chained audit log feeds the bank’s SIEM. 20 seats in retail support.
Multi-BU rollout (180 days)
Retail + CRM + KYC tenants, each with workgroup-level Chinese walls. DORA evidence package generated monthly from the audit log. DR rehearsed.