Insurance · use case

Claims, fraud, and the archive nobody has time to read

by Juergen Paulhart · 2026-04-24 · ~8 min read

“Claim volumes doubled in five years, adjuster headcount hasn’t. Meanwhile the institutional memory of every recognisable fraud pattern is in our senior investigators’ heads. Three of them retire next year.”
[Architecture diagram: a claim bundle (claim.pdf narrative, forms, photos.zip damage evidence, police report with incident #, repair estimate, workshop invoice, policy terms) fans out in parallel to the eldric-aios data module — a vector store for exact retrieval over embeddings of the bundle plus the policy tree (data.pageindex) — and to Matrix Memory v4, a compressed fraud archive with outer-product writes and one matrix-vector recall per query. Merged retrieval feeds llama.cpp on 3× RTX 4090 (70B Q4 at 60–80 tok/s), which drafts a memo with citations back to bundle, policy, and archive, under a hash-chained audit log (Solvency II, GDPR, per-claim reconstructable). Outcomes: claim triage (draft decision + reasons), fraud lead (similar prior + score), policy check (covered? sublimits), customer letter (draft + tone).]

Every mature insurer owns three things that AI should obviously help with and usually doesn’t: the growing claims pile, the historical fraud archive whose patterns live only in senior investigators’ heads, and the policy-language corpus that’s impenetrable on purpose. What blocks the obvious help is always the same: PII is regulated, the archive lives in the core, and nobody wants loss-adjustment deliberations leaking into a vendor’s logs.

Eldric AI OS is the on-prem answer. The Matrix Memory v4 Gated DeltaNet update rule compresses decades of claim outcomes into a dense associative store; vector retrieval anchors the answer to specific documents; the hash-chained audit log is the evidence Solvency II reviewers ask for. Everything stays inside the insurer’s network.
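The shape of that update rule can be sketched in a few lines of plain Python. This is a toy delta-rule associative memory matching the article's description (outer-product write, one matrix-vector recall per query), not Eldric's actual Matrix Memory v4 code; the dimensions and vectors are invented.

```python
# Toy delta-rule associative memory (illustrative, not the real engine).
# Write: M += beta * (value - M @ key) outer key   -- a rank-1 update
# Read:  recall = M @ query                        -- one matvec per query

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class DeltaMemory:
    def __init__(self, dim):
        self.M = [[0.0] * dim for _ in range(dim)]

    def write(self, key, value, beta=1.0):
        pred = matvec(self.M, key)              # what the store recalls now
        err = [beta * (v - p) for v, p in zip(value, pred)]
        for i, e in enumerate(err):             # outer-product (rank-1) write
            for j, k in enumerate(key):
                self.M[i][j] += e * k

    def read(self, query):
        return matvec(self.M, query)

mem = DeltaMemory(4)
mem.write([1, 0, 0, 0], [0.0, 2.0, 0.0, 0.0])   # claim signature -> outcome
print(mem.read([1, 0, 0, 0]))                   # [0.0, 2.0, 0.0, 0.0]
```

The gating factor beta is what lets later evidence overwrite a stale association rather than pile on top of it, and recall cost is independent of how many pairs were ever written, which is the context-window independence the article leans on.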

Value propositions

Claims triage draft in minutes

The data module indexes the claim bundle on ingest; the on-prem LLM drafts a triage memo with line-level citations back to the PDFs. A 3-hour first-day review becomes a 15-minute supervised read.

Fraud-pattern recall

Matrix Memory’s outer-product writes absorb decades of claim–outcome pairs. An incoming claim that resembles a known fraud vector returns a similarity score with a pointer back to the precedent. Not a decision, a lead.
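A minimal sketch of the scoring step, assuming incoming claims are already embedded as feature vectors; the precedent IDs, the vectors, and the 0.8 lead threshold are all invented for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Invented precedent vectors standing in for Matrix Memory recall results.
precedents = {
    "CLM-2019-7741 (workshop ring)": [0.9, 0.1, 0.8],
    "CLM-2021-0032 (staged collision)": [0.1, 0.9, 0.2],
}

incoming = [0.85, 0.15, 0.75]
lead, score = max(((pid, cosine(incoming, vec)) for pid, vec in precedents.items()),
                  key=lambda t: t[1])
if score > 0.8:                 # a lead for an investigator, not a decision
    print(lead, round(score, 3))
```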

Policy-language chat

data.pageindex (hierarchical tree reasoning, sketch in alpha.3) outperforms vector similarity on structured policy docs. Answers cite sublimits, exclusions, and endorsements back to the section they came from.
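What hierarchical tree reasoning buys over flat similarity can be shown with a toy outline walk. This is not the data.pageindex API (only a sketch of it exists in alpha.3); the policy outline and the keyword scoring are invented, with the scoring standing in for an LLM choosing the branch.

```python
# Toy tree-walk over a policy outline: descend the best-matching heading
# at each level, so the answer carries the exact section path as citation.
policy = {
    "Section 4 - Exclusions": {
        "4.2 Wear and tear": "Gradual deterioration is not covered.",
        "4.3 Flood": "Flood damage is excluded unless endorsement FL-1 applies.",
    },
    "Section 7 - Sublimits": {
        "7.1 Jewellery": "Jewellery sublimit: EUR 5,000 per item.",
    },
}

def walk(tree, question, path=()):
    if isinstance(tree, str):                    # leaf: the clause text
        return " > ".join(path), tree
    words = question.lower().split()
    best = max(tree, key=lambda h: sum(w in h.lower() for w in words))
    return walk(tree[best], question, path + (best,))

path, clause = walk(policy, "what do the exclusions say about flood")
print(path)      # Section 4 - Exclusions > 4.3 Flood
print(clause)
```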

Per-claim audit reconstruction

Every retrieval and prompt is hash-chained. “Why did the AI say that?” is answerable at any future point. Solvency II model-governance evidence by construction.
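The mechanism is small enough to show whole. A sketch of a hash chain over in-memory entries (illustrative; Eldric's actual on-disk JSONL layout may differ):

```python
import hashlib, json

GENESIS = "0" * 64

def append(log, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    log.append({"event": event, "prev": prev,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(log):
    """Recompute every link; an edit to any earlier entry breaks the chain."""
    prev = GENESIS
    for e in log:
        body = json.dumps({"event": e["event"], "prev": prev}, sort_keys=True)
        if e["prev"] != prev or e["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = e["hash"]
    return True

log = []
append(log, "retrieval: claim.pdf p.3, policy sect. 4.3")
append(log, "prompt: triage memo draft v1")
print(verify(log))              # True
log[0]["event"] = "tampered"
print(verify(log))              # False
```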

Line-of-business tenants

Property, auto, health, life, re-insurance — each its own tenant. GDPR data minimisation by architecture: retrieval can’t cross the boundary.

No claim data egress

On-prem llama.cpp default. No vendor gets PII as training material. Your reinsurance treaty obligations don’t suddenly include an AI vendor’s sub-processor list.

AI-driven differentiator

The two hardest problems for insurer AI are structured document reasoning (policy wording) and cross-claim pattern recall (fraud). Eldric ships both as first-class architectural primitives, not as RAG shims. data.pageindex does tree-walking over policy documents; Matrix Memory v4 compresses decades of fraud outcomes into associative recall that’s independent of context-window length. Context-window LLMs can’t do either.

Scalable use cases

Runs on commodity hardware

Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:

3× RTX 4090 — sweet spot

72 GB total VRAM with tensor-split runs Llama 3.3 70B Q4 at 60–80 tok/s, a parallel 8B routing model, and an embedding server concurrently. One-time hardware cost ~€5–7k.
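A back-of-envelope check of why this configuration fits. The figures below are assumptions: Q4_K_M averaging roughly 4.5 bits per weight, Llama 3.3 70B's 80 layers with 8 KV heads of head dim 128, fp16 KV cache at 16k context, and ~4 GB set aside for the routing model and embedding server.

```python
GB = 1024 ** 3

# Quantised weights: ~70e9 params at ~4.5 bits/weight (Q4_K_M average).
weights_gb = 70e9 * 4.5 / 8 / GB

# fp16 KV cache at 16k context: layers * (K+V) * ctx * kv_heads * head_dim * 2 B.
kv_gb = 80 * 2 * 16384 * 8 * 128 * 2 / GB

overhead_gb = 4          # routing model + embedding server, rough allowance
total_gb = weights_gb + kv_gb + overhead_gb

print(round(weights_gb, 1), round(kv_gb, 1), round(total_gb, 1))
print(total_gb < 72)     # True: fits across 3 x 24 GB cards with tensor-split
```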

Single RTX 4090 / 4080 — team scale

24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department chat with fan-out retrieval.

CPU-only — pilot scale

llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.

Scale up

Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.

Mid-market insurer baseline

3×4090 handles claims triage + fraud recall + policy chat for 500 concurrent adjusters. Matrix Memory for fraud (256 rank, 1024 dim) fits in 500 MB.

The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.

What the disk bill looks like

Artefact | Size | Notes
eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm | ~1.4 MB | CPU baseline binary; one RPM, one systemd unit.
eldric-aios-cuda add-on | ~512 MB | Pulled in automatically via Supplements: cuda-drivers on GPU hosts. Contains GGML_CUDA llama.cpp.
Llama 3.1 8B Q4_K_M GGUF | ~4.9 GB | Good default for team-scale chat on a single 4090.
Llama 3.3 70B Q4_K_M GGUF | ~40 GB | The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably.
Mixtral 8x22B Q4 GGUF | ~80 GB | Tight on 3×4090; comfortable on 4×4090 or 2×H100.
nomic-embed-text (embedding) | ~700 MB | CPU or GPU. One per cluster; handles vector indexing.
Matrix Memory .emm per domain | 50–500 MB | Depends on rank × dim (see memory article). chat 64/768 ~200 kB; particle_physics 512/1024 ~500 MB.
Vector store per 1M chunks | ~6–10 GB | Depends on embedding dim. SQLite backend; FAISS optional.
Hash-chained audit log | ~200 MB / 1M calls | JSONL, append-only, rotation at 500 MB files by default.
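Summing the table's approximate figures for a mid-market 3×4090 deployment (70B model, one fraud-domain memory, 1M indexed chunks at the upper bound) gives a rough disk budget:

```python
# Approximate sizes in GB, taken from the artefact table above.
artefacts_gb = {
    "eldric-aios rpm": 0.0014,
    "cuda add-on": 0.5,
    "Llama 3.3 70B Q4_K_M GGUF": 40.0,
    "nomic-embed-text": 0.7,
    "Matrix Memory .emm (fraud domain)": 0.5,
    "vector store, 1M chunks (upper bound)": 10.0,
    "audit log, 1M calls": 0.2,
}
total_gb = sum(artefacts_gb.values())
print(round(total_gb, 1))   # 51.9, comfortably inside the pilot tier's NVMe pair
```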

Three reference hardware setups

Component | Pilot / team | Department / BU | Production / enterprise
CPU | 1× EPYC 7313 (16c) or i9-14900K | 2× EPYC 9354 (32c each) | 2× EPYC 9654 (96c) per node
GPU | 1× RTX 4090 (24 GB) | 3× RTX 4090 (72 GB) | 4× H100 (320 GB) or 8× H200
RAM | 128 GB DDR5 | 256 GB DDR5 ECC | 1 TB DDR5 ECC per node
Storage | 2× 4 TB NVMe (RAID-1) | 6× 8 TB NVMe (RAID-10) + SSD cache | Tiered: NVMe hot + TB-scale HDD / Lustre
Network | 1 GbE OK | 10 GbE with link agg | 25/100 GbE or IB-HDR for multi-node
Power | ~1 kW typical / 1.5 kW peak | ~2 kW typical / 3 kW peak | 4–6 kW per node
Hardware cost | ~€4–5k | ~€12–15k | €80–250k per node
Serves | 8B model, 10–30 concurrent chat users | 70B Q4 at 60–80 tok/s, 200–500 users | Mixtral / Llama-405B, 2k+ users per node


SWOT — an honest read

Strengths

  • Matrix Memory v4 compresses decades of fraud patterns into a single associative store
  • data.pageindex sketch for structured policy docs — already in the SDK
  • Hash-chained audit log + identity service + multi-tenant — Solvency II primitives by construction
  • Runs on commodity 3×4090; does not require a hyperscaler contract

Weaknesses

  • Actuarial tooling (reserving, capital models) not yet native — integrated via ODBC for now
  • PDF OCR uses external tools (Tesseract, pdfplumber); no built-in ICR engine
  • Claims-system-specific ontologies (Guidewire, Duck Creek, Sapiens) require customer extensions
  • Photo-damage model selection is customer’s choice — Eldric supplies the pipeline, not the vision model

Opportunities

  • Solvency II model-governance tightening — reconstructable AI is a requirement
  • Claims-fraud ROI is large and measurable, making pilot budgets easy to justify
  • Ageing investigator population — institutional memory is a ticking budget line
  • EU Retail Investment Strategy raising conduct-of-business AI scrutiny

Threats

  • Vendor-embedded AI (Guidewire AI, Duck Creek AI) as default in the claims system
  • Hyperscaler “insurance AI” accelerators bundled with cloud consumption commits
  • Internal data-lake projects consuming the AI budget before Eldric is considered
  • Regulator reluctance to approve unfamiliar architectures — needs evidence, takes time

First entry points — concrete value in 30 / 90 / 180 days

30 days

Claim-triage demo

Install alpha.3. Ingest 50 anonymised claims. Show the drafted memo + citations workflow to one adjuster team. No PII moves outside the sandbox.

90 days

Fraud-pattern workspace

Matrix Memory profile seeded with prior-year fraud outcomes. Incoming claim test: does recall surface the 2019 workshop-ring case? Measured precision/recall vs. current triage.
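The 90-day measurement is ordinary precision/recall against adjudicated outcomes; a sketch with invented claim IDs:

```python
# Invented IDs: claims the memory surfaced vs. fraud confirmed by investigators.
flagged = {"CLM-01", "CLM-04", "CLM-09"}
confirmed = {"CLM-01", "CLM-09", "CLM-13"}

tp = len(flagged & confirmed)
precision = tp / len(flagged)    # of the leads raised, how many were real
recall = tp / len(confirmed)     # of the real fraud, how much was surfaced

print(round(precision, 2), round(recall, 2))   # 0.67 0.67
```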

180 days

Multi-line rollout

Property + auto tenants live. Audit log integrated with SIEM. Solvency II evidence package generated quarterly. Legacy SaaS claims-assistant decommissioned.

Install alpha.3 · Privacy-first Memory article · Banking use case · office@eldric.ai
#InsuranceAI #Claims #FraudDetection #SolvencyII #GDPR #MatrixMemory #PageIndex