Insurance · use case

Claims, fraud, and the archive nobody has time to read

by Juergen Paulhart · 2026-04-24 · ~8 min read

“Claim volumes doubled in five years, adjuster headcount hasn’t. Meanwhile the institutional memory of every recognisable fraud pattern is in our senior investigators’ heads. Three of them retire next year.”
[Architecture diagram: a claim bundle (claim.pdf narrative, forms, photos.zip damage evidence, police report with incident #, repair estimate, workshop invoice, policy terms) fans out in parallel to the eldric-aios data module — a vector store for exact retrieval over embeddings of the bundle plus the policy tree (data.pageindex) — and to Matrix Memory v4, a compressed fraud archive with outer-product writes and one matrix-vector recall per query. Merged retrieval feeds llama.cpp on 3× RTX 4090 (70B Q4 at 60–80 tok/s), which drafts a memo with citations back to bundle, policy, and archive, under a hash-chained audit log (Solvency II, GDPR, per-claim reconstructable). Outcomes: claim triage (draft decision + reasons), fraud lead (similar prior + score), policy check (covered? sublimits), customer letter (draft + tone).]

Every mature insurer owns three things that AI should obviously help with and usually doesn’t: the growing claims pile, the historical fraud archive whose patterns live only in senior investigators’ heads, and the policy-language corpus that’s impenetrable on purpose. What blocks the obvious help is always the same: PII is regulated, the archive lives in the core, and nobody wants loss-adjustment deliberations leaking into a vendor’s logs.

Eldric AI OS is the on-prem answer. The Matrix Memory v4 Gated DeltaNet update rule compresses decades of claim outcomes into a dense associative store; vector retrieval anchors the answer to specific documents; the hash-chained audit log is the evidence Solvency II reviewers ask for. Everything stays inside the insurer’s network.
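The shape of that update rule can be sketched in a few lines of plain Python. This is a toy delta-rule associative memory matching the article's description (outer-product write, one matrix-vector recall per query), not Eldric's actual Matrix Memory v4 code; the dimensions and vectors are invented.

```python
# Toy delta-rule associative memory (illustrative, not the real engine).
# Write: M += beta * (value - M @ key) outer key   -- a rank-1 update
# Read:  recall = M @ query                        -- one matvec per query

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class DeltaMemory:
    def __init__(self, dim):
        self.M = [[0.0] * dim for _ in range(dim)]

    def write(self, key, value, beta=1.0):
        pred = matvec(self.M, key)              # what the store recalls now
        err = [beta * (v - p) for v, p in zip(value, pred)]
        for i, e in enumerate(err):             # outer-product (rank-1) write
            for j, k in enumerate(key):
                self.M[i][j] += e * k

    def read(self, query):
        return matvec(self.M, query)

mem = DeltaMemory(4)
mem.write([1, 0, 0, 0], [0.0, 2.0, 0.0, 0.0])   # claim signature -> outcome
print(mem.read([1, 0, 0, 0]))                   # [0.0, 2.0, 0.0, 0.0]
```

The gating factor beta is what lets later evidence overwrite a stale association rather than pile on top of it, and recall cost is independent of how many pairs were ever written, which is the context-window independence the article leans on.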

Value propositions

Claims triage draft in minutes

The data module indexes the claim bundle on ingest; the on-prem LLM drafts a triage memo with line-level citations back to the PDFs. A 3-hour first-day review becomes a 15-minute supervised read.

Fraud-pattern recall

Matrix Memory’s outer-product writes absorb decades of claim–outcome pairs. An incoming claim that resembles a known fraud vector returns a similarity score with a pointer back to the precedent. Not a decision, a lead.
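A minimal sketch of the scoring step, assuming incoming claims are already embedded as feature vectors; the precedent IDs, the vectors, and the 0.8 lead threshold are all invented for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Invented precedent vectors standing in for Matrix Memory recall results.
precedents = {
    "CLM-2019-7741 (workshop ring)": [0.9, 0.1, 0.8],
    "CLM-2021-0032 (staged collision)": [0.1, 0.9, 0.2],
}

incoming = [0.85, 0.15, 0.75]
lead, score = max(((pid, cosine(incoming, vec)) for pid, vec in precedents.items()),
                  key=lambda t: t[1])
if score > 0.8:                 # a lead for an investigator, not a decision
    print(lead, round(score, 3))
```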

Policy-language chat

data.pageindex (hierarchical tree reasoning, sketch in alpha.3) outperforms vector similarity on structured policy docs. Answers cite sublimits, exclusions, and endorsements back to the section they came from.
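What hierarchical tree reasoning buys over flat similarity can be shown with a toy outline walk. This is not the data.pageindex API (only a sketch of it exists in alpha.3); the policy outline and the keyword scoring are invented, with the scoring standing in for an LLM choosing the branch.

```python
# Toy tree-walk over a policy outline: descend the best-matching heading
# at each level, so the answer carries the exact section path as citation.
policy = {
    "Section 4 - Exclusions": {
        "4.2 Wear and tear": "Gradual deterioration is not covered.",
        "4.3 Flood": "Flood damage is excluded unless endorsement FL-1 applies.",
    },
    "Section 7 - Sublimits": {
        "7.1 Jewellery": "Jewellery sublimit: EUR 5,000 per item.",
    },
}

def walk(tree, question, path=()):
    if isinstance(tree, str):                    # leaf: the clause text
        return " > ".join(path), tree
    words = question.lower().split()
    best = max(tree, key=lambda h: sum(w in h.lower() for w in words))
    return walk(tree[best], question, path + (best,))

path, clause = walk(policy, "what do the exclusions say about flood")
print(path)      # Section 4 - Exclusions > 4.3 Flood
print(clause)
```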

Per-claim audit reconstruction

Every retrieval and prompt is hash-chained. “Why did the AI say that?” is answerable at any future point. Solvency II model-governance evidence by construction.
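The mechanism is small enough to show whole. A sketch of a hash chain over in-memory entries (illustrative; Eldric's actual on-disk JSONL layout may differ):

```python
import hashlib, json

GENESIS = "0" * 64

def append(log, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    log.append({"event": event, "prev": prev,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(log):
    """Recompute every link; an edit to any earlier entry breaks the chain."""
    prev = GENESIS
    for e in log:
        body = json.dumps({"event": e["event"], "prev": prev}, sort_keys=True)
        if e["prev"] != prev or e["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = e["hash"]
    return True

log = []
append(log, "retrieval: claim.pdf p.3, policy sect. 4.3")
append(log, "prompt: triage memo draft v1")
print(verify(log))              # True
log[0]["event"] = "tampered"
print(verify(log))              # False
```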

Line-of-business tenants

Property, auto, health, life, re-insurance — each its own tenant. GDPR data minimisation by architecture: retrieval can’t cross the boundary.

No claim data egress

On-prem llama.cpp default. No vendor gets PII as training material. Your reinsurance treaty obligations don’t suddenly include an AI vendor’s sub-processor list.

AI-driven differentiator

The two hardest problems for insurer AI are structured document reasoning (policy wording) and cross-claim pattern recall (fraud). Eldric ships both as first-class architectural primitives, not as RAG shims. data.pageindex does tree-walking over policy documents; Matrix Memory v4 compresses decades of fraud outcomes into associative recall that’s independent of context-window length. Context-window LLMs can’t do either.

Scalable use cases

Runs on commodity hardware

Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:

3× RTX 4090 — sweet spot

72 GB total VRAM with tensor-split runs Llama 3.3 70B Q4 at 60–80 tok/s, a parallel 8B routing model, and an embedding server concurrently. One-time hardware cost ~€5–7k.
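A back-of-envelope check of why this configuration fits. The figures below are assumptions: Q4_K_M averaging roughly 4.5 bits per weight, Llama 3.3 70B's 80 layers with 8 KV heads of head dim 128, fp16 KV cache at 16k context, and ~4 GB set aside for the routing model and embedding server.

```python
GB = 1024 ** 3

# Quantised weights: ~70e9 params at ~4.5 bits/weight (Q4_K_M average).
weights_gb = 70e9 * 4.5 / 8 / GB

# fp16 KV cache at 16k context: layers * (K+V) * ctx * kv_heads * head_dim * 2 B.
kv_gb = 80 * 2 * 16384 * 8 * 128 * 2 / GB

overhead_gb = 4          # routing model + embedding server, rough allowance
total_gb = weights_gb + kv_gb + overhead_gb

print(round(weights_gb, 1), round(kv_gb, 1), round(total_gb, 1))
print(total_gb < 72)     # True: fits across 3 x 24 GB cards with tensor-split
```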

Single RTX 4090 / 4080 — team scale

24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department chat with fan-out retrieval.

CPU-only — pilot scale

llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.

Scale up

Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.

Mid-market insurer baseline

3×4090 handles claims triage + fraud recall + policy chat for 500 concurrent adjusters. Matrix Memory for fraud (256 rank, 1024 dim) fits in 500 MB.

The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.

What the disk bill looks like

Artefact | Size | Notes
eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm | ~1.4 MB | CPU baseline binary; one RPM, one systemd unit.
eldric-aios-cuda add-on | ~512 MB | Pulled in automatically via Supplements: cuda-drivers on GPU hosts. Contains GGML_CUDA llama.cpp.
Llama 3.1 8B Q4_K_M GGUF | ~4.9 GB | Good default for team-scale chat on a single 4090.
Llama 3.3 70B Q4_K_M GGUF | ~40 GB | The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably.
Mixtral 8x22B Q4 GGUF | ~80 GB | Tight on 3×4090; comfortable on 4×4090 or 2×H100.
nomic-embed-text (embedding) | ~700 MB | CPU or GPU. One per cluster; handles vector indexing.
Matrix Memory .emm per domain | 50–500 MB | Depends on rank × dim (see memory article). chat 64/768 ~200 kB; particle_physics 512/1024 ~500 MB.
Vector store per 1M chunks | ~6–10 GB | Depends on embedding dim. SQLite backend; FAISS optional.
Hash-chained audit log | ~200 MB / 1M calls | JSONL, append-only, rotation at 500 MB files by default.
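Summing the table's approximate figures for a mid-market 3×4090 deployment (70B model, one fraud-domain memory, 1M indexed chunks at the upper bound) gives a rough disk budget:

```python
# Approximate sizes in GB, taken from the artefact table above.
artefacts_gb = {
    "eldric-aios rpm": 0.0014,
    "cuda add-on": 0.5,
    "Llama 3.3 70B Q4_K_M GGUF": 40.0,
    "nomic-embed-text": 0.7,
    "Matrix Memory .emm (fraud domain)": 0.5,
    "vector store, 1M chunks (upper bound)": 10.0,
    "audit log, 1M calls": 0.2,
}
total_gb = sum(artefacts_gb.values())
print(round(total_gb, 1))   # 51.9, comfortably inside the pilot tier's NVMe pair
```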

Three reference hardware setups

Component | Pilot / team | Department / BU | Production / enterprise
CPU | 1× EPYC 7313 (16c) or i9-14900K | 2× EPYC 9354 (32c each) | 2× EPYC 9654 (96c) per node
GPU | 1× RTX 4090 (24 GB) | 3× RTX 4090 (72 GB) | 4× H100 (320 GB) or 8× H200
RAM | 128 GB DDR5 | 256 GB DDR5 ECC | 1 TB DDR5 ECC per node
Storage | 2× 4 TB NVMe (RAID-1) | 6× 8 TB NVMe (RAID-10) + SSD cache | Tiered: NVMe hot + TB-scale HDD / Lustre
Network | 1 GbE OK | 10 GbE with link agg | 25/100 GbE or IB-HDR for multi-node
Power | ~1 kW typical / 1.5 kW peak | ~2 kW typical / 3 kW peak | 4–6 kW per node
Hardware cost | ~€4–5k | ~€12–15k | €80–250k per node
Serves | 8B model, 10–30 concurrent chat users | 70B Q4 at 60–80 tok/s, 200–500 users | Mixtral / Llama-405B, 2k+ users per node


SWOT — an honest read

Strengths

  • Matrix Memory v4 compresses decades of fraud patterns into a single associative store
  • data.pageindex sketch for structured policy docs — already in the SDK
  • Hash-chained audit log + identity service + multi-tenant — Solvency II primitives by construction
  • Runs on commodity 3×4090; does not require a hyperscaler contract

Weaknesses

  • Actuarial tooling (reserving, capital models) not yet native — integrated via ODBC for now
  • PDF OCR uses external tools (Tesseract, pdfplumber); no built-in ICR engine
  • Claims-system-specific ontologies (Guidewire, Duck Creek, Sapiens) require customer extensions
  • Photo-damage model selection is customer’s choice — Eldric supplies the pipeline, not the vision model

Opportunities

  • Solvency II model-governance tightening — reconstructable AI is a requirement
  • Claims-fraud ROI is large and measurable, making pilot budgets easy to justify
  • Ageing investigator population — institutional memory is a ticking budget line
  • EU Retail Investment Strategy raising conduct-of-business AI scrutiny

Threats

  • Vendor-embedded AI (Guidewire AI, Duck Creek AI) as default in the claims system
  • Hyperscaler “insurance AI” accelerators bundled with cloud consumption commits
  • Internal data-lake projects consuming the AI budget before Eldric is considered
  • Regulator reluctance to approve unfamiliar architectures — needs evidence, takes time

First entry points — concrete value in 30 / 90 / 180 days

30 days

Claim-triage demo

Install alpha.3. Ingest 50 anonymised claims. Show the drafted memo + citations workflow to one adjuster team. No PII moves outside the sandbox.

90 days

Fraud-pattern workspace

Matrix Memory profile seeded with prior-year fraud outcomes. Incoming claim test: does recall surface the 2019 workshop-ring case? Measured precision/recall vs. current triage.
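The 90-day measurement is ordinary precision/recall against adjudicated outcomes; a sketch with invented claim IDs:

```python
# Invented IDs: claims the memory surfaced vs. fraud confirmed by investigators.
flagged = {"CLM-01", "CLM-04", "CLM-09"}
confirmed = {"CLM-01", "CLM-09", "CLM-13"}

tp = len(flagged & confirmed)
precision = tp / len(flagged)    # of the leads raised, how many were real
recall = tp / len(confirmed)     # of the real fraud, how much was surfaced

print(round(precision, 2), round(recall, 2))   # 0.67 0.67
```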

180 days

Multi-line rollout

Property + auto tenants live. Audit log integrated with SIEM. Solvency II evidence package generated quarterly. Legacy SaaS claims-assistant decommissioned.

Install alpha.3 · Privacy-first Memory article · Banking use case · office@eldric.ai
#InsuranceAI #Claims #FraudDetection #SolvencyII #GDPR #MatrixMemory #PageIndex