Insurance · use case
Claims, fraud, and the archive nobody has time to read
“Claim volumes doubled in five years, adjuster headcount hasn’t. Meanwhile the institutional memory of every recognisable fraud pattern is in our senior investigators’ heads. Three of them retire next year.”
Every mature insurer owns three things that AI should obviously help with and usually doesn’t: the growing claims pile, the historical fraud archive whose patterns live only in senior investigators’ heads, and the policy-language corpus that’s impenetrable on purpose. What blocks the obvious help is always the same: PII is regulated, the archive lives in the core, and nobody wants loss-adjustment deliberations leaking into a vendor’s logs.
Eldric AI OS is the on-prem answer. The Matrix Memory v4 Gated DeltaNet update rule compresses decades of claim outcomes into a dense associative store; vector retrieval anchors the answer to specific documents; the hash-chained audit log is the evidence Solvency II reviewers ask for. Everything stays inside the insurer’s network.
Value propositions
Claims triage draft in minutes
The data module indexes the claim bundle on ingest; the on-prem LLM drafts a triage memo with line-level citations back to the PDFs. A 3-hour first-day review becomes a 15-minute supervised read.
Fraud-pattern recall
Matrix Memory’s outer-product writes absorb decades of claim–outcome pairs. An incoming claim that resembles a known fraud vector returns a similarity score with a pointer back to the precedent. Not a decision, a lead.
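The mechanics fit in a toy. Below is a minimal numpy sketch of the outer-product write-and-recall loop; the real Matrix Memory v4 adds the Gated DeltaNet gating and decay terms this omits, and the claim embeddings are made up for illustration:

```python
import numpy as np

class ToyAssociativeStore:
    """Toy outer-product memory: write claim-outcome pairs, recall by similarity.

    Illustrative only: Matrix Memory v4's Gated DeltaNet update adds per-write
    gating and decay that this sketch leaves out.
    """

    def __init__(self, dim: int):
        self.M = np.zeros((dim, dim))          # the dense associative store

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        # One rank-1 outer-product update per claim-outcome pair.
        self.M += np.outer(value, key)

    def recall(self, query: np.ndarray) -> np.ndarray:
        # A single matmul returns stored outcomes weighted by key similarity.
        return self.M @ (query / np.linalg.norm(query))

dim = 8
store = ToyAssociativeStore(dim)
rng = np.random.default_rng(0)

fraud_key = rng.normal(size=dim)
fraud_key /= np.linalg.norm(fraud_key)
store.write(fraud_key, np.eye(dim)[0])         # pretend slot 0 = a known fraud ring

incoming = fraud_key + 0.1 * rng.normal(size=dim)   # a suspiciously similar new claim
print("fraud-lead score:", store.recall(incoming)[0])
```

What gets stored is the matrix, not the transcripts, so recall cost is independent of how many years were written in; pointing back to the precedent document is the vector index’s job.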
Policy-language chat
data.pageindex (hierarchical tree reasoning, sketch in alpha.3) outperforms vector similarity on structured policy docs. Answers cite sublimits, exclusions, and endorsements back to the section they came from.
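data.pageindex is a sketch in alpha.3, so treat the following as an illustration of the idea rather than its API: walk the document’s section tree, descend only into branches whose summaries look relevant, and answer from leaves that carry their full section path as the citation. Every name here (`PolicyNode`, `relevance`, `tree_walk`) is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class PolicyNode:
    """One node of a policy's section tree; internal nodes hold summaries."""
    path: str                                   # e.g. "Policy > Exclusions > Flood"
    text: str
    children: list["PolicyNode"] = field(default_factory=list)

def relevance(query: str, text: str) -> float:
    # Stand-in scorer; a real index would use an LLM or cross-encoder here.
    terms = set(query.lower().split())
    return sum(t in text.lower() for t in terms) / max(len(terms), 1)

def tree_walk(node: PolicyNode, query: str, threshold: float = 0.3) -> list[PolicyNode]:
    """Descend only into relevant branches; return leaves with their paths."""
    if not node.children:
        return [node] if relevance(query, node.text) >= threshold else []
    hits: list[PolicyNode] = []
    for child in node.children:
        if relevance(query, child.path + " " + child.text) >= threshold:
            hits.extend(tree_walk(child, query, threshold))
    return hits

policy = PolicyNode("Policy", "commercial property wording", [
    PolicyNode("Policy > Exclusions",
               "flood, war and terrorism exclusions, each with its own sublimit", [
        PolicyNode("Policy > Exclusions > Flood",
                   "flood damage excluded above the sublimit"),
        PolicyNode("Policy > Exclusions > War",
                   "war and terrorism excluded entirely"),
    ]),
])

for leaf in tree_walk(policy, "flood sublimit"):
    print(leaf.path, "->", leaf.text)
```

Flat similarity returns anonymous chunks; the tree walk keeps the path, which is what lets an answer cite the flood sublimit under Exclusions rather than chunk 4812.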
Per-claim audit reconstruction
Every retrieval and prompt is hash-chained. “Why did the AI say that?” is answerable at any future point. Solvency II model-governance evidence by construction.
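The chain itself is small. A minimal sketch of the append-and-verify idea behind a hash-chained JSONL log; the field names are illustrative, not Eldric’s actual schema:

```python
import hashlib, json, time

GENESIS = "0" * 64

def append_entry(log_path: str, event: dict) -> str:
    """Append one audit event, chained to the previous entry's hash."""
    prev = GENESIS
    try:
        with open(log_path, "rb") as f:
            prev = json.loads(f.read().splitlines()[-1])["hash"]
    except (FileNotFoundError, IndexError):
        pass                                    # empty or missing log: start chain
    body = {"ts": time.time(), "prev": prev, **event}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    with open(log_path, "a") as f:
        f.write(json.dumps(body, sort_keys=True) + "\n")
    return body["hash"]

def verify(log_path: str) -> bool:
    """Recompute the chain; editing or deleting any line breaks every later hash."""
    prev = GENESIS
    for line in open(log_path):
        entry = json.loads(line)
        claimed = entry.pop("hash")
        ok = entry["prev"] == prev and claimed == hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        if not ok:
            return False
        prev = claimed
    return True

append_entry("audit.jsonl", {"actor": "adjuster-7", "action": "retrieval",
                             "doc": "claim-2019-044.pdf"})
print(verify("audit.jsonl"))                    # True until someone tampers
```

Because each entry commits to its predecessor, "why did the AI say that?" can be answered, and proven unedited, at any later audit.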
Line-of-business tenants
Property, auto, health, life, re-insurance — each its own tenant. GDPR data minimisation by architecture: retrieval can’t cross the boundary.
No claim data egress
On-prem llama.cpp default. No vendor gets PII as training material. Your reinsurance treaty obligations don’t suddenly include an AI vendor’s sub-processor list.
AI-driven differentiator
The two hardest problems for insurer AI are structured document reasoning (policy wording) and cross-claim pattern recall (fraud). Eldric ships both as first-class architectural primitives, not as RAG shims. data.pageindex does tree-walking over policy documents; Matrix Memory v4 compresses decades of fraud outcomes into associative recall that’s independent of context-window length. Context-window LLMs can’t do either.
Scalable use cases
- Property & casualty claims. End-to-end triage, fraud lead, and policy check in one pipeline. Per-claim tenant; per-adjuster workgroup; audit trail per-decision.
- Auto claims with photos. Media Worker ingests photos, vector-indexes damage patterns alongside the narrative. Cross-bundle pattern recall via Matrix Memory.
- Health & life underwriting. Per-applicant tenant (strong PII posture). Retrieval across medical records (ODBC to EMR), actuarial tables, and prior declines.
- Reinsurance treaty analysis. Decades of treaty wording in `data.pageindex`; recall similar exclusions from prior disputes. Internal-only, audit-logged.
- Customer service. Grounded chat against the specific customer’s policy documents + claim history. An Eldric tenant per product line keeps Chinese walls clean.
Runs on commodity hardware
Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:
3× RTX 4090 — sweet spot
72 GB total VRAM with tensor-split. Llama 3.3 70B Q4 at 60–80 tok/s, a parallel 8B routing model, and an embedding server concurrently. One-time hardware cost ~€5–7k.
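If you want to sanity-check that sizing before buying, llama.cpp’s tensor split can be exercised through the llama-cpp-python bindings. A sketch under stated assumptions: Eldric embeds llama.cpp directly rather than via these bindings, and the model path and split ratios are illustrative:

```python
from llama_cpp import Llama   # pip install llama-cpp-python (CUDA build)

# Rough VRAM budget for Llama 3.3 70B Q4_K_M on 3x 24 GB cards:
#   ~40 GB weights + ~5.4 GB fp16 KV cache at 16k context
#   (80 layers x 8 KV heads x 128 head dim x 2 for K+V x 2 bytes x 16384 tokens)
#   => ~46 GB total, comfortably inside 72 GB with headroom for an 8B router.
llm = Llama(
    model_path="/data/eldric/models/llama-3.3-70b-q4_k_m.gguf",  # illustrative path
    n_gpu_layers=-1,                 # offload every layer to GPU
    tensor_split=[1.0, 1.0, 1.0],    # spread weights evenly across the 3 cards
    n_ctx=16384,
)
out = llm("List the red flags in this claim narrative:", max_tokens=64)
print(out["choices"][0]["text"])
```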
Single RTX 4090 / 4080 — team scale
24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department chat with fan-out retrieval.
CPU-only — pilot scale
llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.
Scale up
Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.
Mid-market insurer baseline
3×4090 handles claims triage + fraud recall + policy chat for 500 concurrent adjusters. Matrix Memory for fraud (256 rank, 1024 dim) fits in 500 MB.
The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.
What the disk bill looks like
| Artefact | Size | Notes |
|---|---|---|
| eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm | ~1.4 MB | CPU baseline binary; one RPM, one systemd unit. |
| eldric-aios-cuda add-on | ~512 MB | Pulled in automatically via `Supplements: cuda-drivers` on GPU hosts. Contains GGML_CUDA llama.cpp. |
| Llama 3.1 8B Q4_K_M GGUF | ~4.9 GB | Good default for team-scale chat on a single 4090. |
| Llama 3.3 70B Q4_K_M GGUF | ~40 GB | The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably. |
| Mixtral 8x22B Q4 GGUF | ~80 GB | Tight on 3×4090; comfortable on 4×4090 or 2×H100. |
| nomic-embed-text (embedding) | ~700 MB | CPU or GPU. One per cluster; handles vector indexing. |
| Matrix Memory `.emm` per domain | 50–500 MB | Depends on rank × dim (see memory article). `chat` 64/768 ~200 kB; `particle_physics` 512/1024 ~500 MB. |
| Vector store per 1M chunks | ~6–10 GB | Depends on embedding dim. SQLite backend; FAISS optional. |
| Hash-chained audit log | ~200 MB / 1M calls | JSONL, append-only, rotation at 500 MB files by default. |
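The two rows that scale with usage are easy to recompute for your own corpus. A back-of-envelope sketch, assuming float32 embeddings, ~4 kB of text and metadata per chunk, and ~200-byte audit entries (all three numbers are assumptions to adjust):

```python
def vector_store_gb(chunks: int, dim: int = 768,
                    metadata_bytes: int = 4096) -> float:
    """Raw float32 vectors plus per-chunk text/metadata; index overhead extra."""
    return chunks * (dim * 4 + metadata_bytes) / 1e9

def audit_log_mb(calls: int, bytes_per_entry: int = 200) -> float:
    """Append-only JSONL, one entry per retrieval/prompt."""
    return calls * bytes_per_entry / 1e6

print(vector_store_gb(1_000_000))   # ~7.2 GB, inside the 6-10 GB table row
print(audit_log_mb(1_000_000))      # ~200 MB, matching the audit-log row
```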
Three reference hardware setups
| | Pilot / team | Department / BU | Production / enterprise |
|---|---|---|---|
| CPU | 1× EPYC 7313 (16c) or i9-14900K | 2× EPYC 9354 (32c each) | 2× EPYC 9654 (96c) per node |
| GPU | 1× RTX 4090 (24 GB) | 3× RTX 4090 (72 GB) | 4× H100 (320 GB) or 8× H200 |
| RAM | 128 GB DDR5 | 256 GB DDR5 ECC | 1 TB DDR5 ECC per node |
| Storage | 2× 4 TB NVMe (RAID-1) | 6× 8 TB NVMe (RAID-10) + SSD cache | Tiered: NVMe hot + TB-scale HDD / Lustre |
| Network | 1 GbE OK | 10 GbE with link agg | 25/100 GbE or IB-HDR for multi-node |
| Power | ~1 kW typical / 1.5 kW peak | ~2 kW typical / 3 kW peak | 4–6 kW per node |
| Hardware cost | ~€4–5k | ~€12–15k | €80–250k per node |
| Serves | 8B model, 10–30 concurrent chat users | 70B Q4 at 60–80 tok/s, 200–500 users | Mixtral / Llama-405B, 2k+ users per node |
Network + ops footprint
- Ports. One outward port (443 at the edge). Internally: controller on 8880, data on 8892, inference on 8883, science on 8897, etc. — all behind the edge.
- Storage layout. `${ELDRIC_DATA_DIR}` defaults to `/data/eldric` if writable, else `/var/lib/eldric`. Subdirs: `models/`, `vectors/`, `memory/` (matrix memory), `storage/` (file storage), `agent/`, `edge/`, and per-module dirs.
- Backup. The audit log and `.emm` files are the two artefacts that matter; everything else regenerates. Snapshot the data dir nightly; off-site every week (a minimal snapshot sketch follows this list).
- Updates. `dnf upgrade eldric-aios`. Rollback is `dnf downgrade`. Zero vendor dance.
- Ops team. A single systems engineer can run a pilot install. A team of two runs a department deployment. Production enterprise uses your existing Linux sysadmin rota.
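Because only two artefact classes can’t be regenerated, the nightly snapshot stays small. A minimal sketch assuming the default `/data/eldric` layout above; the `audit/` subdirectory name and the `/backup` target are assumptions, and retention and off-site shipping are left to your existing tooling:

```python
import pathlib, tarfile, time

DATA_DIR = pathlib.Path("/data/eldric")           # or /var/lib/eldric
SNAP = pathlib.Path(f"/backup/eldric-{time.strftime('%Y%m%d')}.tar.gz")

with tarfile.open(SNAP, "w:gz") as tar:
    # The two artefacts that matter: the hash-chained audit log...
    for path in DATA_DIR.glob("audit/*.jsonl"):   # assumed audit location
        tar.add(path)
    # ...and the Matrix Memory stores, wherever each module keeps them.
    for path in DATA_DIR.rglob("*.emm"):
        tar.add(path)
print("wrote", SNAP)
```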
SWOT — an honest read
Strengths
- Matrix Memory v4 compresses decades of fraud patterns into a single associative store
- `data.pageindex` sketch for structured policy docs, already in the SDK
- Hash-chained audit log + identity service + multi-tenant: Solvency II primitives by construction
- Runs on commodity 3×4090; does not require a hyperscaler contract
Weaknesses
- Actuarial tooling (reserving, capital models) not yet native — integrated via ODBC for now
- PDF OCR uses external tools (Tesseract, pdfplumber); no built-in ICR engine
- Claims-system-specific ontologies (Guidewire, Duck Creek, Sapiens) require customer extensions
- Photo-damage model selection is customer’s choice — Eldric supplies the pipeline, not the vision model
Opportunities
- Solvency II model-governance tightening — reconstructable AI is a requirement
- Claims-fraud ROI is large and measurable, making pilot budgets easy to justify
- Ageing investigator population — institutional memory is a ticking budget line
- EU Retail Investment Strategy raising conduct-of-business AI scrutiny
Threats
- Vendor-embedded AI (Guidewire AI, Duck Creek AI) as default in the claims system
- Hyperscaler “insurance AI” accelerators bundled with cloud consumption commits
- Internal data-lake projects consuming the AI budget before Eldric is considered
- Regulator reluctance to approve unfamiliar architectures — needs evidence, takes time
First entry points — concrete value in 30 / 90 / 180 days
Claim-triage demo (30 days)
Install alpha.3. Ingest 50 anonymised claims. Show the drafted memo + citations workflow to one adjuster team. No PII moves outside the sandbox.
Fraud-pattern workspace (90 days)
Matrix Memory profile seeded with prior-year fraud outcomes. Incoming claim test: does recall surface the 2019 workshop-ring case? Measured precision/recall vs. current triage.
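The measurement itself is a few lines once the prior-year outcomes are labelled. A minimal harness sketch; run it once on Eldric’s fraud leads and once on the current triage flags, then compare:

```python
def precision_recall(flags: list[bool], labels: list[bool]) -> tuple[float, float]:
    """flags: claims surfaced as fraud leads; labels: confirmed fraud outcomes."""
    tp = sum(f and l for f, l in zip(flags, labels))
    fp = sum(f and not l for f, l in zip(flags, labels))
    fn = sum(l and not f for f, l in zip(flags, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

flags  = [True, True, False, False, True]    # e.g. Matrix Memory above threshold
labels = [True, False, False, True, True]    # e.g. confirmed by investigators
print(precision_recall(flags, labels))       # (0.67, 0.67) on this toy data
```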
Multi-line rollout (180 days)
Property + auto tenants live. Audit log integrated with SIEM. Solvency II evidence package generated quarterly. Legacy SaaS claims-assistant decommissioned.