Privacy-first · use case

When “AI” and “confidential” have to coexist

by Juergen Paulhart · 2026-04-24 · ~7 min read

“I won’t paste the deposition into ChatGPT. So how do I get AI help with it?” — every lawyer, HR lead, wealth advisor, and investigative journalist looking at consumer AI.

Mainstream AI assistants are fantastic for generic work and the wrong shape for anyone who handles privileged material by trade. The issue isn’t the model — it’s the route the tokens take: out to a third-party API, into someone else’s logs, governed by a data-handling clause you didn’t negotiate.

Eldric AI OS flips that topology. One dnf install on a Fedora 42+ host stands up the edge, the router, the retrieval layer, the identity service, and an embedded llama.cpp LLM — in one process, on one machine, behind your firewall. The nearest outbound hop is optional.

Value propositions

Zero egress by default

Inference runs locally via embedded llama.cpp (CPU or CUDA). No API key, no outbound call. Cloud backends are an explicit admin action and visible per-model in the UI.

Session-local opt-out

The session.local plugin scrubs the cluster-side session file, conversation history, and dream profile on one click. Further writes shift to browser localStorage. The server forgets.

Hash-chained audit trail

Every privacy toggle and retrieval decision lands in a tamper-evident log. Auditors can reconstruct the state of any session at any past point without trusting the server’s present state.

Real multi-tenant isolation

Users, tenants, projects, workgroups shipped in alpha.3. Matter numbers become tenant scopes; chinese walls become workgroup boundaries; retrieval can’t cross boundaries by accident.

GDPR Article 17 in one call

DELETE /conversations?all=true scrubs an identity’s full archive. Right-to-erasure is an API call, not a ticket.

Source-side ACLs preserved

Eldric authenticates to NFS, SQL, and ODBC sources as a service principal. Users get retrieval against the files their source credentials already allow — nothing more.

AI-driven differentiator

The usual “private AI” pitch is a promise about how a vendor will handle your data. Eldric is architectural: the tokens can’t leak because the path doesn’t exist. Promises are contracts; local execution is math.

Scalable use cases

Law firms. Matter-scoped tenants, chinese walls as workgroups, audit log for bar-side discovery. Case research, contract comparison, deposition summaries, deal-room review.
HR & people ops. Grievance notes, compensation data, reorg planning, medical accommodations. Eldric lives inside the HRIS perimeter and reads the same SQL payroll reads via ODBC.
Family offices & private banks. Per-family tenant, advisor workgroups. The model runs on a box you control; the audit log is the regulator-ready artefact.
Investigative journalism. Source-protection-shaped defaults. session.local lets the cluster forget the conversation happened; the archive scrub is provable.
Research & clinical. IRB-approved perimeter, NFS integration, Science Worker’s 140+ API surface. Pair with Matrix Memory for institutional recall across cohorts.

Runs on commodity hardware

Eldric AI OS was built to land on small clusters, not on hyperscaler fleets. The whole stack is one binary; the on-prem LLM is embedded llama.cpp. The hardware plan that gets most organisations into production looks like this:

3× RTX 4090 — sweet spot

72 GB total VRAM with tensor-split. Llama 3.3 70B Q4 at 60–80 tok/s, a parallel 8B routing model, and an embedding server concurrently. One-time hardware cost ~€5–7k.

Single RTX 4090 / 4080 — team scale

24 GB. Llama 3.1 8B at 80+ tok/s, 13B comfortable, 32B Q4 possible. Enough for a small department chat with fan-out retrieval.

CPU-only — pilot scale

llama.cpp on 32+ core x86 runs 8B Q4 usefully. Matrix Memory is CPU-memory-bound. A refurbished server from the rack is enough to prove the architecture.

Scale up

Multi-node cluster with H100 / GH200 for research-grade workloads. Same binary, same role modules, topology-aware. See the HPC article.

The arithmetic: a €6k workstation displaces a €30–60k-per-year SaaS-AI contract that still leaks IP, still can’t reach your mainframe, and still has a “we may use your data for training” clause hiding somewhere.

What the disk bill looks like

Artefact	Size	Notes
`eldric-aios-5.0.0-3.alpha3.fc43.x86_64.rpm`	~1.4 MB	CPU baseline binary; one RPM, one systemd unit.
`eldric-aios-cuda` add-on	~512 MB	Pulled in automatically via `Supplements: cuda-drivers` on GPU hosts. Contains GGML_CUDA llama.cpp.
Llama 3.1 8B Q4_K_M GGUF	~4.9 GB	Good default for team-scale chat on a single 4090.
Llama 3.3 70B Q4_K_M GGUF	~40 GB	The sweet spot for 3×4090 tensor-split. Holds a 16k context comfortably.
Mixtral 8x22B Q4 GGUF	~80 GB	Tight on 3×4090; comfortable on 4×4090 or 2×H100.
nomic-embed-text (embedding)	~700 MB	CPU or GPU. One per cluster; handles vector indexing.
Matrix Memory `.emm` per domain	50–500 MB	Depends on rank × dim (see memory article). `chat` 64/768 ~200 kB; `particle_physics` 512/1024 ~500 MB.
Vector store per 1M chunks	~6–10 GB	Depends on embedding dim. SQLite backend; FAISS optional.
Hash-chained audit log	~200 MB / 1M calls	JSONL, append-only, rotation at 500 MB files by default.

Three reference hardware setups

	Pilot / team	Department / BU	Production / enterprise
CPU	1× EPYC 7313 (16c) or i9-14900K	2× EPYC 9354 (32c each)	2× EPYC 9654 (96c) per node
GPU	1× RTX 4090 (24 GB)	3× RTX 4090 (72 GB)	4× H100 (320 GB) or 8× H200
RAM	128 GB DDR5	256 GB DDR5 ECC	1 TB DDR5 ECC per node
Storage	2× 4 TB NVMe (RAID-1)	6× 8 TB NVMe (RAID-10) + SSD cache	Tiered: NVMe hot + TB-scale HDD / Lustre
Network	1 GbE OK	10 GbE with link agg	25/100 GbE or IB-HDR for multi-node
Power	~1 kW typical / 1.5 kW peak	~2 kW typical / 3 kW peak	4–6 kW per node
Hardware cost	~€4–5k	~€12–15k	€80–250k per node
Serves	8B model, 10–30 concurrent chat users	70B Q4 at 60–80 tok/s, 200–500 users	Mixtral / Llama-405B, 2k+ users per node

Network + ops footprint

Ports. One outward port (443 at the edge). Internally: controller on 8880, data on 8892, inference on 8883, science on 8897, etc. — all behind the edge.
Storage layout. ${ELDRIC_DATA_DIR} defaults to /data/eldric if writable, else /var/lib/eldric. Subdirs: models/, vectors/, memory/ (matrix memory), storage/ (file storage), agent/, edge/, and per-module dirs.
Backup. The audit log and .emm files are the two artefacts that matter. Everything else regenerates. Snapshot the data dir nightly; off-site every week.
Updates. dnf upgrade eldric-aios. Rollback is dnf downgrade. Zero vendor dance.
Ops team. A single systems engineer can run a pilot install. A team of two runs a department deployment. Production enterprise uses your existing Linux sysadmin rota.

SWOT — an honest read

Strengths

Zero egress: embedded llama.cpp, no external API required
Single signed RPM — validation-friendly install surface
Hash-chained audit log + GDPR Article-17 API shipped
Multi-tenant identity service real in alpha.3
Open-source stack, inspectable end to end

Weaknesses

alpha.3 — requires professional ops; no GUI installer yet
SOC-2 / ISO-27001 certifications in preparation, not yet held
SAML / OIDC SSO integrations are Phase-4 roadmap
On-prem inference quality still depends on the chosen open model

Opportunities

EU AI Act + DORA enforcement makes auditable on-prem non-negotiable for regulated industries
Consumer-AI trust backlash after vendor data incidents
Sovereign-cloud push across EU member states
GDPR supervisory authorities tightening default postures

Threats

Hyperscaler “enterprise tier” contracts papering over real egress
Policy fatigue — organisations tolerating shadow-IT ChatGPT despite rules
Misinterpretation that “private cloud” means vendor-hosted private tenant

First entry points — concrete value in 30 / 90 / 180 days

30 days

Pilot install

Single-node dnf install eldric-aios on a Fedora 42+ box. Onboard three internal power users. Demo the session.local toggle and audit log to the risk team.

90 days

First regulated workload

Pick one workload (contract review, HR case notes, or claims triage). Stand up a tenant, wire its data source (NFS or ODBC), show cross-session resume across devices.

180 days

Production rollout

Multi-tenant across departments. Hash-chained audit feeds into the SIEM. GDPR-erasure runbook documented. Shadow-IT AI accounts decommissioned.

Install alpha.3 Banking use case Insurance use case Data access article office@eldric.ai

#PrivateAI #GDPR #OnPremLLM #LegalAI #HRAI #FamilyOffice #SovereignAI