Data access · architecture
Connect to everything — NFS, SQL, z/OS, APIs
Enterprise data doesn’t live in one place. It’s on NFS mounts somebody set up in 2009, in PostgreSQL and MySQL behind a dozen apps, in a z/OS Db2 warehouse that prints the monthly close, in SharePoint, in object storage, in vendor REST APIs. Most AI demos handle this by assuming you’ll ingest all of it into a vector database. You won’t. You’d lose freshness, ACLs, referential integrity, and the patience of every DBA in the building.
Eldric AI OS takes the other route: connect where the data lives. NFS stays NFS. SQL stays SQL. z/OS stays z/OS. Every source is a plugin under a single retrieval contract; one chat query fans out across all of them in parallel; citations travel back through the stack so users and auditors can always ask “where did that come from?”.
This post is about the contract — what it looks like, what ships in alpha.3 (more than you’d expect, thanks to unixODBC), and the honest roadmap for everything else.
The problem with “ingest everything”
The fastest way to demo a RAG system is to crawl a share, embed every document, drop them in a vector DB, and ask questions. It’s also the fastest way to ship something that doesn’t survive first contact with a real enterprise:
- Freshness dies. The database of record changes every minute; the embedding re-index runs nightly. Answers drift out of sync with reality.
- ACLs die. Once a document lands in a vector store, the original access-control model is gone. HR data, unredacted contracts, closed-deal Salesforce records — all uniformly retrievable by anyone with chat access.
- Operational DBs die. You’re not going to re-ingest a 300 GB z/OS Db2 warehouse into a FAISS index. Nor should you — Db2 is already indexed, already backed up, already audited.
- The vendor API changes tomorrow. Cached copies of a SharePoint tree go stale the day someone renames a folder.
The principle Eldric settles on: the database of record is the database of record. Query it live; answer with citations.
The contract — `retrieval.<plugin-id>`
Every data source in Eldric answers the same tiny contract. The Edge module fans a chat query out to every enabled `data.*` plugin and tries two backends in order:

- An in-process syscall named `retrieval.<plugin-id>`, if the owning module registered one. Sub-microsecond dispatch, no HTTP.
- An extension bridge — the plugin's config carries `{extension, tool}`, and Edge invokes the loaded extension over the kernel's `extension.invoke_tool` syscall, which POSTs to the extension's `/invoke`.
Either path returns the same JSON shape:
```json
{
  "snippets": [
    {"text": "...", "source": "nfs://plant-docs/sop/1042.pdf#p7", "score": 0.81},
    {"text": "...", "source": "postgres://mes.prod/lots/DQ_103", "score": 0.73}
  ]
}
```
That’s the whole API. Any source — a filesystem, a SQL database, an ODBC DSN to a mainframe, a REST API, a vector store, a hand-written Python script talking to FTP — that can produce this shape is a first-class peer in the retrieval fan-out.
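To make the contract concrete, here is a minimal sketch of a source that fulfils it — a naive substring search over a directory of text files. Everything here (the function name, the root path, the scoring rule) is hypothetical illustration, not Eldric's actual implementation; the only load-bearing part is the returned shape.

```python
import os

def retrieve(query: str, top_k: int = 5, root: str = "/mnt/plant-docs") -> dict:
    """Toy retrieval backend: naive substring match over text files,
    returning the {snippets: [{text, source, score}]} contract."""
    snippets = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue
            hits = text.lower().count(query.lower())
            if hits:
                # Crude relevance: hit count squashed into (0, 1).
                score = hits / (hits + 1)
                snippets.append({
                    "text": text[:400],
                    "source": f"nfs://{os.path.relpath(path, root)}",
                    "score": round(score, 2),
                })
    snippets.sort(key=lambda s: s["score"], reverse=True)
    return {"snippets": snippets[:top_k]}
```

A real source would replace the body with an SQL query, an API call, or a vector search — the fan-out neither knows nor cares, as long as this shape comes back.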
What ships in alpha.3 today
The `eldric-aios` RPM on Fedora 42+ declares `BuildRequires: unixODBC-devel` and `Requires: unixODBC`. Not decoration — the data module links the ODBC client at build time, and the shipped `retrieval.data.local` syscall routes to any configured DSN via `SQLDriverConnect` / `SQLExecDirect` / `SQLFetch`. The driver set is the admin's choice: install whichever `*-odbc` package you need and add a stanza to `/etc/odbcinst.ini`.
That one path covers most of what an enterprise actually needs:
| Backend | How | Status |
|---|---|---|
| Filesystem / NFS | POSIX access inside the data module; nfs-ganesha integration for serving exports or mounting remote shares. Cited paths are the same paths ops can `cat` from a shell. | shipped |
| SQLite | Always linked; default backend for small ops metadata. | shipped |
| PostgreSQL, MySQL, MariaDB | Via the ODBC layer — admin installs `postgresql-odbc` or `mysql-connector-odbc`, adds a DSN, Eldric queries live. | shipped |
| Oracle, MSSQL | Same path — `oracle-instantclient-odbc` or Microsoft's `msodbcsql18`. | shipped |
| IBM Db2 LUW | IBM's free DSDriver registers a unixODBC driver; Eldric talks to Db2 LUW through it. | shipped |
| IBM Db2 z/OS (mainframe) | The same DSDriver speaks DRDA to the LPAR over port 446 / 50000. With a DB2 Connect license on the Linux host, the AI sees a z/OS warehouse as just another ODBC DSN. | shipped |
| Vector store + Matrix Memory | Merged with the above into one `retrieval.data.local` answer: exact retrieval via the vector side, pattern recall via the matrix-memory side. | shipped |
| `data.arxiv`, `data.nasa_apod` | Reference extensions at `sdk/extension/examples/` — each is ~80 lines of Python fulfilling the `retrieval.<id>` contract via the bridge. Templates for the rest of the 4.x science surface. | shipped |
| `data.pageindex` | Vectorless / reasoning-based retrieval — hierarchical TOC + LLM navigation. Useful on structured professional docs (SEC filings, FDA submissions, legal, textbooks) where vector similarity loses to expert tree-walking. | sketch |
That means “can Eldric talk to our DB2 z/OS warehouse?” is already a yes on alpha.3, as long as an admin installs IBM’s ODBC driver. No Phase-2 wait.
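The essence of that query path — run SQL against a database of record, map rows into the retrieval contract, and carry a citation per snippet — can be sketched in a few lines. To keep the sketch runnable anywhere this uses the stdlib `sqlite3` module (SQLite is the always-linked default backend above); the same row-to-snippet mapping applies unchanged to any unixODBC DSN, where the shipped C side calls `SQLDriverConnect` / `SQLExecDirect` / `SQLFetch` instead. Function and URI names are illustrative, not Eldric's.

```python
import sqlite3

def rows_to_snippets(db: str, sql: str, params: tuple = (), top_k: int = 10) -> dict:
    """Map SQL result rows into the {snippets: [{text, source, score}]}
    contract. `db` stands in for an ODBC DSN name in the real path."""
    conn = sqlite3.connect(db)
    try:
        cur = conn.execute(sql, params)
        cols = [c[0] for c in cur.description]
        snippets = [{
            # Flatten each row into a readable "col=value; ..." snippet.
            "text": "; ".join(f"{k}={v}" for k, v in zip(cols, row)),
            "source": f"sqlite://{db}",  # citation travels with the snippet
            "score": 1.0,                # exact SQL match: no ranking needed
        } for row in cur.fetchmany(top_k)]
        return {"snippets": snippets}
    finally:
        conn.close()
```

Against a Db2 z/OS DSN the only conceptual change is the connection handle; the contract side is identical.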
What’s on the roadmap
Plenty of enterprise data doesn’t speak ODBC. Those are the actual gaps.
Phase 1 — streaming, NoSQL, object storage
Headers for these connectors exist in `cpp/include/distributed/data/connectors/`; each becomes a real `retrieval.data.<name>` syscall when its driver is linked and the query path lands.
| Backend | Driver | Notes |
|---|---|---|
| MongoDB | mongocxx (Apache-2.0) | Document store, aggregation pipeline. |
| Kafka | librdkafka (BSD-2) | Streaming ingest, topic consumption. |
| Elasticsearch / OpenSearch | libcurl (REST) | Search engine + vector-store fallback. |
| ClickHouse | clickhouse-cpp (Apache-2.0) | Column-oriented OLAP, analytical queries. |
| MinIO / S3 | aws-sdk-cpp (S3 module) | Object storage, data-lake entry. |
Phase 2 — native mainframe (Enterprise Tier)
The ODBC path already covers query-side access to Db2 LUW + Db2 z/OS. Phase-2 is about the rest of the mainframe surface — messaging, legacy record stores, transaction gateways — plus a native DB2 CLI path for customers who want DRDA without going through unixODBC.
| Backend | Driver | Protocol / use |
|---|---|---|
| IBM MQ | IBM MQ C client `libmqm` (dlopen) | MQI protocol, port 1414 — enterprise messaging backbone. |
| VSAM | REST via z/OS Connect EE | HTTP/JSON — customer deploys the gateway side. |
| IMS | IBM Universal DB driver | IMS Connect over TCP; DL/I navigation or SQL abstraction. |
| CICS | CICS Liberty (REST) | HTTP/JSON — transaction invocation, not a data query. |
| Native DB2 CLI (DRDA) | IBM DB2 CLI via runtime dlopen | Direct DRDA path for customers who don’t want the unixODBC indirection. |
All IBM proprietary drivers are loaded via `dlopen`/`dlsym`, so `eldric-aios` compiles and runs without them; the mainframe paths light up when the customer installs the IBM client on the Linux host. Enterprise-tier licensing gates the Phase-2 set.
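The graceful-degradation pattern behind that is plain dlopen with a fallback. Here is a minimal Python sketch using `ctypes` (the `libmqm_r` / `libmqm` names are IBM MQ's real client library names, but this loader is an illustration of the pattern, not Eldric's actual C++ loader):

```python
import ctypes

def load_optional_driver(candidates):
    """dlopen-style optional loading: try each shared-library name in turn
    and return a handle, or None when the vendor client isn't installed.
    A None handle just means the connector stays dark — nothing fails."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

# IBM MQ client, present only if the customer installed it on the host.
libmqm = load_optional_driver(["libmqm_r.so", "libmqm.so"])
MQ_AVAILABLE = libmqm is not None
```

Per-symbol lookup (`dlsym`) then happens lazily on the returned handle, so a partially installed client degrades just as gracefully.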
Phase 3 (Cassandra, HBase, Hive, Snowflake, Redshift, BigQuery, Azure Synapse, Databricks Delta Lake, Spark Connect, Druid) follows the same pattern — each an implementation against the common `DataSource` interface. Snowflake and Synapse are already reachable through their ODBC drivers today; Phase-3 is about native protocols where they outperform ODBC.
Everything else — the extension path works today
What about sources that don’t speak ODBC and aren’t on the Phase-1 list? SharePoint, Salesforce, ServiceNow, an FTP drop, a vendor REST API, a line-of-business SOAP service — the normal enterprise long tail.
The retrieval contract is universal, so the same manifest + ~80-line Python template the `arxiv` reference extension uses handles any of them:
```sh
cat > ${ELDRIC_DATA_DIR}/extensions/sharepoint_corp.extension.yaml <<YAML
extension:
  name: sharepoint_corp
  display_name: Corporate SharePoint
  category: data        # ← makes it a data plugin, auto-surfaced in chat
  model: B              # external Python process
  external_url: http://127.0.0.1:9600
  tools: [search]
YAML

curl -XPOST http://localhost:8880/api/v1/extensions/load \
  -d '{"name":"sharepoint_corp"}'
# data.sharepoint_corp is now a toggle in the chat sidebar. No Edge rebuild.
```
The Python side is a thin shim that takes `{query, top_k, config}`, calls the vendor's SharePoint Search API, and returns the `{snippets: [{text, source, score}]}` contract. When a future phase bundles SharePoint as a native connector, it ships its own syscall — but users see no change: the same toggle in the sidebar, the same source in the fan-out, the same citation format.
That’s the payoff of making the retrieval contract the surface: “built-in” and “customer-written” look identical to the rest of the system. The C++ roadmap replaces glue code with optimised drivers without moving the contract.
Fan-out — one query, every enabled source
Turning individual connectors into useful AI is the other half
of the story. Flipping three toggles in the sidebar
— data.local (which today may include a
SharePoint DSN or a Db2 z/OS DSN via ODBC),
data.arxiv, and a customer-written
data.sharepoint_corp extension — tells the
Edge module to fan a single chat query out to all three
in parallel, merge the returned snippets, and
synthesise a system message prefixed with retrieval context
before the LLM sees anything.
Every response carries an `X-Eldric-Fanout` header listing which plugins answered, with what count, and whether the backend was a native syscall, a loaded extension, or no-backend (toggle enabled but nothing wired). Admins see wiring gaps immediately; users see "retrieved from: Local KB (3), arXiv (5), SharePoint (2)" under the assistant message and can click through for the source list.
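The parallel fan-out plus merge plus per-plugin counts can be sketched like this. This is a simplified model of the behaviour described above, not Eldric's Edge code: `backends` maps plugin-id to any callable fulfilling the contract, and the returned counts are what the fan-out summary would report.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(query: str, backends: dict, top_k: int = 8):
    """Query every enabled source in parallel, merge snippets by score,
    and report per-plugin counts (the fan-out summary).
    `backends`: plugin-id -> callable(query) -> {"snippets": [...]}."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {pid: pool.submit(fn, query) for pid, fn in backends.items()}
    merged, counts = [], {}
    for pid, fut in futures.items():
        try:
            snippets = fut.result().get("snippets", [])
        except Exception:
            snippets = []  # a failed source counts zero; it never blocks the answer
        counts[pid] = len(snippets)
        for s in snippets:
            merged.append({**s, "plugin": pid})  # keep provenance for citations
    merged.sort(key=lambda s: s["score"], reverse=True)
    return merged[:top_k], counts
```

One design point the sketch preserves: a slow or broken source degrades to an empty answer rather than failing the whole query, which is what lets admins spot wiring gaps from the counts instead of from user-visible errors.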
What this means for a private-cloud deployment
Three decisions flow from this pattern:
- Eldric lives on-prem. No data leaves the environment; no source system gets a replica in a third-party store. The LLM itself runs on the same host via embedded llama.cpp.
- Existing ACLs keep working. Eldric authenticates as a service principal to each source. Source-side security policies apply verbatim. Per-tenant isolation in Eldric sits on top of — not instead of — source-side rules.
- Adding a source is a manifest drop. A new vendor, a new database, a new mainframe region — same ~80-line Python template, same YAML manifest, same toggle appearing in every user’s sidebar. No fork, no rebuild, no user re-training.
If your organisation’s data is spread across NFS and SQL and Db2 and SharePoint and half a dozen vendor APIs — the normal state of a grown-up enterprise — this is the shape of a private-cloud AI assistant that’s actually deployable: small kernel, one retrieval contract, lots of plugins, no big-bang ingest.