Data access · architecture

Connect to everything — NFS, SQL, z/OS, APIs

by Juergen Paulhart · 2026-04-23 · ~7 min read

Enterprise data doesn’t live in one place. It’s on NFS mounts somebody set up in 2009, in PostgreSQL and MySQL behind a dozen apps, in a z/OS Db2 warehouse that prints the monthly close, in SharePoint, in object storage, in vendor REST APIs. Most AI demos handle this by assuming you’ll ingest all of it into a vector database. You won’t. You’d lose freshness, ACLs, referential integrity, and the patience of every DBA in the building.

Eldric AI OS takes the other route: connect where the data lives. NFS stays NFS. SQL stays SQL. z/OS stays z/OS. Every source is a plugin under a single retrieval contract; one chat query fans out across all of them in parallel; citations travel back through the stack so users and auditors can always ask “where did that come from?”.

This post is about the contract — what it looks like, what ships in alpha.3 (more than you’d expect, thanks to unixODBC), and the honest roadmap for everything else.

[Architecture diagram. Enterprise data sources feed one eldric-aios binary: filesystem / NFS (today, via nfs-ganesha), SQLite (today, always linked), universal ODBC (today: PG, MySQL, Oracle, MSSQL), Db2 LUW + z/OS (today, via the IBM ODBC driver), streaming / NoSQL (phase 1: Kafka, Mongo, S3), native mainframe (phase 2: IBM MQ, VSAM, CICS), and anything else as an ~80-line Python extension. Inside the single process, the data module (:8892) hosts data.local — today files + NFS + SQLite + ODBC + vector + memory — plus extensions such as data.arxiv and data.nasa_apod, fanning out in parallel per query. The edge + router (:8880) serves the chat UI and auth, agent + dream handle query planning and consolidation, and inference (llama.cpp, :8883, CPU or CUDA) runs the on-prem LLM with no egress; 11 more role modules complete the set. Modules talk over the in-process IntraBus, not HTTP. A chat query like "show me last quarter's DQ_103" comes back as a merged answer with source citations.]

The problem with “ingest everything”

The fastest way to demo a RAG system is to crawl a share, embed every document, drop them in a vector DB, and ask questions. It’s also the fastest way to ship something that doesn’t survive first contact with a real enterprise:

  1. Freshness: embeddings go stale the moment a source row changes; a live query never does.
  2. ACLs: per-share and per-table permissions flatten into one undifferentiated index.
  3. Referential integrity: rows copied out of their schema lose the joins that gave them meaning.
  4. Governance: every duplicated dataset is a new copy for the DBAs to secure and audit.

The principle Eldric settles on: the database of record is the database of record. Query it live; answer with citations.

The contract — retrieval.<plugin-id>

Every data source in Eldric answers the same tiny contract. The Edge module fans a chat query out to every enabled data.* plugin and tries two backends in order:

  1. An in-process syscall named retrieval.<plugin-id>, if the owning module registered one. Sub-microsecond dispatch, no HTTP.
  2. An extension bridge — the plugin’s config carries {extension, tool} and Edge invokes the loaded extension over the kernel’s extension.invoke_tool syscall, which POSTs to the extension’s /invoke.
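As a sketch of that fallback order (illustrative names only, not the real eldric-aios API), the per-plugin dispatch reduces to: try the registered syscall, then the extension bridge, else report that the toggle has no backend:

```python
# Hypothetical sketch of Edge's per-plugin backend selection. All names
# (retrieve, syscalls, extensions, invoke_tool) are illustrative stand-ins,
# not the actual eldric-aios internals.

def retrieve(plugin_id, query, top_k, syscalls, extensions, config):
    """Return (backend_kind, snippets) for one data.* plugin."""
    # 1. In-process syscall, if the owning module registered one.
    syscall = syscalls.get(f"retrieval.{plugin_id}")
    if syscall is not None:
        return "syscall", syscall(query=query, top_k=top_k)["snippets"]

    # 2. Extension bridge: config carries {extension, tool}.
    bridge = config.get(plugin_id, {})
    if "extension" in bridge:
        ext = extensions[bridge["extension"]]
        result = ext.invoke_tool(bridge["tool"], {"query": query, "top_k": top_k})
        return "extension", result["snippets"]

    # 3. Toggle enabled but nothing wired.
    return "no-backend", []
```

Both paths hand back the same snippet list, which is what lets the fan-out treat built-in and customer-written sources identically.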

Either path returns the same JSON shape:

{
  "snippets": [
    {"text": "...", "source": "nfs://plant-docs/sop/1042.pdf#p7", "score": 0.81},
    {"text": "...", "source": "postgres://mes.prod/lots/DQ_103",   "score": 0.73}
  ]
}

That’s the whole API. Any source — a filesystem, a SQL database, an ODBC DSN to a mainframe, a REST API, a vector store, a hand-written Python script talking to FTP — that can produce this shape is a first-class peer in the retrieval fan-out.

What ships in alpha.3 today

The eldric-aios RPM on Fedora 42+ declares BuildRequires: unixODBC-devel and Requires: unixODBC. Not decoration — the data module links the ODBC client at build time and the shipped retrieval.data.local syscall routes to any configured DSN via SQLDriverConnect / SQLExecDirect / SQLFetch. The driver set is the admin’s choice: install whichever *-odbc package you need and add a stanza to /etc/odbcinst.ini.
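As a concrete sketch of that admin step, the stanzas below register the PostgreSQL driver and one live DSN. The .so path is the usual Fedora postgresql-odbc location (verify with odbcinst -q -d), and the DSN values are placeholders, not anything Eldric ships:

```ini
; /etc/odbcinst.ini — register the driver (shipped by postgresql-odbc)
[PostgreSQL]
Description = PostgreSQL ODBC driver
Driver      = /usr/lib64/psqlodbcw.so

; /etc/odbc.ini — define a DSN the data module can query live
[mes_prod]
Driver     = PostgreSQL
Servername = mes-db.example.internal
Port       = 5432
Database   = mes
```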

That one path covers most of what an enterprise actually needs:

Filesystem / NFS (shipped): POSIX access inside the data module; nfs-ganesha integration for serving exports or mounting remote shares. Cited paths are the same paths ops can cat from a shell.
SQLite (shipped): Always linked; the default backend for small ops metadata.
PostgreSQL, MySQL, MariaDB (shipped): Via the ODBC layer — the admin installs postgresql-odbc or mysql-connector-odbc, adds a DSN, and Eldric queries live.
Oracle, MSSQL (shipped): Same path — oracle-instantclient-odbc or Microsoft’s msodbcsql18.
IBM Db2 LUW (shipped): IBM’s free DSDriver registers a unixODBC driver; Eldric talks to Db2 LUW through it.
IBM Db2 z/OS, mainframe (shipped): The same DSDriver speaks DRDA to the LPAR over port 446 / 50000. With a DB2 Connect license on the Linux host, the AI sees a z/OS warehouse as just another ODBC DSN.
Vector store + Matrix Memory (shipped): Merged with the above into one retrieval.data.local answer — exact retrieval via the vector side, pattern recall via the matrix-memory side.
data.arxiv, data.nasa_apod (shipped): Reference extensions at sdk/extension/examples/ — each ~80 lines of Python fulfilling the retrieval.<id> contract via the bridge. Templates for the rest of the 4.x science surface.
data.pageindex (sketch): Vectorless, reasoning-based retrieval — hierarchical TOC + LLM navigation. Useful on structured professional docs (SEC filings, FDA submissions, legal, textbooks) where vector similarity loses to expert tree-walking.

That means “can Eldric talk to our DB2 z/OS warehouse?” is already a yes on alpha.3, as long as an admin installs IBM’s ODBC driver. No Phase-2 wait.
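For the z/OS path specifically, the equivalent stanzas might look like the following. The driver path is the default DSDriver layout and the hostname, port, and database are placeholders for the local LPAR — all values to adapt, none shipped by Eldric:

```ini
; /etc/odbcinst.ini — IBM Data Server Driver (path varies by install)
[Db2]
Description = IBM Data Server Driver for ODBC and CLI
Driver      = /opt/ibm/dsdriver/lib/libdb2o.so

; /etc/odbc.ini — DRDA to the LPAR over the DRDA port
[dwh_zos]
Driver   = Db2
Hostname = lpar1.example.internal
Port     = 446
Database = DSNDWH
Protocol = TCPIP
```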

What’s on the roadmap

Plenty of enterprise data doesn’t speak ODBC. Those are the actual gaps.

Phase 1 — streaming, NoSQL, object storage

Headers for these connectors exist in cpp/include/distributed/data/connectors/; each becomes a real retrieval.data.<name> syscall when its driver is linked and the query path lands.

MongoDB (mongocxx, Apache-2.0): Document store, aggregation pipeline.
Kafka (librdkafka, BSD-2): Streaming ingest, topic consumption.
Elasticsearch / OpenSearch (libcurl, REST): Search engine + vector-store fallback.
ClickHouse (clickhouse-cpp, Apache-2.0): Column-oriented OLAP, analytical queries.
MinIO / S3 (aws-sdk-cpp S3 module): Object storage, data-lake entry.

Phase 2 — native mainframe (Enterprise Tier)

The ODBC path already covers query-side access to Db2 LUW + Db2 z/OS. Phase-2 is about the rest of the mainframe surface — messaging, legacy record stores, transaction gateways — plus a native DB2 CLI path for customers who want DRDA without going through unixODBC.

IBM MQ (IBM MQ C client libmqm, dlopen): MQI protocol, port 1414 — the enterprise messaging backbone.
VSAM (REST via z/OS Connect EE): HTTP/JSON — the customer deploys the gateway side.
IMS (IBM Universal DB driver): IMS Connect over TCP; DL/I navigation or SQL abstraction.
CICS (CICS Liberty, REST): HTTP/JSON — transaction invocation, not a data query.
Native DB2 CLI / DRDA (IBM DB2 CLI via runtime dlopen): Direct DRDA path for customers who don’t want the unixODBC indirection.

All IBM proprietary drivers are loaded via dlopen/dlsym so eldric-aios compiles and runs without them; the mainframe paths light up when the customer installs the IBM client on the Linux host. Enterprise tier licensing gates the Phase-2 set.

Phase 3 (Cassandra, HBase, Hive, Snowflake, Redshift, BigQuery, Azure Synapse, Databricks Delta Lake, Spark Connect, Druid) follows the same pattern — each an implementation against the common DataSource interface. Snowflake and Synapse are already reachable through their ODBC drivers today; Phase-3 is about native protocols where they outperform ODBC.

Everything else — the extension path works today

What about sources that don’t speak ODBC and aren’t on the Phase-1 list? SharePoint, Salesforce, ServiceNow, an FTP drop, a vendor REST API, a line-of-business SOAP service — the normal enterprise long tail.

The retrieval contract is universal, so the same manifest + ~80-line Python template the arxiv reference extension uses handles any of them:

cat > ${ELDRIC_DATA_DIR}/extensions/sharepoint_corp.extension.yaml <<YAML
extension:
  name: sharepoint_corp
  display_name: Corporate SharePoint
  category: data              # ← makes it a plugin, auto-surfaced in chat
  model: B                    # external Python process
  external_url: http://127.0.0.1:9600
  tools: [search]
YAML

curl -XPOST http://localhost:8880/api/v1/extensions/load \
     -d '{"name":"sharepoint_corp"}'
# data.sharepoint_corp is now a toggle in the chat sidebar. No Edge rebuild.

The Python side is a thin shim that takes {query, top_k, config}, calls the SharePoint Search API, and returns the {snippets:[{text,source,score}]} contract. When a future phase bundles SharePoint as a native connector, it ships its own syscall, and users see no change: the same toggle in the sidebar, the same source in the fan-out, the same citations.
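A minimal version of that shim, using only Python’s standard library, could look like this. The /invoke route follows the bridge described above; search_backend is a stub standing in for the real SharePoint Search call, and the payload envelope (arguments either flat or under an "arguments" key) is an assumption, not the documented wire format:

```python
# Sketch of a model-B data extension: an HTTP process answering POST /invoke
# with the {snippets:[{text,source,score}]} retrieval contract.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def search_backend(query, top_k):
    # Stub — a real shim would call the SharePoint Search REST API here
    # and map each hit into {text, source, score}.
    hits = [{"text": f"match for {query!r}",
             "source": "sharepoint://sites/corp/doc1",
             "score": 0.9}]
    return hits[:top_k]

def handle_invoke(payload):
    """Map an /invoke payload onto the retrieval snippet contract."""
    # Accept arguments flat or nested under "arguments" (envelope assumed).
    args = payload.get("arguments", payload)
    snippets = search_backend(args.get("query", ""), int(args.get("top_k", 5)))
    return {"snippets": snippets}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/invoke":
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        data = json.dumps(handle_invoke(json.loads(body or b"{}"))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    # Matches the external_url in the manifest above.
    HTTPServer(("127.0.0.1", 9600), Handler).serve_forever()
```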

That’s the payoff of making the retrieval contract the surface: “built-in” and “customer-written” look identical to the rest of the system. The C++ roadmap replaces glue code with optimised drivers without moving the contract.

Fan-out — one query, every enabled source

Turning individual connectors into useful AI is the other half of the story. Flipping three toggles in the sidebar — data.local (which today may include a SharePoint DSN or a Db2 z/OS DSN via ODBC), data.arxiv, and a customer-written data.sharepoint_corp extension — tells the Edge module to fan a single chat query out to all three in parallel, merge the returned snippets, and synthesise a system message prefixed with retrieval context before the LLM sees anything.

Every response carries an X-Eldric-Fanout header listing which plugins answered, with what count, and whether the backend was a native syscall, a loaded extension, or no-backend (toggle enabled but nothing wired). Admins see wiring gaps immediately; users see “retrieved from: Local KB (3), arXiv (5), SharePoint (2)” under the assistant message and can click through for the source list.
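The merge step can be sketched like this — illustrative code, not the real Edge module: query every enabled plugin in parallel, sort the combined snippets by score, and keep per-plugin counts the way the X-Eldric-Fanout header summarises them:

```python
# Illustrative fan-out sketch: parallel retrieval, score-ordered merge,
# per-plugin answer counts. Names are stand-ins for the real Edge internals.
from concurrent.futures import ThreadPoolExecutor

def fan_out(query, plugins, top_k=5):
    """plugins: {plugin_id: callable(query, top_k) -> list of snippet dicts}."""
    with ThreadPoolExecutor(max_workers=max(1, len(plugins))) as pool:
        futures = {pid: pool.submit(fn, query, top_k)
                   for pid, fn in plugins.items()}
    # Pool has shut down here, so every future is resolved.
    results = {pid: f.result() for pid, f in futures.items()}
    merged = sorted(
        (s for snippets in results.values() for s in snippets),
        key=lambda s: s["score"], reverse=True,
    )[:top_k]
    fanout = {pid: len(snippets) for pid, snippets in results.items()}
    return merged, fanout
```

The fanout dict is what a "retrieved from: Local KB (3), arXiv (5), SharePoint (2)" footer would render; the merged list is what gets prefixed to the LLM context.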

What this means for a private-cloud deployment

Three decisions flow from this pattern:

If your organisation’s data is spread across NFS and SQL and Db2 and SharePoint and half a dozen vendor APIs — the normal state of a grown-up enterprise — this is the shape of a private-cloud AI assistant that’s actually deployable: small kernel, one retrieval contract, lots of plugins, no big-bang ingest.

#EnterpriseAI #DataFederation #Mainframe #zOS #ODBC #RAG #PrivateCloud #Eldric