Using RAG in Eldric

Ground the answer in your own documents.

Eldric ships with retrieval-augmented generation (RAG) on by default. You upload documents into a knowledge base and the platform indexes them. When you ask the model a question that overlaps with that content, the answer is grounded in the actual material rather than the model's training data, with citations back to the source passages. This page covers the basics: what RAG is, how to upload, how to ask, how to read citations, and what to do when something goes wrong.

What RAG actually is

One paragraph.

Retrieval-augmented generation means: before the model answers, the platform searches the documents you've given it for passages relevant to your question, hands those passages to the model as context, and asks it to answer from that context. Two upsides over plain chat: the answer cites real material instead of paraphrasing the model's training data, and you can teach the platform your own terminology, your own protocols, and your own institutional documents just by uploading them. RAG is what makes Eldric useful for a hospital that needs answers grounded in its own clinical guidelines, a bank that needs answers grounded in its own compliance manuals, or a factory that needs answers grounded in its own equipment manuals.

Out of the box

RAG is on by default.

You don't need to flip a switch. As soon as Eldric 5.0 is up — single-node or cluster — the RAG path is wired through. The platform ships a small on-device embedding model (a quantised GGUF, ~80 MB) that runs locally, so the embeddings never leave your cluster. The vector store lives on the data worker; the embedding model lives on the native inference daemon (Inferenced). Both are managed by the controller; you don't configure them yourself unless you're running a custom topology.

Uploading documents

From the GUI.

Open the admin console at https://<your-host>/admin and pick Knowledge bases → New KB. Give the knowledge base a name (a department, a project, a study), then click Upload and drop the files. Supported formats: PDF, DOCX, plain text, Markdown, HTML. Files start uploading immediately. Once each file lands, the platform extracts the text, splits it into chunks, embeds each chunk into a 768-dimensional vector, and stores each chunk with its vector on the data worker. The KB status page shows progress per file; once a file is green you can query against it.
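The "splits it into chunks" step can be illustrated with a minimal Python sketch. This page does not describe Eldric's actual splitting strategy, so the word-based window, the `chunk_size` and `overlap` defaults, and the function name `split_into_chunks` are all assumptions chosen for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical sketch: the sizes and the word-based window are assumptions,
    not Eldric's documented behaviour.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(split_into_chunks(sample, chunk_size=4, overlap=1))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval at the cost of some duplicated storage.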

For larger drops, the GUI uses chunked upload (§107) so the browser can pause, resume, and recover from network drops without re-uploading from scratch.

From the API.

Same backend, scriptable:

curl -X POST -H "X-API-Key: $ELDRIC_API_KEY" \
     -F "file=@./clinical-guidelines.pdf" \
     -F "tenant_id=hospital-cardiology" \
     -F "namespace_id=guidelines-2026" \
     https://<your-host>/api/v1/vector/ingest

The endpoint returns a job ID; /api/v1/vector/sources/<tenant_id>/<namespace_id> lists what's already indexed.
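For scripts that check what is already indexed, the listing path can be built from Python's standard library. This is a sketch under assumptions: the page does not state the HTTP method or authentication for the listing endpoint, so GET and the same X-API-Key header as the ingest call are guesses, and `build_sources_request` and the host `eldric.example` are hypothetical names.

```python
import urllib.parse
import urllib.request


def build_sources_request(host: str, tenant_id: str, namespace_id: str,
                          api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request for the sources listing endpoint.

    Assumptions: GET method and X-API-Key header, mirroring the ingest call.
    """
    path = "/api/v1/vector/sources/{}/{}".format(
        urllib.parse.quote(tenant_id, safe=""),     # escape any unsafe characters
        urllib.parse.quote(namespace_id, safe=""),
    )
    return urllib.request.Request(
        "https://" + host + path,
        headers={"X-API-Key": api_key},
        method="GET",
    )


req = build_sources_request("eldric.example", "hospital-cardiology",
                            "guidelines-2026", "secret")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The response format is not documented on this page, so the sketch stops before parsing.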

Asking grounded questions

Just ask.

In the chat shell, pick a knowledge base from the source picker in the composer (or leave it on "all available" to search across every KB you have access to). Ask your question as you normally would. The platform embeds your question with the same model that embedded the documents, runs a k-nearest-neighbour search against the vector store, hands the top hits to the model alongside your question, and asks for the answer.
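The k-nearest-neighbour step can be sketched in a few lines of Python. The page does not say which distance metric or index structure Eldric uses, so cosine similarity and a brute-force scan are assumptions here, and the three-dimensional vectors stand in for the real 768-dimensional embeddings. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Score every stored chunk against the query and return the k best.

    Brute-force scan for illustration; a production vector store would
    normally use an approximate nearest-neighbour index instead.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: (chunk text, embedding). Real embeddings would come from the
# same model that embedded the documents.
index = [
    ("dosage guidance", [1.0, 0.0, 0.0]),
    ("visiting hours", [0.0, 1.0, 0.0]),
    ("dosage and visiting notes", [1.0, 1.0, 0.0]),
]
for score, text in top_k([1.0, 0.0, 0.0], index, k=2):
    print(f"{score:.3f}  {text}")
```

The question and the documents must be embedded with the same model, as the paragraph above notes; vectors from different models are not comparable.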

You'll know RAG fired because the assistant's response carries a row of citation chips at the bottom — small numbered references that map to the actual source passages. Click a citation to expand it and see the chunk of document the answer was grounded in.

Reading citations

The trail back to the source.

Each citation chip carries: the source document name, the page or section if available, and a short preview of the matched passage. Clicking expands the full passage in the artifact pane alongside the chat. This matters more than it sounds: the citation is the difference between "the model claims this" and "you can prove the platform pulled this from your guideline." For regulated workflows, that proof is the whole point.
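As a rough mental model, each chip can be thought of as a small record built from a retrieved passage. The Python sketch below is hypothetical: the `Citation` class, its field names, the input dictionary keys, and the 80-character preview length are assumptions, not Eldric's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    number: int               # the numbered reference shown on the chip
    document: str             # source document name
    location: Optional[str]   # page or section, if available
    preview: str              # short preview of the matched passage


def make_citations(passages: list[dict], preview_chars: int = 80) -> list[Citation]:
    """Turn retrieved passages into numbered citation records.

    Each passage is a dict with 'document', 'text', and optionally 'location'.
    """
    citations = []
    for number, passage in enumerate(passages, start=1):
        text = passage["text"]
        if len(text) > preview_chars:
            preview = text[:preview_chars].rstrip() + "..."
        else:
            preview = text
        citations.append(
            Citation(number, passage["document"], passage.get("location"), preview)
        )
    return citations


passages = [
    {"document": "clinical-guidelines.pdf", "location": "p. 12",
     "text": "Short matched passage."},
    {"document": "notes.md", "text": "x" * 100},
]
for citation in make_citations(passages):
    print(citation.number, citation.document, citation.location, citation.preview)
```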

If the answer should have cited a document but didn't, two likely causes: the document hasn't finished indexing (check the KB status page), or the question doesn't match the document's wording closely enough (try rephrasing — the platform's semantic search is good but not telepathic).

Managing knowledge bases

Delete, organise, share.

The Admin console → Knowledge bases page lets you rename, delete, and inspect each KB. Deleting a KB removes its vector entries and its source documents from the data worker; the operation is non-recoverable, so confirm before clicking.

Within a tenant, you organise KBs by purpose — one per project, one per study, one per audience. Across tenants, the platform isolates KBs at the gateway: a knowledge base belongs to one tenant, and members of other tenants cannot list, search or read it.

If you need a KB available to two tenants, the right answer in 5.0 is to use the .nexus bundle export — pack the KB, hand the bundle to the other tenant, they unpack it as their own KB. The 5.1 line adds first-class cross-tenant sharing as part of Federation Layer B.

Troubleshooting

The three things that can go wrong.

1. Indexing failed

A file is stuck on "Indexing…" for longer than a few minutes. Check the KB status page for the per-file error. Common causes: scanned PDFs with no OCR text layer (the text extractor finds nothing); password-protected files; corrupt downloads. Fix the file or remove it and re-upload.

2. Embedding backend down

Uploads succeed but never finish embedding; queries fall back to plain chat with no citations. The embedding model (Inferenced GGUF) isn't reachable. Open the Admin console → Inferenced and confirm the embedding model is loaded; if it's not, click Load. If Inferenced isn't running at all, restart it with sudo systemctl restart eldric-aios-inferenced on the host.

3. Question doesn't match the documents

The platform searched, found nothing relevant above its similarity threshold, and answered without citations. Two paths: rephrase the question using terms you'd expect to see in the source document, or lower the similarity threshold in the KB settings (default 0.3 — lower means "return looser matches", at the cost of irrelevant hits creeping in).
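The effect of the threshold can be seen in a small Python sketch. The scores and passage names below are made up for illustration, `filter_by_threshold` is a hypothetical name, and whether Eldric compares with "greater than" or "greater than or equal" is not stated on this page; the sketch assumes the latter.

```python
def filter_by_threshold(scored_hits: list[tuple[float, str]],
                        threshold: float = 0.3) -> list[tuple[float, str]]:
    """Keep only hits whose similarity score is at or above the threshold."""
    return [(score, text) for score, text in scored_hits if score >= threshold]


# Illustrative (score, passage) pairs, not real output.
hits = [
    (0.82, "closely matching passage"),
    (0.35, "loosely related passage"),
    (0.22, "mostly irrelevant passage"),
]

print(len(filter_by_threshold(hits)))                  # default threshold 0.3
print(len(filter_by_threshold(hits, threshold=0.2)))   # looser threshold
print(len(filter_by_threshold(hits, threshold=0.9)))   # nothing survives
```

With the default, the two stronger hits survive; lowering the threshold to 0.2 also admits the mostly irrelevant passage. When nothing survives, the platform answers without citations, as described above.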

Going deeper

Next.

The under-the-hood view of how RAG is wired: RAG architecture. For the compressed-memory preview that speeds up KB search at concurrency: advanced retrieval. For the inference-side preview that consults memory at the prompt boundary: smart memory inference.

For day-2 operations, the admin guide covers tenant onboarding, KB ingestion walkthroughs, and the monitoring alert recipes.
