Most RAG systems pre-warehouse documents and pay storage and re-embedding costs for them whether anyone asks about them or not. Eldric is different: the platform tries learned weights first, then associative memory, then your knowledge base, then live external sources, and it only ingests the material it actually finds useful, at the moment it finds it useful. The result is a knowledge layer that grows on signal, not on hoarding.
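On every query, the platform walks four tiers in order, stopping as soon as it has a confident answer:
1. Tier-1: learned weights.
2. Tier-2: associative (compressed) memory.
3. Tier-3: your knowledge base, via vector search.
4. Tier-4: live external sources such as OpenAlex.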
Each tier carries a confidence signal, and the platform only escalates to the next tier when the current one isn't confident enough. That saves compute cycles, avoids unnecessary calls to paid external APIs, and keeps the latency budget intact.
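The escalation rule can be sketched as a loop that stops at the first tier whose confidence clears its threshold. This is a minimal illustration, not Eldric's implementation: the `Tier`/`TierResult` types, the per-tier `threshold` values, and the fallback to the most confident partial answer when no tier is confident are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class TierResult:
    answer: Optional[str]  # None means the tier found nothing
    confidence: float      # hypothetical 0.0-1.0 confidence signal


@dataclass
class Tier:
    name: str
    lookup: Callable[[str], TierResult]
    threshold: float  # hypothetical minimum confidence to stop at this tier


def answer_query(query: str, tiers: Sequence[Tier]) -> Tuple[Optional[str], Optional[str]]:
    """Walk the tiers in order and stop at the first confident answer.

    Returns (tier_name, answer). If no tier clears its threshold, fall back
    to the most confident partial answer seen (an assumption of this sketch).
    """
    best_name: Optional[str] = None
    best = TierResult(answer=None, confidence=-1.0)
    for tier in tiers:
        result = tier.lookup(query)
        if result.answer is None:
            continue  # nothing here; escalate to the next tier
        if result.confidence >= tier.threshold:
            return tier.name, result.answer  # confident: later tiers never run
        if result.confidence > best.confidence:
            best_name, best = tier.name, result
    return best_name, best.answer


if __name__ == "__main__":
    external_calls = []

    def external(q: str) -> TierResult:
        external_calls.append(q)  # stands in for a paid external API call
        return TierResult("answer from the web", 0.9)

    tiers = [
        Tier("learned-weights", lambda q: TierResult(None, 0.0), 0.9),
        Tier("associative-memory", lambda q: TierResult("partial recall", 0.4), 0.8),
        Tier("knowledge-base", lambda q: TierResult("answer from your docs", 0.85), 0.7),
        Tier("external", external, 0.5),
    ]
    print(answer_query("What does the retention loop do?", tiers))
    print("external calls:", len(external_calls))
```

In this run the knowledge-base tier is confident enough, so the stand-in external lookup is never called; that skipped call is where the savings on paid APIs come from.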
The cascade is the read path. The retention loop is what happens afterwards, and it is what makes Eldric get smarter with use rather than heavier with use.
The flow:
The whole loop is opt-out per tenant; admins can run with the retention loop off if they want a static RAG layer. By default it's on, because that's the path to a knowledge layer that improves with traffic instead of stagnating.
The retention loop runs behind the scenes. From the user's perspective, the only new surface is the small footer under each assistant response: a 👍 button, a 👎 button, and a "view sources" link that expands the citations inline. Clicking 👍 (or expanding a citation, which counts as a soft accept) starts the ingest. Clicking 👎 marks the answer as low-quality: the citations are not auto-ingested, the dream cycle weights them lower, and the platform tries different sources the next time the same topic comes up.
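The feedback rules above, together with the per-tenant opt-out, could be routed roughly as follows. This is a hypothetical sketch: `Feedback`, `TenantSettings`, `RetentionState`, the `downweight_hints` map the dream cycle is assumed to read, the `penalty` factor, and the choice to cancel an earlier soft accept when a 👎 arrives are all illustrative assumptions, not Eldric's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Feedback(Enum):
    THUMBS_UP = "thumbs_up"
    CITATION_EXPANDED = "citation_expanded"  # counts as a soft accept
    THUMBS_DOWN = "thumbs_down"


@dataclass
class TenantSettings:
    retention_loop_enabled: bool = True  # on by default; admins can opt out


@dataclass
class RetentionState:
    # Sources waiting to be ingested into the knowledge base.
    ingest_queue: List[str] = field(default_factory=list)
    # Hypothetical hints consumed later by the dream cycle (1.0 = neutral).
    downweight_hints: Dict[str, float] = field(default_factory=dict)


def handle_feedback(feedback: Feedback, citations: List[str],
                    tenant: TenantSettings, state: RetentionState,
                    penalty: float = 0.5) -> None:
    """Route one piece of user feedback into the retention loop."""
    if not tenant.retention_loop_enabled:
        return  # static RAG mode: feedback never changes what is stored
    if feedback in (Feedback.THUMBS_UP, Feedback.CITATION_EXPANDED):
        for source in citations:
            if source not in state.ingest_queue:
                state.ingest_queue.append(source)
    elif feedback is Feedback.THUMBS_DOWN:
        for source in citations:
            # Cancel any earlier soft accept so the source is not auto-ingested.
            if source in state.ingest_queue:
                state.ingest_queue.remove(source)
            # Multiply rather than overwrite, so repeated thumbs-down votes compound.
            state.downweight_hints[source] = state.downweight_hints.get(source, 1.0) * penalty


if __name__ == "__main__":
    state = RetentionState()
    handle_feedback(Feedback.CITATION_EXPANDED, ["doc:a", "doc:b"], TenantSettings(), state)
    handle_feedback(Feedback.THUMBS_DOWN, ["doc:b"], TenantSettings(), state)
    print(state.ingest_queue, state.downweight_hints)
```

Keeping ingest and down-weighting as queued intents, rather than acting immediately, matches the description of the loop running in the background after the answer has already been returned.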
The user never sees the cascade choosing tiers; they don't have to know whether their answer came from learned weights or live OpenAlex. The answer comes back with citations; the loop runs in the background.
Less storage. A traditional pre-warehouse RAG system ingests every source the operator can think of and hopes the relevant ones are in there. Eldric only ingests the sources that someone actually accepts, so the knowledge base stays proportional to what the platform actually uses.
Faster over time. As the platform internalises recurring patterns, queries that used to need tier-3 (vector search) move to tier-2 (compressed memory) and eventually to tier-1 (learned weights). Because the cascade stops earlier, latency goes down without any manual tuning.
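Because the tiers are walked in order, a query answered at tier k has already paid for tiers 1 through k, so the expected latency is E[L] = Σ_k s_k · (ℓ_1 + … + ℓ_k), where s_k is the share of queries answered at tier k and ℓ_j is the latency of tier j. The sketch below evaluates that formula; the per-tier latencies and the cold-start and warmed-up shares are made-up illustrative values, not measurements from Eldric.

```python
from itertools import accumulate
from typing import Sequence


def expected_latency_ms(tier_latency_ms: Sequence[float],
                        answer_share: Sequence[float]) -> float:
    """Expected per-query latency of a sequential tier cascade.

    A query answered at tier k has already paid for tiers 1..k, so its
    latency is the running sum of per-tier latencies up to and including k.
    """
    if len(tier_latency_ms) != len(answer_share):
        raise ValueError("need one share per tier")
    if abs(sum(answer_share) - 1.0) > 1e-9:
        raise ValueError("answer shares must sum to 1")
    cumulative = list(accumulate(tier_latency_ms))
    return sum(share * cost for share, cost in zip(answer_share, cumulative))


if __name__ == "__main__":
    # Illustrative numbers only: per-tier latencies and shares are invented.
    latency = [5, 20, 80, 900]          # tier-1 .. tier-4, in ms
    cold_start = [0.0, 0.1, 0.5, 0.4]   # new install: mostly tier-3 / tier-4
    warmed_up = [0.5, 0.3, 0.15, 0.05]  # after weeks of accepted queries
    print(f"cold start: {expected_latency_ms(latency, cold_start):.1f} ms")
    print(f"warmed up:  {expected_latency_ms(latency, warmed_up):.1f} ms")
```

With these invented numbers the expected latency falls from about 457 ms to about 76 ms, purely because more queries stop before the slow tiers; no per-tier latency changed.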
Fewer paid external API calls. If a question's answer is already in your own documents or in the platform's learned weights, the tier-4 escalation never fires, so bills for paid embedding or retrieval APIs drop in proportion to how many queries are settled in tiers 1 to 3.
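The "in proportion" claim is simple arithmetic if, as assumed here, only tier-4 escalations hit a paid API: spend is linear in the tier-4 share. The query volume, per-call price, and `calls_per_escalation` parameter below are hypothetical values chosen for illustration.

```python
def monthly_external_cost(queries_per_month: int, tier4_share: float,
                          price_per_call: float, calls_per_escalation: int = 1) -> float:
    """Paid-API spend, assuming only tier-4 escalations call a paid external API."""
    if not 0.0 <= tier4_share <= 1.0:
        raise ValueError("tier4_share must be between 0 and 1")
    escalations = queries_per_month * tier4_share
    return escalations * calls_per_escalation * price_per_call


if __name__ == "__main__":
    # Hypothetical traffic and pricing, for illustration only.
    queries, price = 100_000, 0.002  # price in dollars per paid call
    for label, share in [("cold start", 0.4), ("warmed up", 0.05)]:
        print(f"{label}: ${monthly_external_cost(queries, share, price):,.2f}/month")
```

Cutting the tier-4 share from 40% to 5% cuts this spend by the same factor of eight; real bills also depend on provider pricing tiers that this sketch ignores.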
The trade-off is honest: a brand-new installation answers from tier-3 and tier-4 much of the time, because nothing is in the platform's learned weights yet. After a few weeks of accepted queries, tier-1 and tier-2 carry an increasing share. The platform pays back the cold-start tax gradually, not all at once.
The customer-facing how-to for the RAG side: using RAG. The technical view of how the cascade is wired across the workers: how it works. The chunking layer that determines RAG hit quality: chunking strategies. For customers who want to teach Eldric their own intent classes (so tier-1 covers domain-specific queries faster): custom classification.