Preview in 5.0.x

Ingest that reads the schema.

The intelligent upload dialog gains a schema-aware ingest substrate. For structured content — tables, CSVs, JSON, XML schemas — the wizard reads the schema before chunking and proposes a strategy that respects record boundaries instead of slicing through them. Ingesting structured data stops requiring per-document overrides.

Next patch

Where today's chunking falls short

Records cut in half.

The 5.0 knowledge-base ingest path applies the same chunking strategy to every document. For prose — PDFs, reports, transcripts — that works well. For structured content, it often doesn't.

A 50-column CSV with 10,000 rows, chunked by token count, produces chunks that contain partial rows: the first half of row 1,247 ends one chunk, the second half starts the next. The model can recall “something about row 1,247” without ever seeing the whole row in one chunk. The same problem hits JSON documents (records split mid-object) and XML feeds (elements truncated).
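The split-row problem can be reproduced on a tiny CSV. The sketch below is illustrative only: it uses character count as a stand-in for token count, and the function names `chunk_by_size` and `chunk_by_rows` are hypothetical, not part of the product.

```python
import csv
import io


def chunk_by_size(text: str, size: int) -> list:
    """Naive chunking: cut every `size` characters, ignoring row boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_rows(text: str, rows_per_chunk: int) -> list:
    """Record-aware chunking: each chunk holds the header plus whole rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_chunk])
        chunks.append(buf.getvalue())
    return chunks


sample = "id,severity,message\n1,high,disk full\n2,low,cache miss\n3,high,timeout\n"

naive = chunk_by_size(sample, 30)   # first chunk ends partway through row 1
aware = chunk_by_rows(sample, 2)    # every chunk contains only complete rows
```

With a 30-character budget, the first naive chunk ends in the middle of the first data row, while the row-based version never separates a row from its header.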

The 5.0 workaround is a per-document chunking override. Power users set it; most customers don't, and end up with degraded recall on their structured data.

What's coming in 5.0.x

Schema first, chunk second.

Sniff the schema. The wizard reads the first N bytes (or the header row, or the XSD if attached) and identifies the structure.
Propose a strategy. Record-aware chunking for CSV/TSV. Object-boundary chunking for JSON. Element-boundary chunking for XML. The user sees the proposal and can override it.
Preserve metadata. Column names and types become per-chunk metadata, available for filtered retrieval (“only chunks where severity == high”).
Stays single-step. One upload action picks the right strategy; the user doesn't have to know what “record-aware chunking” means to benefit from it.
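The four steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not the shipped implementation: the sniffing heuristic, the `STRATEGIES` table, the `Chunk` class, and the metadata keys (`columns`, `types`, `source_row`, `values`) are all hypothetical names chosen for the example.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical mapping from sniffed format to the proposed strategy.
STRATEGIES = {
    "csv": "record-aware",
    "tsv": "record-aware",
    "json": "object-boundary",
    "xml": "element-boundary",
    "prose": "token-count",
}


def sniff_format(head: str) -> str:
    """Guess the format from the start of a document (toy heuristic)."""
    stripped = head.lstrip()
    if stripped.startswith("<"):
        return "xml"
    if stripped.startswith(("{", "[")):
        return "json"
    first_line = stripped.splitlines()[0] if stripped else ""
    if "\t" in first_line:
        return "tsv"
    if "," in first_line:
        return "csv"
    return "prose"


def propose_strategy(head: str, override: Optional[str] = None) -> str:
    """Return the proposed strategy unless the user overrides it."""
    return override or STRATEGIES[sniff_format(head)]


def infer_type(values: list) -> str:
    """Label a column int, float, or str based on whether all values parse."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except (ValueError, TypeError):
            continue
    return "str"


def chunk_csv(text: str) -> list:
    """One chunk per row, with column names and types kept as metadata."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return []
    columns = list(rows[0].keys())
    types = {c: infer_type([r[c] for r in rows]) for c in columns}
    return [
        Chunk(
            text=", ".join(f"{c}={r[c]}" for c in columns),
            metadata={"columns": columns, "types": types,
                      "source_row": i, "values": dict(r)},
        )
        for i, r in enumerate(rows, start=1)
    ]


doc = "id,severity\n1,high\n2,low\n3,high\n"
strategy = propose_strategy(doc[:64])
chunks = chunk_csv(doc)
# Filtered retrieval: only chunks where severity == high.
high = [c for c in chunks if c.metadata["values"]["severity"] == "high"]
```

Storing the row values alongside column names and types is an assumption made here so the filter has something to match on; the page itself only commits to column names, types, and source row index as metadata.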

What's pending

Honest gates on this page.

Still in flight

Schema sniffers for CSV / TSV / JSON / NDJSON / XML / Parquet
Record-boundary chunking implementations per format
Per-chunk metadata schema (column names, types, source row index)
Wizard UI proposal step with override controls
Migration path for existing knowledge bases (admin opt-in re-ingest)

This page updates as each piece lands. The release notes are the formal cut.
