Beyond RAG. When RAG isn't enough, ERAG.

RAG: retrieve similar chunks, then generate. That pattern fits many narrative questions. It frays for numbers in tables, conflicting sources, time windows, versioned rules, and thin data. ERAG is shorthand here for Extended RAG: retrieval plus structured evidence, routing, and checks so answers track what the data can support.

ERAG means Extended RAG in this article only. It is not a separate product or SKU inside FAQ Ally. It labels a stack shape: keep retrieval, add typed records, planners, and validation so the model is steered by evidence and retrieved text together, not by top chunks alone.

FAQ Ally still centers on hybrid retrieval over trained documents. The rest is what wraps that core: signal-aware retrieval, structured extraction from training, aggregate and long-context handling, and branches that shorten or refuse answers when evidence is weak or contradictory.

ERAG, in simple terms, is RAG plus structured evidence and checks.

Where RAG falls short (and what ERAG is for)

In practice, "RAG" often means chunk, embed, vector search, paste the top hits into a prompt. That fits prose FAQs. ERAG targets where that pattern strains:

  • Tabular / numeric truth. Invoices, receipts, statements, subscriptions, and usage rows do not fit one fuzzy paragraph. Completions can drift off exact figures.
  • Conflicting documents. Two sources can disagree. Ranked chunks give the model conflicting prose with no merge rule.
  • Coverage gaps. A full-year question may see only part of the year in the corpus. Fluent text can sound complete when the window is not.
  • Versioned policies. The right rule may depend on an effective date, not on search rank alone.
  • Visual information. Diagrams and screenshots often hold facts the nearby text does not spell out.

ERAG handles this by feeding typed, checkable evidence and planner outcomes, not only narrative context.

What ERAG adds in FAQ Ally (four layers)

ERAG addresses these gaps in four layers. The bullets match the current architecture; which branches run depends on agent configuration, document types, training choices, and how your deployment wires retrieval, structured records, and post-answer checks.

1. ERAG's retrieval layer

  • Hybrid search. Atlas vector search plus BM25-class keyword signals, fused so keywords can reinforce or correct vectors. Agent retrieval profiles can boost filenames, tags, and folders when that tracks intent.
  • Query rewrite. Where the stack includes it, a rewrite pass can reformulate the question, with its results merged into the hybrid results.
  • Structured retrieval on the chunk pool. Query signals and chunk metadata (document type, structure, entities, headings) drive filtering, same-document expansion, and score blending. Reranking uses a cross-encoder-style reranker where deployed. A standalone rerank pass may still run if the structured retrieval path did not.
  • Aggregate and long-context shaping. Heuristics for aggregate-style or very long messages adjust retrieval limits. Pool size, candidate counts, similarity floors, and assembly shift so context matches the question shape.
  • Multimodal vectors. With Multimodal Image Embedding enabled in training (plan limits apply), images in PDFs and Word files are vectorized beside text. Smart Image-Aware Chunking, if you use it, can bring linked images in with relevant text chunks.
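A minimal sketch of the fusion step above, assuming a reciprocal-rank-fusion merge of the vector and keyword rankings. The function name and the constant k=60 are illustrative conventions, not FAQ Ally's actual internals:

```python
from collections import defaultdict

def rrf_fuse(vector_ranked, keyword_ranked, k=60):
    """Merge two best-first ranked lists of chunk ids with reciprocal
    rank fusion. A chunk ranked high in either list gets a large
    1/(k + rank) contribution, so keyword hits can reinforce or
    correct vector hits without either signal dominating."""
    scores = defaultdict(float)
    for ranked in (vector_ranked, keyword_ranked):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A chunk that appears in both lists outranks one that appears in only one, which is the "reinforce or correct" behavior described above.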

2. Structured evidence (ERAG's typed core)

  • Typed records. Training can extract invoices, receipts, statements, contracts, subscriptions, payment transactions, usage billing, and related rows into queryable storage.
  • Line items and materialized facts. Normalized line items and derived financial facts are stored when extraction produces them. That supports aggregation-style questions.
  • Evidence profiles. Document profiles can hold embedded objects (financial transactions, versioned policy rules) and capability metadata for routing.
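To picture what "typed, queryable" buys over prose chunks: normalized line items support exact aggregation instead of asking a completion to read numbers out of paragraphs. The record shape below is a hypothetical sketch, not FAQ Ally's schema:

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class LineItem:
    """A normalized line item extracted from a trained document."""
    document_id: str
    doc_type: str        # e.g. "invoice", "receipt"
    description: str
    amount: Decimal      # exact money values, never floats
    period: str          # "YYYY-MM" the charge applies to

def total_for(items, doc_type, period):
    """Aggregate an exact total from typed rows for one document type
    and one period, the shape behind aggregation-style questions."""
    return sum(
        (i.amount for i in items
         if i.doc_type == doc_type and i.period == period),
        Decimal("0"),
    )
```

Using Decimal keeps displayed money exact, which matters once planner output is treated as authoritative downstream.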

3. Planning and routing within ERAG

  • Query understanding. Heuristic facets feed routing, logging, and planners each turn. Where the JSON-only LLM step exists, it validates against a strict schema and falls back to heuristics on parse failure.
  • Structured planner. Deterministic logic can evaluate money-row conflicts and time or entity coverage for bounded totals. Versioned policy resolution runs before answers merge with chunk context.
  • Structured-only answers. If records support a conclusion but hybrid retrieval returns no chunks, the pipeline can return structured-only text (preamble plus computed body) on those paths instead of a generic no-chunk refusal.

4. Validation and guardrails

  • Before the main completion. Quality guardrails (e.g. user-message blocklists) can run ahead of cache and retrieval. With strict citation settings, retrieval can be re-checked so weak vector matches do not pass quietly.
  • After structured augment. Post-LLM markdown repair aligns displayed money with planner output when augment modes supply authoritative totals.
  • Post-answer evidence check. Where deployed, a gate tests factual sentences for overlap with retrieved chunk text; reranker-assisted review and a bounded retry are possible. It tightens grounding; it does not replace the primary model.
  • Refusal paths. The stack can refuse or shorten answers when chunks are weak or configured refusal heuristics fire, per agent and environment.
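A crude version of the post-answer overlap gate, to make the idea concrete. The word-overlap metric and the 0.5 threshold are assumptions for illustration; a deployed gate would be stricter and reranker-assisted:

```python
import re

def grounded(answer_sentence, chunk_texts, min_overlap=0.5):
    """Pass a factual sentence only when at least `min_overlap` of its
    content words appear in some retrieved chunk. A failing sentence
    would trigger review or a bounded retry, not silent acceptance."""
    words = set(re.findall(r"[a-z0-9]+", answer_sentence.lower()))
    if not words:
        return True
    for chunk in chunk_texts:
        chunk_words = set(re.findall(r"[a-z0-9]+", chunk.lower()))
        if len(words & chunk_words) / len(words) >= min_overlap:
            return True
    return False
```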

Question-time routing with ERAG

In practice, this shows up at question time.

Narrative questions. Many turns normalize the query, run hybrid search (and related passes where configured), assemble numbered context, then call the main completion. Within ERAG, that is the familiar chunk-and-generate path.

Aggregate and numeric shapes. When structured records fit the question, the planner can compute from typed rows (preferring materialized financial facts when present for the intent), run conflict and coverage checks, then augment hybrid context or hand off to records-first flows where those routers match.

When hybrid retrieval does not run. A semantic chat response cache hit can return a stored answer without repeating retrieval or the main completion. A configured records-first short-circuit can answer from records without hybrid RAG for that turn. Those branches mean hybrid search is not on every code path; ERAG still names the overall system.
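The branching above reduces to a short precedence order: cache hit, then records-first, then hybrid RAG. The callables in this sketch are hypothetical stand-ins for the real components:

```python
def route(question, cache, records_router, hybrid_rag):
    """Question-time branching: a cache hit skips retrieval and the
    main completion entirely; a records-first match answers from typed
    rows; everything else takes the hybrid chunk-and-generate path."""
    cached = cache.get(question)
    if cached is not None:
        return ("cache", cached)
    structured = records_router(question)
    if structured is not None:
        return ("records", structured)
    return ("hybrid_rag", hybrid_rag(question))
```

This is why hybrid search is not on every code path: two of the three branches return before it runs.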

What structured depth needs. Typed extraction must match your documents. Coverage, confidence floors, and conflict rules can yield abstention, partial answers with warnings, or structured conflict reports instead of stretching thin data.

How ERAG surfaces uncertainty

  • Thin data. Strict coverage checks can block a totals answer that would imply a full period. Partial coverage may still compute, often with an explicit partial warning.
  • Hard conflicts. Under strict money-conflict rules, disagreeing rows can yield a refusal with a visible explanation rather than a blended guess.
  • Ambiguous policies. Versioned policy objects can surface conflict or ambiguity text when effective dates compete.
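The thin-data behavior can be read as a three-way verdict on period coverage. The month-list representation and strict flag here are illustrative assumptions:

```python
def coverage_verdict(requested_months, covered_months, strict=False):
    """Decide how a totals answer should be framed given coverage.

    Returns "full", "partial" (compute, but attach an explicit partial
    warning), or "abstain". Strict mode refuses anything short of full
    coverage, so a total never implies a complete period it lacks."""
    covered = set(covered_months)
    missing = [m for m in requested_months if m not in covered]
    if not missing:
        return "full"
    if strict or len(missing) == len(requested_months):
        return "abstain"
    return "partial"
```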

Within ERAG, these steps add discipline and cut some failure modes. High-stakes compliance work still needs human review.

What ERAG feels like in practice

You still ask in plain language. Outcomes differ depending on whether chunks carry the answer or records and uncertainty dominate.

  • Cited passages from merged chunks stay common for narrative questions.
  • Spend- or totals-style questions may show computed numbers with short preamble text from the records path.
  • Conflict or thin evidence can produce a brief decline with a stated reason.
  • Messy policy evidence may get a preamble before the body that states ambiguity.

Limits to keep in mind

ERAG does not replace solid sources, sensible training hygiene, or retraining when embeddings and enrichment improve.

Typed paths help when your content matches those extractors. Image vectors need compatible formats in files and Multimodal Image Embedding on a supporting plan. Automated checks complement human judgment; they do not substitute for it where stakes are high.

Summary

RAG is retrieve relevant chunks, then generate an answer. ERAG is Extended RAG: the same retrieval-and-generation core, plus structured evidence, routing, and checks.

That matters when the question is about numbers, conflicts, coverage, or policies, not only narrative passages. FAQ Ally implements Extended RAG in the architecture laid out above; which branches run depends on agent configuration, documents, training choices, and deployment.

Related: Multimodal image embedding | Knowledge base optimization | AI document search for teams | IT team documentation pains | Document best practices | Home

ERAG keeps retrieval at the core, but adds the structure and discipline needed for real-world questions.