RAG-Grounded Chat — Glossary

Retrieve, then answer

The flow runs from a visitor's question, to a search over the brand's own pages, to the model writing an answer that links back to the source page.

The agent retrieves real pages first, then answers from them and cites the source.

This is the default approach for ecommerce and SaaS chat in 2026 because the alternative, an ungrounded model, makes things up. Ask an ungrounded model "what is your return policy?" and it will give a plausible answer that is often wrong, for example citing a 30-day window when the real policy is 60 days, or the reverse. A grounded reply points to the actual page; an ungrounded one falls back on what return policies typically look like in its training data.

The basic flow is straightforward:

1. Index. Crawl the site, split each page into passages, turn those into embeddings, and store them in a vector index.
2. Retrieve. When a question comes in, find the passages most relevant to it.
3. Re-rank (optional). Re-score the top passages for relevance, trading a little speed for better quality.
4. Compose. Drop the retrieved passages into the prompt as grounding, with markers for citation.
5. Generate. The model writes the answer, ideally citing the passages it leaned on.
6. Cite. The reply shows the source URLs, so the visitor can verify and trust it.
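
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Yokaify or any particular product implements the pipeline: the embedding is a toy word-count vector standing in for a real embedding model, the "vector index" is a plain list, the example.com pages are made up, and the optional re-rank step and the model call itself are left out. All function and class names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Passage:
    url: str
    text: str
    vector: Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages: dict[str, str]) -> list[Passage]:
    # Index: split each page into passages (here, on blank lines) and embed them.
    index = []
    for url, body in pages.items():
        for chunk in body.split("\n\n"):
            chunk = chunk.strip()
            if chunk:
                index.append(Passage(url, chunk, embed(chunk)))
    return index


def retrieve(index: list[Passage], question: str, k: int = 2) -> list[Passage]:
    # Retrieve: rank passages by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(index, key=lambda p: cosine(q, p.vector), reverse=True)[:k]


def compose(question: str, passages: list[Passage]) -> str:
    # Compose: place passages in the prompt with numbered citation markers.
    lines = ["Answer using only the numbered sources below and cite them as [n].", ""]
    for n, p in enumerate(passages, start=1):
        lines.append(f"[{n}] ({p.url}) {p.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def cited_urls(passages: list[Passage]) -> list[str]:
    # Cite: source URLs to show with the reply, de-duplicated in rank order.
    return list(dict.fromkeys(p.url for p in passages))


# Made-up pages for illustration only.
pages = {
    "https://example.com/returns": (
        "Returns are accepted within 60 days of delivery.\n\n"
        "Refunds go back to the original payment method."
    ),
    "https://example.com/shipping": "Standard shipping takes 3 to 5 business days.",
}

index = build_index(pages)
question = "How many days do I have for returns?"
top = retrieve(index, question)
prompt = compose(question, top)  # this string would be sent to the model (Generate)
sources = cited_urls(top)
print(prompt)
print(sources)
```

In a real deployment the word-count vectors would be replaced by model embeddings and the list by a vector store, but the shape of the flow stays the same: the model only ever sees passages that carry a URL, which is what makes the final citation possible.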

Yokaify runs this whole pipeline behind the scenes and grounds replies in your own site content. See /pricing for the current plan tiers.

It is not magic, and the honest failure modes are worth naming: the right passage occasionally doesn't surface in the top results; an answer split across two passages can lose its thread; and a page that was updated but not yet re-indexed can leave the agent a step behind. The "RAG over website" entry covers the storage and freshness patterns, and the "RAG-grounded chat 2026 architecture" post goes into the engineering.
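
As one illustration of the stale-index failure mode, a re-index job could compare a content hash stored at index time against the live page and flag any mismatch. This is a sketch under assumptions, not a description of Yokaify's freshness handling: the function names, the example.com URLs, and the idea of keeping one hash per URL are all hypothetical.

```python
import hashlib


def content_hash(body: str) -> str:
    # Fingerprint of the page text as it was when indexed.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def stale_urls(indexed_hashes: dict[str, str], live_pages: dict[str, str]) -> list[str]:
    # A page is stale if it was never indexed or its text changed since indexing.
    return [
        url
        for url, body in live_pages.items()
        if indexed_hashes.get(url) != content_hash(body)
    ]


# Made-up data: the returns page changed after the last index run.
indexed = {
    "https://example.com/returns": content_hash("Returns are accepted within 30 days."),
    "https://example.com/shipping": content_hash("Standard shipping takes 3 to 5 business days."),
}
live = {
    "https://example.com/returns": "Returns are accepted within 60 days.",
    "https://example.com/shipping": "Standard shipping takes 3 to 5 business days.",
}

needs_reindex = stale_urls(indexed, live)
print(needs_reindex)
```

A check like this only narrows the window in which the agent is a step behind; it does not address the other two failure modes, which depend on retrieval quality and how pages are split into passages.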

Last updated May 31, 2026.