If you are building chat or an onsite agent on Next.js, most of the hard architectural calls are already settled. React Server Components render the page. A small Client Component runs the widget. Replies stream from a route handler. The heavy work splits between Edge and Node. None of it is exotic anymore, and the performance budget is well within reach as long as you keep the widget off the critical path.
What follows is a practical walkthrough of each layer: where the code lives, why it lives there, and the handful of places teams still trip up.
The five-layer architecture
A 2026 production chat on Next.js has five layers, three of which run on the server and two on the client.
┌─────────────────────────────────────────────────────────────┐
│ CLIENT (React 19 + Next.js 15 App Router) │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ 1. Page shell — RSC, no JS shipped │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ 2. Chat widget — Client Component (dynamic import) │ │
│ │ — Rive mascot (state-machine driven) │ │
│ │ — Chat surface (streaming text from server) │ │
│ │ — Behavior signal collector (scroll, dwell, etc.) │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
↕ fetch + streaming
┌─────────────────────────────────────────────────────────────┐
│ SERVER (Vercel / Cloudflare / etc.) │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ 3. Edge routing — Route Handler at the edge │ │
│ │ — Auth check, rate limit, signal classification │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ 4. RAG retrieval — Node serverless │ │
│ │ — Vector search, re-ranking, prompt construction │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ 5. Model call — streaming response from frontier LLM │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
The split is deliberate. The fast, cheap routing decision happens at the edge, close to the visitor. The slower work, retrieving the right content and writing a reply, runs in Node, where a database connection and heavier compute are available.
Layer 1: the page shell as RSC
Every piece of HTML on a marketing or product page that doesn't need browser-only state is a React Server Component. Headers, footers, navigation, hero copy, product galleries: all RSC. The result is HTML that renders without any client JavaScript, which puts an LCP under 1.5 s within reach on most networks.
```tsx
// app/page.tsx: Server Component (default)
import { Hero } from './Hero'; // RSC
import { ProductGrid } from './ProductGrid'; // RSC, fetches from CMS
import { Mascot } from './Mascot'; // Client Component (see Layer 2)

export default async function HomePage() {
  return (
    <>
      <Hero />
      <ProductGrid />
      <Mascot /> {/* Mounts a small client island */}
    </>
  );
}
```
The Mascot import is a regular component import — Next.js automatically detects the 'use client' directive on the Mascot file and ships only that boundary's JS to the browser.
Layer 2: the chat widget as a Client Component
The widget is a Client Component because it owns:
- Canvas state (Rive)
- WebSocket / streaming-fetch state (chat connection)
- Per-session behavior state (scroll position, dwell time, signal counters)
- DOM event handlers (click, input, keyboard)
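The per-session behavior state in that list can stay as plain data, separate from React. A minimal sketch, assuming the widget feeds it from scroll and visibilitychange listeners; the collector name, snapshot shape and signals chosen here are hypothetical, not a specific library:

```typescript
// Hypothetical per-session signal collector: pure state, fed by browser
// event listeners (scroll, visibilitychange) that the widget would attach.
export interface SignalSnapshot {
  maxScrollDepth: number; // 0..1, deepest fraction of the page reached
  dwellMs: number;        // total time the page has been visible
  scrollEvents: number;   // number of recorded scroll events
}

export function createSignalCollector(startMs: number) {
  let maxScrollDepth = 0;
  let scrollEvents = 0;
  let dwellMs = 0;
  let visibleSince: number | null = startMs; // assume the page is visible at mount

  return {
    recordScroll(depth: number): void {
      scrollEvents += 1;
      // Clamp to [0, 1] and keep the deepest value seen so far.
      maxScrollDepth = Math.max(maxScrollDepth, Math.min(1, Math.max(0, depth)));
    },
    recordVisibility(visible: boolean, nowMs: number): void {
      if (visible && visibleSince === null) {
        visibleSince = nowMs; // a new visible interval starts
      } else if (!visible && visibleSince !== null) {
        dwellMs += nowMs - visibleSince; // close the current visible interval
        visibleSince = null;
      }
    },
    snapshot(nowMs: number): SignalSnapshot {
      // Count the still-open visible interval without closing it.
      const open = visibleSince === null ? 0 : nowMs - visibleSince;
      return { maxScrollDepth, dwellMs: dwellMs + open, scrollEvents };
    },
  };
}
```

In the widget, a passive scroll listener could call recordScroll with (scrollY + innerHeight) / document.documentElement.scrollHeight, and a visibilitychange listener could call recordVisibility with document.visibilityState === 'visible' and performance.now(). Keeping the logic pure makes it easy to test and to send as a compact snapshot with each chat request.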
```tsx
'use client';

import { useState } from 'react';
import dynamic from 'next/dynamic';
// ChatSurface is assumed to be a sibling Client Component that renders the
// message list and an input box; it is not shown in this article.
import { ChatSurface } from './ChatSurface';

type Message = { role: 'user' | 'assistant'; text: string };

// Load the Rive runtime only in the browser, and only when this island mounts.
const RiveMascot = dynamic(() => import('./RiveMascot').then((m) => m.RiveMascot), {
  ssr: false,
  loading: () => null,
});

export function Mascot() {
  const [open, setOpen] = useState(false);
  const [messages, setMessages] = useState<Message[]>([]);

  async function send(input: string) {
    const next: Message[] = [...messages, { role: 'user', text: input }];
    // Show the user's message plus an empty assistant message to stream into.
    setMessages([...next, { role: 'assistant', text: '' }]);

    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ messages: next }),
    });
    if (!res.ok || !res.body) return; // render an error state in real code

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let assistantText = '';
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      assistantText += decoder.decode(value, { stream: true });
      // Replace the last (assistant) message with the text received so far.
      setMessages((m) => [...m.slice(0, -1), { role: 'assistant', text: assistantText }]);
    }
  }

  return (
    <div className="fixed bottom-6 right-6">
      <RiveMascot onClick={() => setOpen(true)} />
      {open && <ChatSurface messages={messages} onSend={send} />}
    </div>
  );
}
```
Three things this widget gets right.
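The Rive component is dynamic-imported with ssr: false. The Rive runtime draws to a browser canvas through a WASM renderer, so it has nothing useful to do during server rendering. With ssr: false it stays out of the SSR pipeline and out of the initial JS bundle, and its chunk loads only when the widget mounts.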
The chat fetch uses the streaming response pattern. The body is a ReadableStream; the reader reads chunks as they arrive and updates the React state. The visitor sees text appear word-by-word, not in a 3-second silence followed by a complete response.
The widget owns its own state. There's no global store, no Redux, no Zustand. For a chat widget, local state is the right answer; the persistence layer is the server, not the client.
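The widget's read loop can also be pulled out into a small framework-agnostic helper. This is a hedged sketch, not part of any library: the name readTextStream is hypothetical, and besides the loop shown in the widget it flushes the decoder at the end and releases the reader lock.

```typescript
// Hypothetical helper: drain a streamed text body, reporting the running text.
// Works with any ReadableStream<Uint8Array>, such as `res.body` from fetch().
export async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (textSoFar: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte UTF-8 characters split across chunks intact.
      text += decoder.decode(value, { stream: true });
      onText(text);
    }
    // Flush any bytes the decoder is still holding.
    const tail = decoder.decode();
    if (tail) {
      text += tail;
      onText(text);
    }
    return text;
  } finally {
    reader.releaseLock();
  }
}
```

Inside send(), the loop would reduce to a single call: readTextStream(res.body, (t) => setMessages((m) => [...m.slice(0, -1), { role: 'assistant', text: t }])).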
Layer 3: the Edge Route Handler
The Edge layer handles routing decisions: which model to call, which RAG index to query, whether to short-circuit with a cached response. The Edge function runs in the region nearest the visitor, which keeps the network hop to this first decision short (often a few tens of milliseconds), although a slow client connection still adds its own latency.
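```ts
// app/api/chat/route.ts
import { NextRequest } from 'next/server';

export const runtime = 'edge';

type Message = { role: 'user' | 'assistant'; text: string };

export async function POST(req: NextRequest) {
  const { messages } = (await req.json()) as { messages: Message[] };
  const lastUserMessage = messages.at(-1)?.text ?? '';

  // Routing decision: small model for short replies, frontier for longer or question-like input
  const useFrontier = lastUserMessage.length > 50 || /[?]/.test(lastUserMessage);

  // Hand off to the RAG + generation backend. Server-side fetch needs an
  // absolute URL, so resolve the internal route against the incoming request.
  const target = new URL(useFrontier ? '/api/chat-frontier' : '/api/chat-flash', req.url);
  const upstream = await fetch(target, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ messages }),
  });
  if (!upstream.ok || !upstream.body) {
    return new Response('Upstream error', { status: 502 });
  }

  // Pass the stream straight through; the Node handler emits plain text.
  return new Response(upstream.body, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
```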
The Edge layer is also where rate limiting, auth checks, and signal classification live. Putting these at the edge keeps cold-start cost low and means a malformed or abusive request never reaches the heavier Node layer.
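A minimal sketch of the rate-limiting step, assuming a fixed-window policy keyed by something like client IP or session id; the function name and policy are illustrative, not a specific library. The counters live in an in-memory Map, which only holds per isolate: Edge isolates in different regions, or recycled ones, do not share it, so a production limiter would keep the counters in a shared store.

```typescript
// Hypothetical fixed-window rate limiter. In-memory only: each Edge isolate
// keeps its own Map, so limits are per isolate, not global.
export function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, { start: number; count: number }>();

  return function allow(key: string, nowMs: number): boolean {
    const w = windows.get(key);
    if (!w || nowMs - w.start >= windowMs) {
      // First request from this key in a fresh window.
      windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count >= limit) return false; // over budget until the window resets
    w.count += 1;
    return true;
  };
}
```

Inside the Edge POST handler, a rejected request can return new Response('Too many requests', { status: 429 }) before anything is forwarded to the Node layer.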
Layer 4: RAG retrieval in Node
The RAG layer needs heavier compute (vector math, embedding lookups, optional re-ranking) and a database connection to a vector store. Edge functions are not the right fit; Node serverless or a long-running service is.
The pattern: a Node Route Handler at app/api/chat-frontier/route.ts (with export const runtime = 'nodejs';) does the RAG retrieval, constructs the grounded prompt, and streams the LLM response back. The Edge layer above reverse-proxies the streaming response to the client.
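To make the retrieval step concrete, here is a minimal in-memory sketch: cosine-similarity top-k over pre-embedded chunks, followed by grounded prompt construction. It assumes the query and chunk embeddings already exist (in practice they come from an embedding model and a vector store), leaves out re-ranking, and uses hypothetical names throughout.

```typescript
// Hypothetical in-memory retrieval; a real deployment queries a vector store.
export interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every chunk against the query and keep the k most similar.
export function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}

// Number the sources so the model (and the UI) can cite them.
export function buildGroundedPrompt(question: string, context: Chunk[]): string {
  const sources = context.map((c, i) => `[${i + 1}] ${c.text}`).join('\n');
  return [
    'Answer using only the sources below. If they do not cover the question, say so.',
    sources,
    `Question: ${question}`,
  ].join('\n\n');
}
```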
The RAG implementation deep dive is in the RAG-over-website article cluster.
Layer 5: the streaming model call
The frontier-model call uses streaming so the user sees tokens as they arrive. The major frontier LLM APIs support streaming, typically delivered as server-sent events.
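```ts
// app/api/chat-frontier/route.ts (Node Route Handler)
import OpenAI from 'openai';
// Hypothetical helper from the RAG layer: retrieves context and returns
// chat messages in { role, content } form with the sources injected.
import { buildGroundedMessages } from '@/lib/rag';

export const runtime = 'nodejs';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(req: Request) {
  const { messages } = await req.json();
  const groundedMessages = await buildGroundedMessages(messages);

  const stream = await openai.chat.completions.create({
    model: 'your-chosen-model',
    messages: groundedMessages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        // Forward only the text deltas as raw UTF-8 bytes.
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content ?? '';
          if (text) controller.enqueue(encoder.encode(text));
        }
        controller.close();
      } catch (err) {
        controller.error(err); // abort the client stream instead of hanging
      }
    },
  });

  return new Response(readable, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
```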
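The text/plain content type is intentional. text/event-stream would also work, but the server would then have to wrap every chunk in SSE framing (data: lines separated by blank lines) and the client would have to parse that framing back out. Plain text with chunked transfer encoding carries the same token stream with less code on both ends.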
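For comparison, here is roughly the client-side framing work that text/event-stream implies. It is a minimal sketch that only handles data: fields and blank-line event boundaries (it ignores event:, id:, retry: and comment lines), and the function name is hypothetical.

```typescript
// Minimal SSE framing sketch: events end at a blank line, payload lines start
// with "data:". Returns complete events plus any unfinished trailing text.
export function parseSseBuffer(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  // Normalize CRLF so the event separator is always "\n\n".
  const parts = buffer.replace(/\r\n/g, '\n').split('\n\n');
  const rest = parts.pop() ?? ''; // the last part may be an incomplete event
  for (const part of parts) {
    const data = part
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      // Drop the field name and one optional leading space, per the SSE format.
      .map((line) => line.slice(5).replace(/^ /, ''))
      .join('\n');
    if (data) events.push(data);
  }
  return { events, rest };
}
```

With text/plain, the equivalent client step is just appending each decoded chunk, which is why the handlers above use it.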
A few habits keep the widget from undoing the work above:
- React 19 + Next.js 15 App Router for everything that's not the widget
- Tailwind CSS for styling (no runtime CSS-in-JS)
- A single small bundler-friendly state hook in the widget; no large form library
- Dynamic-import for the Rive runtime and any other heavyweight client code
- No global polyfills; the runtime targets ES2022 baseline
Stores that ship Tidio (110 KB), Intercom (230 KB), or LiveChat (280 KB) on top of an otherwise-fast Next.js stack regress LCP and INP measurably.
Edge vs Node decision
Three rules for placing each part of the chat backend.
Place at the edge: routing decisions, auth checks, rate limiting, simple cached lookups, model selection logic. Everything that fits in 50 ms and 100 KB of working memory.
Place in Node serverless: RAG retrieval, embedding calls, re-ranking, complex tool calling, integrations with external services. Anything that takes 200-2000 ms or needs a database connection.
Place in a long-running service: the embedding store itself, vector index management, batch crawls. Things that benefit from warm state and don't need to scale to zero.
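The three rules can be written down as a small decision function, which is handy as a checklist in code review. This is an illustrative sketch: the thresholds mirror the rules of thumb above, and the TaskProfile shape is hypothetical.

```typescript
// The placement rules above as code. Thresholds follow the article's rules of
// thumb (50 ms, 100 KB); the task descriptor is a hypothetical shape.
export type Placement = 'edge' | 'node' | 'service';

export interface TaskProfile {
  expectedMs: number;      // typical latency of the work
  workingSetKb: number;    // approximate working memory needed
  needsDatabase: boolean;  // needs a database or vector-store connection
  needsWarmState: boolean; // benefits from warm state, should not scale to zero
}

export function placeTask(t: TaskProfile): Placement {
  // Warm state that should not scale to zero: long-running service.
  if (t.needsWarmState) return 'service';
  // Fast, small, connection-free work: edge.
  if (!t.needsDatabase && t.expectedMs <= 50 && t.workingSetKb <= 100) return 'edge';
  // Everything else (RAG, embeddings, tool calls): Node serverless.
  return 'node';
}
```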
Done well, the visitor sees the first words of a reply in well under a second, even though real work is happening on the server in between. The deeper treatment of latency budgeting lives in the companion article.
Authentication and personalization
Authenticated chat enables personalized grounding. The agent can read the customer's order history, account tier, and saved preferences without exposing them to the client.
The pattern: a server-only function (using Next.js cookies and your auth library) extracts the session and returns a signed token to the chat backend. The chat backend uses the token to fetch personalization context from the database, includes it in the RAG prompt, and the response is grounded in that context.
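```ts
// app/api/chat/route.ts (Edge runtime: auth check)
import { NextRequest } from 'next/server';
import { cookies } from 'next/headers';
import { verifyToken } from '@/lib/auth'; // your auth library's verifier

export const runtime = 'edge';

export async function POST(req: NextRequest) {
  const cookieStore = await cookies(); // cookies() is async in Next.js 15
  const token = cookieStore.get('session')?.value;
  const user = token ? await verifyToken(token) : null;
  const { messages } = await req.json();

  // Forward the verified session token rather than a bare userId, so the
  // backend can re-check it instead of trusting the request body.
  const upstream = await fetch(new URL('/api/chat-grounded', req.url), {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      ...(user && token ? { authorization: `Bearer ${token}` } : {}),
    },
    body: JSON.stringify({ messages }),
  });

  return new Response(upstream.body, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
```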
Authentication context never reaches the client; the chat surface only sees the agent's responses.
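What "signed token" means can be sketched in a few lines. This is a hedged illustration, not the @/lib/auth verifyToken used above: the token format (base64url JSON payload plus an HMAC-SHA256 signature) and the function names are hypothetical, and for production a vetted format such as a JWT from a maintained library is the safer choice. It uses node:crypto, so it fits the Node backend; node:crypto may not be available in the Edge runtime, where Web Crypto or a library like jose would take its place.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical token: base64url(JSON claims) + "." + base64url(HMAC-SHA256).
export interface SessionClaims {
  userId: string;
  exp: number; // expiry, milliseconds since the Unix epoch
}

function hmac(data: string, secret: string): string {
  return createHmac('sha256', secret).update(data).digest('base64url');
}

export function signSessionToken(claims: SessionClaims, secret: string): string {
  const body = Buffer.from(JSON.stringify(claims), 'utf8').toString('base64url');
  return `${body}.${hmac(body, secret)}`;
}

export function verifySessionToken(
  token: string,
  secret: string,
  nowMs: number,
): SessionClaims | null {
  const parts = token.split('.');
  if (parts.length !== 2) return null;
  const [body, sig] = parts;
  const expected = Buffer.from(hmac(body, secret));
  const actual = Buffer.from(sig);
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  const claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8')) as SessionClaims;
  return claims.exp > nowMs ? claims : null; // reject expired tokens
}
```

The backend would read the Bearer token from the authorization header, verify it, and only then load order history or account tier for the prompt.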
Yokaify on Next.js: the script-tag path vs the React SDK
Two integration paths.
Script-tag. A single <Script> from next/script in app/layout.tsx, loaded with strategy="afterInteractive" so it runs after hydration. The agent runs entirely outside React's hydration tree, mounted to a portal. Lightest install, fastest to ship.
React SDK. A <YokaifyAgent /> Client Component imported and placed in the layout. Full control over rendering, theming, and event hooks; the agent is a first-class React tree node.
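```tsx
// app/layout.tsx (script-tag path)
import type { ReactNode } from 'react';
import Script from 'next/script';

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        {children}
        {/* afterInteractive: loads after hydration, off the critical path */}
        <Script
          src="https://cdn.yokaify.com/v1/widget.js"
          strategy="afterInteractive"
          data-site-id="YOUR_SITE_ID"
        />
      </body>
    </html>
  );
}
```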
The script-tag path is recommended for marketing sites where the agent's UX is "one mascot in the corner". The React SDK is recommended for product apps where the agent is embedded inside a custom UI shell.
React Server Components vs Client Components: which goes where?
For chat-related code:
- RSC: the page that hosts the chat, the layout, any FAQ content the chat references, the navigation.
- Client Component: the chat widget itself, the mascot, the streaming-fetch hook, the behavior-signal collector.
The boundary is "does this need a browser-only API, client state, or event handlers?" Canvas, WebSocket, IntersectionObserver, navigator.*, useState/useEffect, onClick handlers: Client. Everything else: Server.
The deep dive on the split is in the React Server Components vs Client Components article.
Further reading
- Guide: The Rive ecommerce playbook. The animation runtime layer of the chat widget.
- Guide: The ecommerce performance reference. The CWV discipline this architecture respects.
- Guide: The in-session conversion agent guide. The category the chat widget serves.
- Blog: Rive in Next.js: `use client` patterns. The dynamic-import pattern for the Rive runtime.
Frequently asked questions
A small Client Component for the chat surface, dynamic-imported. Owns chat state, streaming connection, and Rive canvas. Everything else as React Server Components.
Last updated May 31, 2026.