Should the chat backend run on Edge functions or Node?

Edge functions for the routing layer (incoming chat -> selecting which model to call); Node for the heavy work (RAG retrieval, large embeddings, complex tool calling). The 2026 pattern is hybrid: Edge handles the < 50 ms decision, Node handles the < 500 ms generation. Vercel Edge Functions and Cloudflare Workers both work for the routing layer; Node runs on whichever serverless platform you already use.

Can I use Server-Sent Events (SSE) for chat streaming in Next.js?

Yes. The 2026 pattern is the streaming response from a Route Handler — return a ReadableStream from the Response, the Client Component reads it via fetch + ReadableStreamDefaultReader. SSE is supported but the underlying implementation is the same. WebSockets are used for bidirectional flows (typing indicators, presence) but not necessary for one-way chat streaming.

How does Rive integrate with Next.js App Router?

Rive runs in a Client Component (it owns canvas state). The recommended pattern: dynamic-import the mascot component with ssr: false to keep the runtime out of the initial JS bundle, fetch the .riv file from public/, and load it after the LCP event. The deep dive is in the [Rive in Next.js article](/blog/rive-nextjs-use-client) and [The Rive ecommerce playbook](/guides/rive-animation-ecommerce).

Can the chat agent read user authentication state from Next.js?

Yes. The pattern: a server-only function (using Next.js cookies / auth) returns a session token to the Client Component on render. The Client Component sends the token with each chat request. Authenticated chat enables personalized grounding (the agent reads the customer's order history, account state, plan tier) without exposing that account data to the client; the browser only ever holds the opaque token.

Does Yokaify support custom React integration without the script tag?

Yes. The Yokaify React SDK ships a Client Component (`<YokaifyAgent />`) that gives full control over rendering, theming, and event handling. The script-tag install is the simpler path for marketing sites; the React SDK is for product apps that want the agent inside a custom UI shell.

How do I handle the chat widget in static export / next export?

Static export works fine for the page shell and the widget mount-point. The chat backend (the streaming endpoint) needs to be hosted somewhere — typically the same site's API routes (which run on Vercel Edge / Node) or a separate API host. The widget shell ships with the static HTML; the dynamic chat connects to the backend at runtime.

Next.js Chatbot: The 2026 Architecture Guide

If you are building chat or an onsite agent on Next.js, most of the hard architectural calls are already settled. React Server Components render the page. A small Client Component runs the widget. Replies stream from a route handler. The heavy work splits between Edge and Node. None of it is exotic anymore, and the performance budget is very reachable as long as you keep the widget off the critical path.

What follows is a practical walkthrough of each layer: where the code lives, why it lives there, and the handful of places teams still trip up.

Figure (Live widget): A Next.js storefront page with a small animated chat widget in the corner, mid-reply, showing text streaming in word by word below a friendly mascot.

The widget is a small client island; the rest of the page is server-rendered and ships no extra JavaScript for it.

The five-layer architecture

A 2026 production chat on Next.js has five layers, three of which run on the server and two on the client.

┌─────────────────────────────────────────────────────────────┐
│ CLIENT (React 19 + Next.js 15 App Router)                   │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ 1. Page shell — RSC, no JS shipped                    │   │
│ └───────────────────────────────────────────────────────┘   │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ 2. Chat widget — Client Component (dynamic import)    │   │
│ │    — Rive mascot (state-machine driven)               │   │
│ │    — Chat surface (streaming text from server)        │   │
│ │    — Behavior signal collector (scroll, dwell, etc.)  │   │
│ └───────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              ↕ fetch + streaming
┌─────────────────────────────────────────────────────────────┐
│ SERVER (Vercel / Cloudflare / etc.)                         │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ 3. Edge routing — Route Handler at the edge           │   │
│ │    — Auth check, rate limit, signal classification    │   │
│ └───────────────────────────────────────────────────────┘   │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ 4. RAG retrieval — Node serverless                    │   │
│ │    — Vector search, re-ranking, prompt construction   │   │
│ └───────────────────────────────────────────────────────┘   │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ 5. Model call — streaming response from frontier LLM  │   │
│ └───────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

The split is deliberate. The fast, cheap routing decision happens at the edge, close to the visitor. The slower work, retrieving the right content and writing a reply, runs in Node, where a database connection and heavier compute are available.

Layer 1: the page shell as RSC

Every piece of HTML on a marketing or product page that doesn't need browser-only state should be a React Server Component: headers, footers, navigation, hero copy, product galleries. The result is HTML that renders without JavaScript and can reach an LCP under 1.5 s on most networks.

// app/page.tsx: Server Component (default)
import { Hero } from './Hero';        // RSC
import { ProductGrid } from './ProductGrid';  // RSC, fetches from CMS
import { Mascot } from './Mascot';    // Client Component (see Layer 2)

export default async function HomePage() {
  return (
    <>
      <Hero />
      <ProductGrid />
      <Mascot /> {/* Mounts a small client island */}
    </>
  );
}

The Mascot import is a regular component import — Next.js automatically detects the 'use client' directive on the Mascot file and ships only that boundary's JS to the browser.

Layer 2: the chat widget as a Client Component

The widget is a Client Component because it owns:

Canvas state (Rive)
WebSocket / streaming-fetch state (chat connection)
Per-session behavior state (scroll position, dwell time, signal counters)
DOM event handlers (click, input, keyboard)

'use client';

import { useState } from 'react';
import dynamic from 'next/dynamic';
// Assumed location of the chat UI; the original snippet does not show this import.
import { ChatSurface } from './ChatSurface';

type Message = { role: string; text: string };

// Load the Rive runtime only in the browser, outside the initial bundle.
const RiveMascot = dynamic(() => import('./RiveMascot').then((m) => m.RiveMascot), {
  ssr: false,
  loading: () => null,
});

export function Mascot() {
  const [open, setOpen] = useState(false);
  const [messages, setMessages] = useState<Message[]>([]);

  async function send(input: string) {
    const history: Message[] = [...messages, { role: 'user', text: input }];
    // Show the user's message plus an empty assistant bubble to stream into.
    setMessages([...history, { role: 'assistant', text: '' }]);

    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ messages: history }),
    });
    if (!res.ok || !res.body) return; // error UI omitted for brevity

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let assistantText = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      assistantText += decoder.decode(value, { stream: true });
      // Replace the last (assistant) message with the text received so far.
      setMessages((m) => [...m.slice(0, -1), { ...m[m.length - 1], text: assistantText }]);
    }
  }

  return (
    <div className="fixed bottom-6 right-6">
      <RiveMascot onClick={() => setOpen(true)} />
      {open && <ChatSurface messages={messages} onSend={send} />}
    </div>
  );
}

Three things this widget gets right.

The Rive component is dynamic-imported with ssr: false. The Rive runtime never enters the SSR pipeline (it can't: it draws to a browser canvas and loads a WASM runtime) and never enters the initial JS bundle.

The chat fetch uses the streaming response pattern. The body is a ReadableStream; the reader reads chunks as they arrive and updates the React state. The visitor sees text appear word-by-word, not in a 3-second silence followed by a complete response.

The widget owns its own state. There's no global store, no Redux, no Zustand. For a chat widget, local state is the right answer; the persistence layer is the server, not the client.
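The streaming loop above leans on decoder.decode(value, { stream: true }). Here is a minimal sketch of why that flag matters, using a hypothetical helper named createTextAccumulator (not part of the widget shown here): network chunks can end in the middle of a multi-byte character, and a streaming decoder holds the partial bytes until the rest arrives.

```typescript
// Hypothetical helper (illustrative name): accumulates streamed bytes into text.
function createTextAccumulator() {
  const decoder = new TextDecoder(); // UTF-8 by default
  let text = '';
  return {
    // Feed one network chunk; returns the full text decoded so far.
    push(chunk: Uint8Array): string {
      text += decoder.decode(chunk, { stream: true });
      return text;
    },
    // Call once the stream is done to flush any buffered bytes.
    finish(): string {
      text += decoder.decode();
      return text;
    },
  };
}

// Usage: the euro sign is three bytes in UTF-8; cut the input after its first byte.
const bytes = new TextEncoder().encode('Price: 5€');
const acc = createTextAccumulator();
const partial = acc.push(bytes.slice(0, 9)); // "Price: 5" (the first byte of € is held back)
acc.push(bytes.slice(9));
const full = acc.finish(); // "Price: 5€"
```

Without the stream flag, the first chunk would decode its trailing byte as a replacement character and the reply would show a garbled glyph mid-sentence.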

Layer 3: the Edge Route Handler

The Edge layer handles routing decisions: which model to call, which RAG index to query, whether to short-circuit with a cached response. The Edge function runs at the point of presence nearest the visitor, so the routing round-trip typically stays under 50 ms.

// app/api/chat/route.ts
import { NextRequest } from 'next/server';

export const runtime = 'edge';

export async function POST(req: NextRequest) {
  const { messages } = await req.json();

  // Routing decision: small model for short replies, frontier for complex
  const lastUserMessage: string = messages[messages.length - 1]?.text ?? '';
  const useFrontier = lastUserMessage.length > 50 || lastUserMessage.includes('?');

  // Hand off to the RAG + generation backend.
  // Server-side fetch needs an absolute URL, so resolve against the request URL.
  const target = new URL(useFrontier ? '/api/chat-frontier' : '/api/chat-flash', req.url);
  const upstream = await fetch(target, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ messages }),
  });

  // Pass the upstream stream through unchanged; it carries raw text, not SSE frames
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}

The Edge layer is also where rate limiting, auth checks, and signal classification live. Putting these at the edge keeps cold-start cost low and means a malformed or abusive request never reaches the heavier Node layer.
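As an illustration of the rate-limiting step, here is a minimal fixed-window limiter sketch. The name createRateLimiter, the limit, and the window length are all assumptions made for the example. The in-memory Map is also a simplification: edge isolates do not share memory, so a production limiter would likely keep its counters in a shared store.

```typescript
// Hypothetical fixed-window limiter; names and limits are illustrative.
type RateWindow = { start: number; count: number };

function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, RateWindow>();
  // Returns true if the request identified by `key` is allowed at time `now`.
  return function allow(key: string, now: number = Date.now()): boolean {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      // First request, or the previous window has expired: start a new one.
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count >= limit) return false; // over budget for this window
    w.count += 1;
    return true;
  };
}

// Usage: 3 requests per 10 seconds per visitor, with explicit timestamps.
const allowChat = createRateLimiter(3, 10000);
const results = [0, 1, 2, 3].map((t) => allowChat('visitor-1', t * 1000));
const afterReset = allowChat('visitor-1', 10000);
```

In the Route Handler, a false result would translate into an early 429 response, so the request never reaches the Node layer.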

Layer 4: RAG retrieval in Node

The RAG layer needs heavier compute (vector math, embedding lookups, optional re-ranking) and a database connection to a vector store. Edge functions are not the right fit; Node serverless or a long-running service is.

The pattern: a Node Route Handler at app/api/chat-frontier/route.ts (with export const runtime = 'nodejs';) does the RAG retrieval, constructs the grounded prompt, and streams the LLM response back. The Edge layer above reverse-proxies the streaming response to the client.
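The retrieval and prompt-construction steps can be sketched as follows. This is a toy, in-memory version under stated assumptions: the Doc shape, the three-dimensional embeddings, and the function names are hypothetical, and a real deployment would query a vector store instead of sorting an array.

```typescript
// Hypothetical in-memory retrieval; a real deployment would query a vector store.
type Doc = { id: string; text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Rank documents by similarity to the query embedding and keep the top k.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

// Build a grounded prompt from the retrieved passages.
function buildPrompt(question: string, passages: Doc[]): string {
  const context = passages.map((p) => `[${p.id}] ${p.text}`).join('\n');
  return `Answer using only the context below.\n\n${context}\n\nQuestion: ${question}`;
}

const docs: Doc[] = [
  { id: 'returns', text: 'Returns are accepted within 30 days.', embedding: [1, 0, 0] },
  { id: 'shipping', text: 'Orders ship in 2 business days.', embedding: [0, 1, 0] },
  { id: 'sizing', text: 'Sizes run small; order one size up.', embedding: [0, 0, 1] },
];
const hits = topK([0.9, 0.1, 0], docs, 2);
const prompt = buildPrompt('Can I return an item?', hits);
```

The resulting prompt string is what would be passed to the model call as grounded context.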

The RAG implementation deep dive is in the RAG-over-website article cluster.

Layer 5: the streaming model call

The frontier-model call uses streaming so the user sees tokens as they arrive. Current frontier LLM APIs all support server-sent-event streaming.

// In the Node Route Handler (app/api/chat-frontier/route.ts)
// Assumes `openai` is an initialized OpenAI SDK client and `groundedMessages`
// is the prompt built by the RAG step.
const stream = await openai.chat.completions.create({
  model: 'your-chosen-model',
  messages: groundedMessages,
  stream: true,
});

const encoder = new TextEncoder();
const readable = new ReadableStream({
  async start(controller) {
    try {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content ?? '';
        if (text) controller.enqueue(encoder.encode(text));
      }
      controller.close();
    } catch (err) {
      controller.error(err); // surface upstream failures to the reader
    }
  },
});

return new Response(readable, {
  headers: { 'content-type': 'text/plain; charset=utf-8' },
});

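The text/plain content type is intentional. text/event-stream also works, but the client then has to parse SSE event framing; for one-way token streaming, text/plain over a chunked response is simpler. SSE framing earns its cost when the stream carries more than raw text, such as named events.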
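For comparison, here is a minimal sketch of what text/event-stream framing involves on both ends. The helpers toSseEvent and parseSseEvents are hypothetical names, and the parser is deliberately simplified: it handles only data: lines with LF line endings and ignores event:, id:, retry:, and comment lines.

```typescript
// Encode one text chunk as a server-sent event (one "data:" line per line of text).
function toSseEvent(text: string): string {
  return text.split('\n').map((line) => `data: ${line}`).join('\n') + '\n\n';
}

// Parse complete events out of a buffer; returns payloads plus the unparsed remainder.
function parseSseEvents(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split('\n\n');
  const rest = parts.pop() ?? ''; // the last piece may be an incomplete event
  const events = parts.map((block) =>
    block
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).replace(/^ /, '')) // drop "data:" and one optional space
      .join('\n'),
  );
  return { events, rest };
}

// Usage: two complete events followed by a partial one still in flight.
const wire = toSseEvent('Hello') + toSseEvent('line one\nline two') + 'data: par';
const parsed = parseSseEvents(wire);
```

With text/plain, none of this is needed: the client appends each decoded chunk to the reply as it arrives.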

Figure (Streaming): A chat reply filling in word by word as tokens stream from the server, so the visitor starts reading before the full answer is finished.

Streaming means the first words land almost immediately instead of after a long pause.

A few habits keep the widget from undoing the work above:

React 19 + Next.js 15 App Router for everything that's not the widget
Tailwind CSS for styling (no runtime CSS-in-JS)
A single small bundler-friendly state hook in the widget; no large form library
Dynamic-import for the Rive runtime and any other heavyweight client code
No global polyfills; the runtime targets ES2022 baseline

Stores that ship Tidio (110 KB), Intercom (230 KB), or LiveChat (280 KB) on top of an otherwise-fast Next.js stack regress LCP and INP measurably.

Edge vs Node decision

Three rules for placing each part of the chat backend.

Place at the edge: routing decisions, auth checks, rate limiting, simple cached lookups, model selection logic. Everything that fits in 50 ms and 100 KB of working memory.

Place in Node serverless: RAG retrieval, embedding calls, re-ranking, complex tool calling, integrations with external services. Anything that takes 200-2000 ms or needs a database connection.

Place in a long-running service: the embedding store itself, vector index management, batch crawls. Things that benefit from warm state and don't need to scale to zero.
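The three rules can be encoded as a small helper, which is sometimes handy as a checklist in code review. This is a sketch under stated assumptions: the Task fields and the placeTask name are hypothetical, and the 50 ms threshold is taken from the edge rule above.

```typescript
type Tier = 'edge' | 'node-serverless' | 'long-running';

// Hypothetical task description; field names are illustrative.
type Task = {
  name: string;
  expectedMs: number;      // typical latency of the work
  needsDatabase: boolean;  // requires a database connection
  needsWarmState: boolean; // benefits from staying resident between requests
};

function placeTask(task: Task): Tier {
  if (task.needsWarmState) return 'long-running';
  if (!task.needsDatabase && task.expectedMs <= 50) return 'edge';
  return 'node-serverless';
}

const placements = [
  { name: 'model selection', expectedMs: 5, needsDatabase: false, needsWarmState: false },
  { name: 'RAG retrieval', expectedMs: 400, needsDatabase: true, needsWarmState: false },
  { name: 'vector index management', expectedMs: 5000, needsDatabase: true, needsWarmState: true },
].map((t) => `${t.name}: ${placeTask(t)}`);
```

Real placement decisions also weigh memory limits and platform constraints, which this sketch leaves out.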

Done well, the visitor sees the first words of a reply in well under a second, even though real work is happening on the server in between. The deeper treatment of latency budgeting lives in the companion article.

Figure (Edge vs Node): A three-column decision diagram: fast routing, auth, and rate limiting at the edge; retrieval and reply generation in Node serverless; and the search index kept warm in a long-running service.

A rule of thumb for where each piece of the chat backend belongs.

Authentication and personalization

Authenticated chat enables personalized grounding. The agent can read the customer's order history, account tier, and saved preferences without exposing them to the client.

The pattern: a server-only function (using Next.js cookies and your auth library) extracts the session and returns a signed token to the chat backend. The chat backend uses the token to fetch personalization context from the database, includes it in the RAG prompt, and the response is grounded in that context.

// app/api/chat/route.ts (Edge runtime, auth check)
import { NextRequest } from 'next/server';
import { cookies } from 'next/headers';
import { verifyToken } from '@/lib/auth';

export const runtime = 'edge';

export async function POST(req: NextRequest) {
  const { messages } = await req.json();

  // Read and verify the session cookie on the server
  const cookieStore = await cookies();
  const token = cookieStore.get('session')?.value;
  const user = token ? await verifyToken(token) : null;

  // Forward to backend with authenticated user context.
  // Server-side fetch needs an absolute URL, so resolve against the request URL.
  const upstream = await fetch(new URL('/api/chat-grounded', req.url), {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ messages, userId: user?.id }),
  });

  return new Response(upstream.body, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}

Authentication context never reaches the client; the chat surface only sees the agent's responses.
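On the backend side, folding the personalization context into the prompt might look like the sketch below. The UserContext shape and the groundMessages name are assumptions for illustration; the actual fields depend on what your database stores for each customer.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; text: string };

// Hypothetical shape of the context loaded server-side for a signed-in user.
type UserContext = { planTier: string; recentOrders: string[] };

// Prepend a system message carrying account context; anonymous visitors get none.
function groundMessages(messages: ChatMessage[], user: UserContext | null): ChatMessage[] {
  if (!user) return messages;
  const orders = user.recentOrders.length > 0 ? user.recentOrders.join(', ') : 'none';
  const system: ChatMessage = {
    role: 'system',
    text: `Customer plan: ${user.planTier}. Recent orders: ${orders}.`,
  };
  return [system, ...messages];
}

// Usage: the same question, with and without a signed-in user.
const question: ChatMessage[] = [{ role: 'user', text: 'Where is my order?' }];
const personalized = groundMessages(question, { planTier: 'pro', recentOrders: ['#1042'] });
const anonymous = groundMessages(question, null);
```

Because this runs only on the server, the account context shapes the reply without being sent to the browser.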

Yokaify on Next.js: the script-tag path vs the React SDK

Two integration paths.

Script-tag. A single async script in app/layout.tsx, loaded through <Script> from next/script. The agent runs entirely outside React's hydration tree, mounted to a portal. Lightest install, fastest to ship.

React SDK. A <YokaifyAgent /> Client Component imported and placed in the layout. Full control over rendering, theming, and event hooks; the agent is a first-class React tree node.

// app/layout.tsx (script-tag path)
import type { ReactNode } from 'react';
import Script from 'next/script';

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        {children}
        {/* Loads after hydration, so it stays off the critical path */}
        <Script
          src="https://cdn.yokaify.com/v1/widget.js"
          strategy="afterInteractive"
          data-site-id="YOUR_SITE_ID"
        />
      </body>
    </html>
  );
}

The script-tag path is recommended for marketing sites where the agent's UX is "one mascot in the corner". The React SDK is recommended for product apps where the agent is embedded inside a custom UI shell.

React Server Components vs Client Components: which goes where?

For chat-related code:

RSC: the page that hosts the chat, the layout, any FAQ content the chat references, the navigation.
Client Component: the chat widget itself, the mascot, the streaming-fetch hook, the behavior-signal collector.

The boundary is "does this need a browser-only API?" Canvas, WebSocket, IntersectionObserver, navigator.* — Client. Everything else — Server.

The deep dive on the split is in the React Server Components vs Client Components article.

Frequently asked questions

A small Client Component for the chat surface, dynamic-imported. Owns chat state, streaming connection, and Rive canvas. Everything else as React Server Components.

Last updated May 31, 2026.