llms.txt — Glossary

A sample llms.txt

A short markdown file served at /llms.txt with a one-paragraph site summary at the top and grouped lists of links to guides, tools, and key pages.

An llms.txt is a curated, human-readable index for AI crawlers.

What an llms.txt looks like

# Yokaify

> Yokaify is the Onsite Conversion Agent: an animated AI character
> that watches visitor behavior in real time and steps in at the right
> moment to help them convert.

## Core pages

- [Onsite Conversion Agent: 2026 field guide](https://yokaify.com/guides/onsite-conversion-agent.md)
- [Proactive chat in 2026](https://yokaify.com/guides/proactive-chat.md)
- [Cart abandonment recovery playbook](https://yokaify.com/guides/cart-abandonment.md)

## Tools

- [Cart abandonment calculator](https://yokaify.com/tools/cart-abandonment-calculator.md)
- [Mascot ROI calculator](https://yokaify.com/tools/mascot-roi-calculator.md)
- ...

A few conventions to note:

A single # heading names the site, and ## headings split the links into sections grouped by purpose.
Markdown links point to the canonical URL, often with a .md suffix (a proposed convention; some sites just use the regular URL).
A blockquote summary at the top gives a crawler a quick sense of what the site is.
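
These conventions are regular enough to read with a few lines of code. The following is a minimal Python sketch, not an official parser. It assumes the shape of the sample above (one # title, a > blockquote summary, then ## sections of "- [Title](URL)" links) and ignores everything else. The function name parse_llms_txt and the dictionary it returns are hypothetical.

```python
import re

# Matches "- [Title](URL)" list items; any trailing ": notes" text is ignored.
LINK_RE = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)\s]+)\)")


def parse_llms_txt(text: str) -> dict:
    """Hypothetical helper: split an llms.txt into title, summary, and link sections."""
    title = None
    summary_lines = []
    sections = {}
    current = None  # name of the ## section being read, if any
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip()
            sections[current] = []
        elif stripped.startswith("# ") and title is None:
            title = stripped[2:].strip()
        elif stripped.startswith(">") and current is None:
            # Blockquote lines before the first ## section form the summary.
            summary_lines.append(stripped.lstrip(">").strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append((match.group(1), match.group(2)))
    return {"title": title, "summary": " ".join(summary_lines), "sections": sections}


SAMPLE = """# Yokaify

> Yokaify is the Onsite Conversion Agent: an animated AI character
> that watches visitor behavior in real time and steps in at the right
> moment to help them convert.

## Core pages

- [Proactive chat in 2026](https://yokaify.com/guides/proactive-chat.md)

## Tools

- [Mascot ROI calculator](https://yokaify.com/tools/mascot-roi-calculator.md)
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["title"])           # Yokaify
print(list(parsed["sections"]))  # ['Core pages', 'Tools']
```

Lines that do not match the link pattern, such as the "- ..." placeholder in the sample, are skipped rather than treated as errors.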

Why ship an llms.txt

It is cheap. Generate it from your sitemap, then pick out the pages that matter most. Most sites can finish it in an afternoon.
It is a hedge. AI crawlers will likely start reading it eventually, and early adopters benefit when they do.
Lighthouse may notice. Google's experimental llms.txt audit could become a signal, and the spec authors are paying attention.
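
The sitemap-first workflow can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not a recommended tool: the SECTIONS mapping from path prefixes to section names is hypothetical, the link titles are placeholders derived from URL slugs, the sample sitemap (including its /about URL) is made up, and the curation step is reduced to a prefix filter that a person would still need to refine.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical curation rule: path prefix -> llms.txt section name.
SECTIONS = {"/guides/": "Core pages", "/tools/": "Tools"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL in a standard <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def title_from_url(url: str) -> str:
    """Placeholder link text built from the URL slug; real titles need a human pass."""
    slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return slug.removesuffix(".md").replace("-", " ").capitalize()


def build_llms_txt(name: str, summary: str, urls: list[str]) -> str:
    """Keep URLs under a curated prefix and group them into ## sections."""
    grouped = {section: [] for section in SECTIONS.values()}
    for url in urls:
        path = urlparse(url).path
        for prefix, section in SECTIONS.items():
            if path.startswith(prefix):
                grouped[section].append(f"- [{title_from_url(url)}]({url})")
                break  # each URL lands in at most one section
    lines = [f"# {name}", "", f"> {summary}"]
    for section, links in grouped.items():
        if links:  # skip empty sections
            lines += ["", f"## {section}", ""] + links
    return "\n".join(lines) + "\n"


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yokaify.com/guides/proactive-chat.md</loc></url>
  <url><loc>https://yokaify.com/tools/cart-abandonment-calculator.md</loc></url>
  <url><loc>https://yokaify.com/about</loc></url>
</urlset>"""

print(build_llms_txt("Yokaify", "Yokaify is the Onsite Conversion Agent.", sitemap_urls(SITEMAP)))
```

Replacing the prefix filter with a hand-maintained list of URLs is closer to the curation this entry recommends; the sketch only saves the typing.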

What to put in it

Lead with a two- or three-sentence summary that names the brand, the category, and the main value. Then link the pages worth reading first: your pillar guides, your tools, your research and data-rich articles, your strongest comparison pages, and your best glossary entries. A curated file usually lands somewhere around 80-120 entries. Bigger is not better here; the curation is the point.
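
The sizing guidance above can be turned into a quick self-check. The sketch below is hypothetical: the name lint_llms_txt, its default 80-120 bounds, and the naive sentence splitter (break after ., !, or ? followed by whitespace) are assumptions chosen to mirror this entry's advice, not rules from any specification.

```python
import re

# Counts "- [Title](URL)" lines anywhere in the file.
ENTRY_RE = re.compile(r"^[ \t]*-[ \t]*\[[^\]]+\]\([^)\s]+\)", re.MULTILINE)


def lint_llms_txt(text: str, low: int = 80, high: int = 120) -> list[str]:
    """Hypothetical check of summary length and entry count against rough targets."""
    warnings = []
    summary = " ".join(
        line.lstrip(">").strip() for line in text.splitlines() if line.startswith(">")
    )
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
    if not 2 <= len(sentences) <= 3:
        warnings.append(f"summary has {len(sentences)} sentence(s); aim for two or three")
    entries = len(ENTRY_RE.findall(text))
    if not low <= entries <= high:
        warnings.append(f"entry count {entries} is outside the usual {low}-{high} range")
    return warnings


demo = "# Demo\n\n> Demo sells widgets. It ships worldwide.\n\n## Core pages\n\n- [A](https://example.com/a.md)\n"
print(lint_llms_txt(demo))  # ['entry count 1 is outside the usual 80-120 range']
```

Treat the warnings as prompts for a second look rather than hard failures; a small site may reasonably land well under 80 entries.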

Where llms.txt sits in the AI-crawler stack

Surface	What it does	Adoption
robots.txt	Allow / disallow crawler access	Universal
sitemap.xml	Comprehensive URL index for crawlers	~85% of top-10k sites
Schema.org markup	Per-page structured data	~50-60% of top-10k sites
llms.txt	Curated AI-crawler index	5-10% of top-10k sites

It is the newest and least-adopted of the four, so for now the other three still matter more.

How it differs from related standards

robots.txt. Sets allow and disallow rules. It is not a curated content list.
sitemap.xml. An exhaustive URL index. llms.txt is curated by comparison.
Schema.org / JSON-LD. Per-page structured data. llms.txt works at the site level.
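
Because robots.txt controls access while llms.txt only curates, a page listed in llms.txt that robots.txt disallows sends crawlers mixed signals. Below is a minimal Python sketch of that consistency check using the standard library's urllib.robotparser. The helper name blocked_links, the crawler name ExampleAIBot, and the "Disallow: /tools/" rule are hypothetical. Note that robotparser applies the first matching rule in file order, which may differ from how a particular crawler interprets robots.txt.

```python
import re
from urllib.robotparser import RobotFileParser

# Captures the URL inside "](...)" in markdown links.
LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def blocked_links(llms_txt: str, robots_txt: str, user_agent: str) -> list[str]:
    """Return llms.txt link targets that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [url for url in LINK_RE.findall(llms_txt) if not parser.can_fetch(user_agent, url)]


robots = "User-agent: *\nDisallow: /tools/\n"
llms = (
    "- [Proactive chat in 2026](https://yokaify.com/guides/proactive-chat.md)\n"
    "- [Cart abandonment calculator](https://yokaify.com/tools/cart-abandonment-calculator.md)\n"
)
print(blocked_links(llms, robots, "ExampleAIBot"))
# ['https://yokaify.com/tools/cart-abandonment-calculator.md']
```

Any URL this returns is either a page to drop from llms.txt or a robots.txt rule worth revisiting.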

Related terms

GEO — the broader discipline llms.txt supports
Concept density — a neighboring content-quality signal
Citation grounding — what AI engines do with crawled content