Glossary

llms.txt

llms.txt is a proposed standard that lets sites publish a curated index of their highest-value content URLs for AI crawlers. As of May 2026, roughly 5-10% of the top-10k sites have adopted it. Most AI crawlers don't read it yet, but Google is testing Lighthouse llms.txt checks, and shipping one costs about an afternoon.

Nilesh Kumar · June 1, 2026 · 4 min read · Updated May 31, 2026
An llms.txt is a curated, human-readable index for AI crawlers.

What an llms.txt looks like

# Yokaify

> Yokaify is the Onsite Conversion Agent: an animated AI character
> that watches visitor behavior in real time and steps in at the right
> moment to help them convert.

## Core pages

- [Onsite Conversion Agent: 2026 field guide](https://yokaify.com/guides/onsite-conversion-agent.md)
- [Proactive chat in 2026](https://yokaify.com/guides/proactive-chat.md)
- [Cart abandonment recovery playbook](https://yokaify.com/guides/cart-abandonment.md)

## Tools

- [Cart abandonment calculator](https://yokaify.com/tools/cart-abandonment-calculator.md)
- [Mascot ROI calculator](https://yokaify.com/tools/mascot-roi-calculator.md)
- ...

A few conventions to note:

  • Hash-prefixed sections group related links by purpose.
  • Markdown links point to the canonical URL, often with a .md suffix (a proposed convention; some sites just use the regular URL).
  • A blockquote summary at the top gives a crawler a quick sense of what the site is.
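
To make these conventions concrete, here is a minimal Python sketch that reads a file shaped like the example above and pulls out the title, the blockquote summary, and the grouped links. The function name parse_llms_txt and the layout of the returned dictionary are illustrative assumptions, not part of the llms.txt proposal, and the sketch handles only the three conventions listed here.

```python
import re

# Matches a Markdown list item of the form "- [Title](https://example.com/page.md)".
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def parse_llms_txt(text):
    """Parse an llms.txt body (hypothetical helper, not an official API)."""
    doc = {"title": None, "summary": "", "sections": {}}
    summary_lines = []
    current = None  # name of the "## " section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first section form the summary.
            summary_lines.append(line.lstrip(">").strip())
        else:
            match = LINK_RE.match(line)
            if match and current is not None:
                doc["sections"][current].append((match["title"], match["url"]))
    doc["summary"] = " ".join(summary_lines)
    return doc


# Abbreviated copy of the example above; the "- ..." placeholder is skipped
# because it is not a Markdown link.
SAMPLE = """\
# Yokaify

> Yokaify is the Onsite Conversion Agent: an animated AI character
> that watches visitor behavior in real time and steps in at the right
> moment to help them convert.

## Core pages

- [Proactive chat in 2026](https://yokaify.com/guides/proactive-chat.md)

## Tools

- [Mascot ROI calculator](https://yokaify.com/tools/mascot-roi-calculator.md)
- ...
"""

doc = parse_llms_txt(SAMPLE)
for section, links in doc["sections"].items():
    print(section, len(links))
```

A crawler would not necessarily parse the file this way; the point is only that the format is regular enough to be read with a few lines of code.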

Why ship an llms.txt

  • It is cheap. Generate it from your sitemap, then pick out the pages that matter most. An afternoon for most sites.
  • It is a hedge. AI crawlers will likely start reading it eventually, and early adopters benefit when they do.
  • Lighthouse may notice. Google's experimental llms.txt audit could become a signal, and the spec authors are paying attention.
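
The "generate it from your sitemap" step could be sketched roughly as follows. This assumes a standard sitemap.xml using the sitemaps.org namespace; the helper names sitemap_urls and pick_candidates, the keep_prefixes parameter, and the sample sitemap contents are all made up for illustration. Prefix filtering only produces a shortlist, and the final selection is still a manual, editorial step.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def pick_candidates(urls, keep_prefixes):
    """Keep only URLs whose path starts with one of the given prefixes."""
    return [
        url
        for url in urls
        if any(urlparse(url).path.startswith(prefix) for prefix in keep_prefixes)
    ]


# Hypothetical sitemap used only to exercise the helpers.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yokaify.com/guides/proactive-chat</loc></url>
  <url><loc>https://yokaify.com/tools/mascot-roi-calculator</loc></url>
  <url><loc>https://yokaify.com/privacy</loc></url>
  <url><loc>https://yokaify.com/login</loc></url>
</urlset>
"""

all_urls = sitemap_urls(SAMPLE_SITEMAP)
candidates = pick_candidates(all_urls, keep_prefixes=("/guides/", "/tools/"))
print(candidates)
```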

What to put in it

Lead with a two- or three-sentence summary that names the brand, the category, and the main value. Then link the pages worth reading first: your pillar guides, your tools, your research and data-rich articles, your strongest comparison pages, and your best glossary entries. A curated file usually lands somewhere around 80-120 entries. Bigger is not better here; the curation is the point.
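
Once the summary and the curated links are chosen, writing the file is mechanical. The sketch below renders them in the layout shown earlier; render_llms_txt, entry_count, and the input structure are hypothetical names chosen for this example, and the 70-character wrap width is an arbitrary choice rather than a requirement.

```python
import textwrap


def render_llms_txt(name, summary, sections):
    """Render a title, blockquote summary, and link sections as llms.txt text."""
    lines = [f"# {name}", ""]
    # Wrap the summary and prefix each line so it reads as a blockquote.
    lines += [f"> {part}" for part in textwrap.wrap(summary, width=70)]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        lines += [f"- [{title}]({url})" for title, url in links]
    return "\n".join(lines) + "\n"


def entry_count(sections):
    """Total number of links across all sections."""
    return sum(len(links) for links in sections.values())


# Inputs reuse the Yokaify example; a real file would list far more entries.
sections = {
    "Core pages": [
        ("Proactive chat in 2026", "https://yokaify.com/guides/proactive-chat.md"),
    ],
    "Tools": [
        ("Mascot ROI calculator", "https://yokaify.com/tools/mascot-roi-calculator.md"),
    ],
}
summary = (
    "Yokaify is the Onsite Conversion Agent: an animated AI character "
    "that watches visitor behavior in real time and steps in at the right "
    "moment to help them convert."
)

text = render_llms_txt("Yokaify", summary, sections)
print(text)
print("entries:", entry_count(sections))
```

Keeping the entry count visible makes it easier to notice when the file drifts from a curated list toward a second sitemap.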

Where llms.txt sits in the AI-crawler stack

Surface            What it does                          Adoption
robots.txt         Allow / disallow crawler access       Universal
sitemap.xml        Comprehensive URL index for crawlers  ~85% of top-10k sites
Schema.org markup  Per-page structured data              ~50-60% of top-10k sites
llms.txt           Curated AI-crawler index              5-10% of top-10k sites

llms.txt is the newest and least-adopted of the four, so the other three still matter more right now. Each does a different job:

  • robots.txt. Sets allow and disallow rules. It is not a curated content list.
  • sitemap.xml. An exhaustive URL index. llms.txt is curated by comparison.
  • Schema.org / JSON-LD. Per-page structured data. llms.txt works at the site level.


First defined: June 1, 2026. Adoption rate from 2026 GEO research aggregators; standard reference: llmstxt.org.